Oxford study finds AI performance tests lack scientific rigor
Widely used tests for measuring artificial intelligence capabilities may be fundamentally flawed and may oversell AI performance, according to a new study from the Oxford Internet Institute. Researchers examined 445 benchmarks that AI developers use to evaluate their models and found significant methodological problems. Jared Perlo reports for NBC News that roughly half of the benchmarks …