The Download: Rethinking AI benchmarks, and the ethics of AI agents

Every time a new AI model is released, it’s typically touted as acing a series of benchmarks. OpenAI’s GPT-4o, for example, was launched in May with a compilation of results that showed it topping every other AI company’s latest model in several tests.

The problem is that these benchmarks are poorly designed, the results are hard to replicate, and the metrics they use are frequently arbitrary, according to new research. That matters because AI models’ scores against these benchmarks determine the level of scrutiny they receive.

AI companies frequently cite benchmarks as testament to a new model’s success, and those benchmarks already form part of some governments’ plans for regulating AI. But right now, they might not be good enough to use that way, and researchers have some ideas for how they should be improved.

—Scott J Mulligan

We need to start wrestling with the ethics of AI agents

Generative AI models have become remarkably good at conversing with us, and creating images, videos, and music for us, but they’re not all that good at doing things for us.

AI agents promise to change that. Last week researchers published a new paper explaining how they trained simulation agents to replicate 1,000 people’s personalities with stunning accuracy.

AI models that mimic you could go out and act on your behalf in the near future. If such tools become cheap and easy to build, it will raise lots of new ethical concerns, but two in particular stand out. Read the full story.

—James O’Donnell
