Best Technologies

Leading AI models fail new test of artificial general intelligence

by News Room
July 3, 2025
in News

The ARC-AGI-2 benchmark is designed to be a difficult test for AI models

Just_Super/Getty Images

The most sophisticated AI models in existence today have scored poorly on a new benchmark designed to measure their progress towards artificial general intelligence (AGI) – and brute-force computing power won't be enough for them to improve, because evaluators now also take into account the cost of running each model.

There are many competing definitions of AGI, but it is generally taken to refer to an AI that can perform any cognitive task that humans can do. To measure this, the ARC Prize Foundation previously launched a test of reasoning abilities called ARC-AGI-1. Last December, OpenAI announced that its o3 model had scored highly on the test, leading some to ask if the company was close to achieving AGI.

But now a new test, ARC-AGI-2, has raised the bar. It is difficult enough that no AI system currently on the market scores more than single digits out of 100, yet every question in the test has been solved by at least two humans in two attempts or fewer.

In a blog post announcing ARC-AGI-2, ARC president Greg Kamradt said the new benchmark was required to test different skills from the previous iteration. “To beat it, you must demonstrate both a high level of adaptability and high efficiency,” he wrote.

The ARC-AGI-2 benchmark differs from other AI benchmark tests in that it focuses on AI models' ability to complete simple-looking tasks, such as inferring a symbolic rule from a few example images and applying it to transform a new one, rather than on their ability to match world-leading PhD-level performance. Current models are good at "deep learning", which ARC-AGI-1 measured, but are weaker at the seemingly simpler tasks in ARC-AGI-2, which demand more challenging thinking and interaction. OpenAI's o3-low model, for instance, scores 75.7 per cent on ARC-AGI-1 but just 4 per cent on ARC-AGI-2.
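To make the kind of grid-transformation task described here a little more concrete, below is a minimal Python sketch. It assumes the JSON-style layout used by the public ARC datasets (demonstration "train" pairs plus a "test" input, with grids of integer colour indices) and exact-match scoring with a budget of two attempts per task; the toy task, its mirroring rule and all function names are hypothetical illustrations, not the ARC Prize Foundation's official tooling.

```python
from typing import Callable, List

Grid = List[List[int]]  # each cell holds a colour index (0-9 in ARC's public format)

# Hypothetical toy task in the assumed ARC layout: demonstration pairs plus a test pair.
# The hidden rule, chosen purely for illustration, is "mirror each row left-to-right".
toy_task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [{"input": [[0, 5], [4, 0]], "output": [[5, 0], [0, 4]]}],
}


def mirror_rows(grid: Grid) -> Grid:
    """Hypothetical candidate solver: reverse the order of cells in every row."""
    return [list(reversed(row)) for row in grid]


def fits_demonstrations(solver: Callable[[Grid], Grid], task: dict) -> bool:
    """Check that a candidate rule reproduces every demonstration pair exactly."""
    return all(solver(pair["input"]) == pair["output"] for pair in task["train"])


def is_solved(attempts: List[Grid], expected: Grid, max_attempts: int = 2) -> bool:
    """Exact-match scoring (assumed): the task counts as solved only if one of the
    first `max_attempts` submissions matches the expected grid cell for cell."""
    return any(attempt == expected for attempt in attempts[:max_attempts])


test_pair = toy_task["test"][0]
print(fits_demonstrations(mirror_rows, toy_task))
print(is_solved([mirror_rows(test_pair["input"])], test_pair["output"]))
```

The all-or-nothing comparison is the point of the sketch: a nearly right grid earns nothing, which is one reason tasks that look simple to people can still defeat a model.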

The benchmark also adds a new dimension to measuring an AI’s capabilities, by looking at its efficiency in problem-solving, as measured by the cost required to complete a task. For example, while ARC paid its human testers $17 per task, it estimates that o3-low costs OpenAI $200 in fees for the same work.
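As a rough illustration of reporting cost alongside score, the short Python sketch below uses only the figures quoted in this article (ARC's own estimates). The constant and function names are illustrative assumptions, and the sketch does not reproduce ARC's actual reporting method, which the article does not describe in detail.

```python
# Figures quoted in the article; the cost numbers are ARC's own estimates.
HUMAN_COST_PER_TASK_USD = 17.0
O3_LOW_COST_PER_TASK_USD = 200.0
O3_LOW_SCORE_ARC_AGI_1 = 75.7  # per cent
O3_LOW_SCORE_ARC_AGI_2 = 4.0   # per cent


def cost_multiple(model_cost: float, baseline_cost: float) -> float:
    """How many times more a model costs per task than the baseline."""
    return model_cost / baseline_cost


multiple = cost_multiple(O3_LOW_COST_PER_TASK_USD, HUMAN_COST_PER_TASK_USD)
drop = O3_LOW_SCORE_ARC_AGI_1 - O3_LOW_SCORE_ARC_AGI_2

print(f"o3-low per-task cost vs human testers: {multiple:.1f}x")  # 11.8x
print(f"o3-low score drop from ARC-AGI-1 to ARC-AGI-2: {drop:.1f} points")  # 71.7 points
```

On these numbers, o3-low costs roughly 12 times as much per task as a paid human tester, even though it solves only a small fraction of the tasks.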

“I think the new iteration of ARC-AGI now focusing on balancing performance with efficiency is a big step towards a more realistic evaluation of AI models,” says Joseph Imperial at the University of Bath, UK. “This is a sign that we’re moving from one-dimensional evaluation tests solely focusing on performance but also considering less compute power.”

Any model able to pass ARC-AGI-2 would need to be not just highly competent but also smaller and more lightweight, says Imperial, because the efficiency of the model is a key component of the new benchmark. This could help address concerns that AI models are becoming ever more energy-intensive, sometimes to the point of wastefulness, in pursuit of better results.

However, not everyone is convinced that the new measure is beneficial. “The whole framing of this as it testing intelligence is not the right framing,” says Catherine Flick at the University of Staffordshire, UK. Instead, she says these benchmarks merely assess an AI’s ability to complete a single task or set of tasks well, which is then extrapolated to mean general capabilities across a series of tasks.

Performing well on these benchmarks should not be seen as a major moment towards AGI, says Flick: “You see the media pick up that these models are passing these human-level intelligence tests, where actually they’re not; what they are doing is really just responding to a particular prompt accurately.”

And exactly what happens if or when ARC-AGI-2 is passed is another question – will we need yet another benchmark? “If they were to develop ARC-AGI-3, I’m guessing they would add another axis in the graph denoting [the] minimum number of humans – whether expert or not – it would take to solve the tasks, in addition to performance and efficiency,” says Imperial. In other words, the debate over AGI is unlikely to be settled soon.

Source: New Scientist

© 2022 All Right Reserved - Blue Planet Global Media Network
