MSN Technology

A new, challenging AGI test stumps most AI models

Posted on March 25, 2025

The Arc Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced in a blog post on Monday that it has created a new, challenging test to measure the general intelligence of leading AI models.

So far, the new test, called ARC-AGI-2, has stumped most models.

“Reasoning” AI models such as OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3% on ARC-AGI-2, according to the Arc Prize leaderboard. Powerful non-reasoning models, including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, score around 1%.

The ARC-AGI tests consist of puzzle-like problems in which an AI has to identify visual patterns in a collection of different-colored squares and generate the correct “answer” grid. The problems were designed to force an AI to adapt to new problems it has not seen before.
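To make the puzzle format concrete, here is a minimal sketch in Python. It assumes the layout used in the publicly released ARC task files, where a task holds a few "train" example pairs and one or more "test" pairs, and each grid is a list of rows of integer color codes. The toy task, the `mirror` rule, and the helper names are invented for illustration and are not taken from ARC-AGI-2.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell is an integer color code
Task = Dict[str, List[Dict[str, Grid]]]

# Hypothetical toy task: every output is the input mirrored left to right.
toy_task: Task = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[3, 3, 0], [0, 4, 0]], "output": [[0, 3, 3], [0, 4, 0]]},
    ],
    "test": [
        {"input": [[5, 0], [0, 6]], "output": [[0, 5], [6, 0]]},
    ],
}


def mirror(grid: Grid) -> Grid:
    """Candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_examples(rule: Callable[[Grid], Grid], task: Task) -> bool:
    """True if the rule reproduces every demonstration pair exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


def score(rule: Callable[[Grid], Grid], task: Task) -> float:
    """Fraction of test grids the rule gets exactly right (no partial credit)."""
    tests = task["test"]
    correct = sum(rule(pair["input"]) == pair["output"] for pair in tests)
    return correct / len(tests)


if __name__ == "__main__":
    print(fits_examples(mirror, toy_task), score(mirror, toy_task))
```

The sketch treats an answer as correct only when the whole grid matches, which is why a rule inferred from two or three examples either transfers to the unseen test grid or scores nothing.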

The Arc Prize Foundation had more than 400 people take ARC-AGI-2 to establish a human baseline. On average, “panels” of these people got 60% of the test’s questions right, far better than any of the models’ scores.

A sample question from ARC-AGI-2 (Credit: Arc Prize).

In a post on X, Chollet claimed that ARC-AGI-2 is a better measure of an AI model’s actual intelligence than the first iteration of the test, ARC-AGI-1. The Arc Prize Foundation’s tests aim to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on.

Unlike ARC-AGI-1, the new test prevents AI models from relying on “brute force” (extensive computing power) to find solutions, Chollet said. Chollet had previously acknowledged that this was a major flaw of ARC-AGI-1.

To address the first test’s flaws, ARC-AGI-2 introduces a new metric: efficiency. It also requires models to interpret patterns on the fly instead of relying on memorization.

“Intelligence is not solely defined by the ability to solve problems or achieve high scores,” Arc Prize Foundation co-founder Greg Kamradt wrote in a blog post. “The efficiency with which those capabilities are acquired and deployed is a crucial, defining component. The core question being asked is not just, ‘Can AI acquire [the] skill to solve a task?’ but also, ‘At what efficiency or cost?’”

ARC-AGI-1 remained unbeaten for roughly five years until December 2024, when OpenAI released its advanced reasoning model, o3, which outperformed all other AI models and matched human performance on the evaluation. However, as we noted at the time, o3’s performance gains on ARC-AGI-1 came with a hefty price tag.

The version of OpenAI’s o3 model, o3 (low), that was first to reach new heights on ARC-AGI-1, scoring 75.7% on that test, managed only 4% on ARC-AGI-2 while using $200 worth of computing power per task.

Comparison of frontier AI model performance on ARC-AGI-1 and ARC-AGI-2 (Credit: Arc Prize).

The arrival of ARC-AGI-2 comes as many in the tech industry are calling for new, unsaturated benchmarks to measure AI progress. Thomas Wolf, a co-founder of Hugging Face, recently told TechCrunch that the AI industry lacks sufficient tests to measure the key traits of so-called artificial general intelligence, including creativity.

Alongside the new benchmark, the Arc Prize Foundation announced a new Arc Prize 2025 contest, challenging developers to reach 85% accuracy on the ARC-AGI-2 test while spending only $0.42 per task.
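The contest target pairs an accuracy bar with a cost bar, which can be sketched as a simple check. The Python snippet below is illustrative only: the article does not say how the foundation measures or aggregates cost, so the `RunSummary` type and helper functions are hypothetical, and the roughly $200 figure reported for o3 (low) is treated here as a per-task cost, which is an assumption.

```python
from dataclasses import dataclass

TARGET_ACCURACY = 0.85    # contest goal stated in the article
MAX_COST_PER_TASK = 0.42  # US dollars per task, as stated in the article


@dataclass
class RunSummary:
    """Hypothetical summary of one model's run on the benchmark."""
    name: str
    accuracy: float           # fraction of tasks solved, 0.0 to 1.0
    cost_per_task_usd: float  # assumed average spend per task


def meets_contest_target(run: RunSummary) -> bool:
    """Both conditions must hold: accurate enough and cheap enough."""
    return (run.accuracy >= TARGET_ACCURACY
            and run.cost_per_task_usd <= MAX_COST_PER_TASK)


def cost_overrun(run: RunSummary) -> float:
    """How many times over the per-task budget the run is."""
    return run.cost_per_task_usd / MAX_COST_PER_TASK


# Figures reported in the article for o3 (low); cost assumed to be per task.
o3_low = RunSummary("o3 (low)", accuracy=0.04, cost_per_task_usd=200.0)
# Invented entry, included only to show what a passing run would look like.
hypothetical_entry = RunSummary("hypothetical entry", accuracy=0.86,
                                cost_per_task_usd=0.40)

if __name__ == "__main__":
    for run in (o3_low, hypothetical_entry):
        print(run.name, meets_contest_target(run), round(cost_overrun(run), 1))
```

Under these assumptions, o3 (low) misses on both axes: it falls far short of 85% accuracy and spends several hundred times the per-task budget.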
