Source link : https://tech365.info/past-arc-agi-gaia-and-the-seek-for-an-actual-intelligence-benchmark/
Intelligence is pervasive, yet its measurement seems subjective. At best, we approximate it through tests and benchmarks. Consider college entrance exams: every year, countless students sign up, memorize test-prep tricks and sometimes walk away with perfect scores. Does a single number, say 100%, mean that those who got it share the same intelligence, or that they have somehow maxed out their intelligence? Of course not. Benchmarks are approximations, not exact measurements of someone’s (or something’s) true capabilities.
The generative AI community has long relied on benchmarks like MMLU (Massive Multitask Language Understanding) to evaluate model capabilities through multiple-choice questions across academic disciplines. This format enables straightforward comparisons, but fails to truly capture intelligent capabilities.
Both Claude 3.5 Sonnet and GPT-4.5, for instance, achieve similar scores on this benchmark. On paper, this suggests equivalent capabilities. Yet people who work with these models know that there are substantial differences in their real-world performance.
What does it mean to measure ‘intelligence’ in AI?
On the heels of the new ARC-AGI benchmark release (a test designed to push models toward general reasoning and creative problem-solving), there is renewed debate around what it means to measure “intelligence” in AI. While not everyone has tested the ARC-AGI benchmark yet, the industry…
---
Author : tech365
Publish date : 2025-04-14 01:10:00
Copyright for syndicated content belongs to the linked Source.
---