Arithmetic Reasoning Tests

19h

Why China’s AI Models Are Secretly Struggling With Complex Reasoning

New tests show China’s AI models trail Western systems on ARC AGI 2, scoring roughly like leading U.S. models from eight ...

I tested Google’s upcoming Gemini Nano 4 — its faster, smarter AI isn’t what I expected

Google's upcoming Gemini Nano 4 promises faster, smarter on-device AI for smartphones, so I tested it ahead of release to see ...

TMCnet

Stanford-Recognized K-12 Online Tutoring Program Now Enrolling for Summer Tutoring, Summer Test Prep, and Year-Round Virtual 1-on-1 Sessions

Don't let summer pass without a plan! The families who act now are the ones whose kids start September ahead!

The Business Standard

Meta releases first new AI model since shaking up team

The new AI model was described as small and fast by design, capable of reasoning through complex questions in science, math and health ...

Alexandria Times

Douglas MacArthur kids do math for fun

By Sydney Kodama | skodama@alextimes.com
Alexandria City Public Schools parent Jeremy Miller said he’s seen kids practice math on the weekend with his own eyes. “Part of that [attendance] is having ...

India Today

AI just solved a 20-year math problem. Are humans still needed?

AI stuns researchers by solving a 20-year-old mathematical challenge with near-human reasoning, marking a breakthrough in artificial intelligence and raising new questions about the future of human ...

AOL

This Ability Test Is Designed For Kids, But Even Adults Struggle To Finish It: Test Yourself

The Naglieri Nonverbal Ability Test (NNAT) is a nonverbal assessment designed to measure general reasoning ability in K-12 students, helping schools identify students with strong problem-solving ...

IEEE

MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

Abstract: Multimodal Large Language Models (MLLMs) have shown promising capabilities in mathematical reasoning within visual contexts across various datasets. However, most existing multimodal math ...

GitHub

MathVista: Evaluating Math Reasoning in Visual Contexts

Code for the Paper "MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts". For more details, please refer to the project page with ...

syracuse.com

2025 NY school test scores: Search new English, math results for every district

Scores on New York’s statewide assessment tests improved in both math and English language arts during the 2024-2025 school year. Statewide, 57% of students tested proficient in math last year, up 3 ...

Beebom

ChatGPT 4o vs ChatGPT 4: Premium Features for Free?

In our first test, ChatGPT 4o and ChatGPT 4 performed along the same lines. Despite having access to Code Interpreter, neither model used it for mathematical calculation and straight up answered ...
