New Apple study challenges whether AI models truly think through problems

06.12.2025

Earlier this month, Apple researchers published a study indicating that simulated reasoning (SR) models, including OpenAI’s o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking, generate responses that align with pattern-matching from their training data when tackling new problems that demand systematic reasoning.

Benj Edwards for Ars Technica:

The researchers found similar results to a recent study by the United States of America Mathematical Olympiad (USAMO) in April, showing that these same models achieved low scores on novel mathematical proofs.

The new study, titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” comes from a team at Apple…

The researchers examined what they call “large reasoning models” (LRMs), which attempt to simulate a logical reasoning process by producing a deliberative text output sometimes called “chain-of-thought reasoning” that ostensibly assists with solving problems in a step-by-step fashion.

To do that, they pitted the AI models against four classic puzzles — Tower of Hanoi (moving disks between pegs), checkers jumping (eliminating pieces), river crossing (transporting items with constraints), and blocks world (stacking blocks) — scaling them from trivially easy (like one-disk Hanoi) to extremely complex (20-disk Hanoi requiring over a million moves).
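The "over a million moves" figure follows from the well-known result that an n-disk Tower of Hanoi needs at least 2^n - 1 moves. The sketch below is only an illustration of that arithmetic and of the standard recursive solution; it is not the researchers' evaluation code, and the function names and peg labels are made up for this example.

```python
def min_moves(n):
    """Minimum number of moves for an n-disk Tower of Hanoi: 2^n - 1."""
    return 2 ** n - 1


def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield (from_peg, to_peg) moves of the standard recursive solution.

    Peg labels are arbitrary; they are assumptions for this illustration.
    """
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, source, spare, target)
    # Move the largest disk to the target peg.
    yield (source, target)
    # Move the n-1 disks from the spare peg onto the largest disk.
    yield from hanoi_moves(n - 1, spare, target, source)


if __name__ == "__main__":
    for disks in (1, 3, 10, 20):
        print(f"{disks} disks: {min_moves(disks):,} moves")
```

For 20 disks the formula gives 1,048,575 moves, which is why the largest puzzle size lands just past the million mark, while the one-disk case needs a single move.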

“Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy,” the researchers write. In other words, today’s tests only care if the model gets the right answer to math or coding problems that may already be in its training data—they don’t examine whether the model actually reasoned its way to that answer or simply pattern-matched from examples it had seen before.

MacDailyNews Take: Back in the day, our math teachers always required that we show our work, not just provide the correct answer.

The post New Apple study challenges whether AI models truly think through problems appeared first on MacDailyNews.
