New Apple study challenges whether AI models truly think through problems appeared first on MacDailyNews.
Earlier this month, Apple researchers published a study indicating that simulated reasoning (SR) models, including OpenAI’s o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking, generate responses that align with pattern-matching from their training data when tackling new problems that demand systematic reasoning.

Benj Edwards for Ars Technica:

The researchers found similar results to a recent study by the United States of America Mathematical Olympiad (USAMO) in April, showing that these same models achieved low scores on novel mathematical proofs.

The new study, titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” comes from a team at Apple…

The researchers examined what they call “large reasoning models” (LRMs), which attempt to simulate a logical reasoning process by producing a deliberative text output sometimes called “chain-of-thought reasoning” that ostensibly assists with solving problems in a step-by-step fashion.

To do that, they pitted the AI models against four classic puzzles — Tower of Hanoi (moving disks between pegs), checkers jumping (eliminating pieces), river crossing (transporting items with constraints), and blocks world (stacking blocks) — scaling them from trivially easy (like one-disk Hanoi) to extremely complex (20-disk Hanoi requiring over a million moves).

“Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy,” the researchers write. In other words, today’s tests only care if the model gets the right answer to math or coding problems that may already be in its training data—they don’t examine whether the model actually reasoned its way to that answer or simply pattern-matched from examples it had seen before.
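As a rough check on the "over a million moves" figure quoted above for 20-disk Tower of Hanoi: the minimum number of moves for n disks is 2^n - 1, so 20 disks need 1,048,575 moves. The short Python sketch below is only an illustration of that arithmetic and of the classic recursive solution. It is not the study's evaluation code, and the function names `min_moves` and `solve_hanoi` are made up for this example.

```python
from typing import Iterator, Tuple


def min_moves(disks: int) -> int:
    """Minimum number of moves to solve Tower of Hanoi with `disks` disks (2**n - 1)."""
    return 2 ** disks - 1


def solve_hanoi(disks: int, source: str = "A", target: str = "C",
                spare: str = "B") -> Iterator[Tuple[str, str]]:
    """Yield (from_peg, to_peg) moves for the optimal recursive solution."""
    if disks == 0:
        return
    # Move the top disks-1 disks out of the way, onto the spare peg.
    yield from solve_hanoi(disks - 1, source, spare, target)
    # Move the largest remaining disk to the target peg.
    yield (source, target)
    # Move the disks-1 disks from the spare peg onto the target peg.
    yield from solve_hanoi(disks - 1, spare, target, source)


if __name__ == "__main__":
    print(min_moves(1))    # 1
    print(min_moves(20))   # 1048575
    print(list(solve_hanoi(2)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

Each added disk roughly doubles the length of the solution, which is why the puzzle scales so quickly from trivial to very long move sequences.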


MacDailyNews Take: Back in the day, our math teachers always required that we show our work, not just provide the correct answer.




