I'm over two years late to the "LLMs are terrible at reasoning" party. Not two weeks late – two years.
But that's not actually my main topic today
Because I'm not the expert
While I've done neural networks research, it pre-dates
deep learning…
transformers…
and everything after
How can anyone keep up?
There's a secret to learning about wildly different topics: you don't need to be The Expert
On the topic du jour, as everyone & their pet alligator knows by now, Apple researchers recently told us that LRMs and LLMs can't really reason
Cue self-proclaimed experts: "I knew it"
Other self-proclaimed experts: "It's just a blip. We'll soon be back to regular programming" (ba-dum-tish)
Those who used to say that "AGI is almost here"? Hiding
The truth? Most of them don't have a clue
They're not lying. They really think they know. But they don't
*** So how can regular folks know what to think? ***
Why do I, a self-proclaimed non-expert, claim to have known long ago whether LLMs could reason or "think"?
The answer to both of those questions is: not mass media, not LinkedIn, not Reddit
These are great for many, MANY other topics
But researchers don't post their work on social media. They might not even have an account!
So then?
First ask yourself if you are serious about learning about this stuff
If you aren't, stop reading right now and go back to the memes
You're still here. Ok, then
If your last exposure to algebra was before you were 16,
↪︎ Gary Marcus on Sean Carroll's podcast
↪︎ subscribe to Jack Clark's AI newsletter
↪︎ Josh Wolfe at Lux Capital never disappoints
Deeper?
↪︎ Mike Pound on Computerphile
Still more?
↪︎ François Chollet << he's hardcore
Comfy with matrix algebra?
↪︎ 3B1B's video series on neural networks. Accessible animations and not much scary maths
The "final" level?
↪︎ Figure out who the top researchers are, see where they publish and who they cite. Then go read them
At least follow them on Bluesky!
So what's my view, having learned from the real experts? Can LLMs/LRMs get us to genuine thought? Are we headed to AGI? Will there ever be a ghost in the machine?
Er, no
But a qualified No
Making the models larger or throwing faster chips at them or building bigger data centres or making the algos more efficient – none of this adds new information
While I am as much a fan of emergence as the next guy, this isn't that. Scaling takes you from a Model T to a Focus – maybe even a Bugatti Veyron – but it won't turn a car into a nuclear submarine
You need to add new information. A different type of architecture
This is possible – I am a materialist – but not with the current approach
Speaking of Veyrons, César Hidalgo's book Why Information Grows is my final tip for you
//end