Apple Drops AI Bomb Claiming There's ZERO Evidence LLMs Can Reason (+Tesla Optimus Robot Scandal)
Apple's new research paper claims LLMs can't actually reason, Tesla's We Robot Keynote, Meta's new MovieGen model's impact on film, and a speed round.
Watch video debrief here:
Advancement #1: New Apple Research Says 0 Evidence That LLMs Can Actually Reason
Apple just dropped an AI bomb in its latest research paper, claiming that large language models (LLMs) aren't the reasoning masters we think they are... and it's causing an uproar in the AI community.
Apple ran a series of experiments and concluded that there was zero evidence these models showed any signs of reasoning. ZERO. That is a big statement. Rather than using true logic, they believe, LLMs rely heavily on pattern recognition.
Let’s dive into the techie details of this research paper a bit.
When evaluating a model's performance, there are many different kinds of tests, also called benchmarks, and the benchmarks differ depending on what we're trying to test in the LLM.
In this research paper, Apple repeatedly referenced GSM8K, a mathematical benchmark widely used in AI evaluation. Ultimately, Apple began to suspect that these benchmarks were giving us a false sense of security.
So they introduced a new test, which they called the GSM-Symbolic benchmark.
(Side bar: there has already been a significant amount of conversation about how the benchmarks currently used to evaluate models were designed for research, not for real-world application. That's a different conversation for another time, which you can find in one of my other videos.)
Now, here's what they did.
Using this new benchmark, they cloned the exact same math questions from the old benchmark and made crazy simple changes, like changing the names, objects, or numbers. For example, if the original math question was "Ronald has 8 oranges," they would only change the names and numbers: Ronald would become Jennifer, oranges would become bananas, and 8 would become 4.
The math and the logic of the problem remained the same, and if the model could truly reason, it could handle those tiny changes that an 8-year-old could handle… but it couldn't. In fact, performance dropped by 10% or more when only names and numbers were changed, or when irrelevant information was thrown into the problem statement.
To put that into perspective, take this question as an example. “Ronald is outside and has 7 bananas. If he gives two away and wears 2 socks, how many bananas does he have left?"
Based on Apple's research, mentioning the socks here… which WE know is irrelevant to solving this problem… causes a huge drop in performance... something that an elementary school kid would probably catch. Hopefully, anyway.
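To make the idea concrete, here's a minimal sketch of the GSM-Symbolic-style approach: turn a fixed question into a template, then sample fresh names and numbers for each variant while the underlying logic (and therefore the answer formula) stays identical. The template, names, and numbers below are my own illustration, not Apple's actual code or data.

```python
import random

# Hypothetical template in the spirit of GSM-Symbolic: the surface details
# (name, object, quantities) vary, but the logic is always "n minus k".
TEMPLATE = "{name} has {n} {fruit}. If {name} gives {k} away, how many {fruit} are left?"

NAMES = ["Ronald", "Jennifer", "Maya", "Omar"]
FRUITS = ["oranges", "bananas", "apples"]


def make_variant(seed=None):
    """Generate one symbolic variant of the question and its correct answer."""
    rng = random.Random(seed)
    n = rng.randint(4, 9)
    k = rng.randint(1, n - 1)  # keep the answer positive
    question = TEMPLATE.format(
        name=rng.choice(NAMES), n=n, fruit=rng.choice(FRUITS), k=k
    )
    # The correct answer is computed from the logic, not the surface text,
    # so every variant tests the exact same reasoning step.
    return question, n - k


question, answer = make_variant(seed=42)
```

A true reasoner should score the same on every variant; per Apple's findings, LLMs don't, which is what suggests pattern matching on the surface text rather than logic.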
Apple researchers gave some recommendations, which you can find in the paper, but ultimately the consensus was that the next big challenge for the AI industry on the road to AGI is to build models that move beyond pattern recognition and demonstrate true logical reasoning.
My Initial Thoughts:
Well, this is definitely a bit mind-blowing, because it challenges a lot of assumptions we had about how much reasoning LLMs can actually do. I’ve seen people make comments about this research like, “I guess LLMs are just statistical models”…
which is true—but they’ve always been that.
Rather, this research pushes us to reconsider how much trust we can place in them, especially in critical areas like healthcare, government, or education, where precision and accuracy could mean life or death.
If LLMs can fail simply because irrelevant information was added or names were changed, it's a huge deal, and could totally reshape how and where we use AI.
On top of that, this should officially push the AI community to get rid of those outdated benchmarks once and for all, because we're too far into this AI journey to be having this conversation.
Advancement #2: Tesla’s We Robot Keynote That Left More Questions Than Answers
Tesla's recent "We, Robot" keynote made waves with some bold reveals, including a Robotaxi and Robovan (which we speculated about in a video a few months ago because of some government filings… you're welcome for being ahead of the research hehe), as well as updates about their Optimus robots.
If you were on social media last week and saw these white humanoids walking amongst humans, yeah, those are the robots we’re talking about.
Tesla claimed this robot represents the future of AI and robotics, with Musk stating Optimus would become “the biggest product ever of any kind”, and "The Optimus robots will walk among you. Please be nice to the robots. You’ll be able to walk right up to them and they'll serve drinks at the bar."
Despite the impressive show, it became clear that the robots weren't actually autonomous or AI-driven; instead, human operators were controlling them from behind the scenes, leading many to feel the display was a deceptive performance rather than a true AI breakthrough.
The two other big reveals, which probably do have the tech to back them up, were the Robotaxi and Robovan, both transportation vehicles designed to be fully autonomous, without steering wheels or pedals. Musk explained that these vehicles could make transportation a lot cheaper and safer, costing just 20 cents per mile.
The Robovan was my favorite. It's like a train with no visible wheels and makes me feel like I'm in the movies I dream up, and it's built to carry up to 20 people or transport goods.
A little easter egg for you all: some think, based on the Robovan's specs, that it was designed for the underground tunnels being built by The Boring Company, another company Musk owns that hopes to create a network of tunnels under cities to improve traffic.
My Initial Thoughts:
The Robovan was definitely the coolest thing ever… I loved it. But while this keynote was packed with exciting promises for the future of autonomous tech and robots, it raised doubts about how far along they actually are, and whether they can live up to the hype.
Optimus was purely smoke and mirrors rather than real progress, and stocks fell. Many experts are skeptical that Tesla will actually make the launch date for the Robotaxi next year, especially given its history of missed deadlines and the ongoing scrutiny over safety concerns with its current driver-assist systems, which is also slowing down regulatory approval.
Shocker.
Advancement #3: Meta Launches New Multi-Modal Model MovieGen
Meta just launched MovieGen, an AI model that creates short video and audio clips based on user prompts. Competing with tools like OpenAI’s Sora or ElevenLabs, MovieGen allows users to generate up to 16-second videos and 45-second audio clips.
While not publicly available to us commoners—and with no sign of that changing anytime soon, exactly like Sora—the company plans to work with content creators and the entertainment industry to integrate the model into its products next year.
My Initial Thoughts:
Let’s talk about the hidden conversation: the launch of MovieGen continues to fuel ongoing debates within the film industry about the impact of generative AI, because Hollywood has been grappling with how to leverage this tech while balancing creativity and ethical concerns.
OpenAI, for example, sparked both interest and anxiety in Hollywood when Scarlett Johansson accused the company of imitating her voice without consent, and studios like Lionsgate have already partnered with AI startups like Runway to train AI models on their film libraries.
Which, as an actor, a producer, a screenwriter… is that extortion? Did I agree to that in the contract I signed 15 years ago? Or is it legal to have my likeness or work used in that way?
Should it be allowed?
These are questions we don’t have answers to, while the tech continues to move forward in these legal gray areas. From a copyright point of view, my gut says this will be outlawed, but the damage will already be done.
Aside from the film industry, lawmakers have raised concerns about AI-generated deepfakes, particularly in elections, so we have to address these concerns really soon to properly embrace this tech. I don’t know how they’re going to do it, but it has to be done asap.
Advancement #4: Nvidia, Google and China Speed Round
1. Nvidia
Nvidia’s CEO Jensen Huang revealed in an interview that he envisions Nvidia one day having 100 million AI employees across every employee group, even participating in Slack channels. If unemployment wasn't already bad, it's going to get worse.
2. Google Funds Nuclear Power for AI Energy
Google has announced a "first-of-its-kind" agreement with startup Kairos Power to fund the construction of seven small nuclear power plants in order to support the energy needs of its AI operations.
3. China’s Unmanned Road Paving Project
China has completed the world's first fully unmanned road paving project, covering a 157 km stretch of the Beijing-Hong Kong Expressway using a fleet of autonomous machines, marking an incredible leap in road paving technology.
And that is it for this week! Please remember to like, comment, and subscribe, and see you all next time!!