The Crazy Week with 21 AI Updates
This Week's AI Advancements: GPT-4o Release, Google I/O, MITRE+Nvidia and the Federal AI Supercomputer, Universal Basic Compute, Mistral AI Series A Funding Round, and Apple+OpenAI.
This week was the most tiring week in the existence of this blog, and you absolutely don’t want to miss this one. I still managed to explain all the important ones in 12 minutes. Watch the video debrief if you don’t want to read... I even take a nap at the end.
Advancement #1: OpenAI Launches New AI Model GPT-4o and iOS Desktop App
OpenAI has announced the release of GPT-4o (omni), a new model that offers GPT-4-level intelligence with enhanced capabilities for language performance, vision, and audio.
Main updates include the following:
Efficiency: Twice as fast and half the cost of GPT-4 Turbo.
Audio Capabilities: Can now perceive and convey emotions, handle interruptions, and respond to audio prompts quickly (as fast as 232 milliseconds, similar to human response time in conversation).
Translation/Interpreting: Improved translation performance, and it can function as an interpreter in audio mode, enabling real-time bilingual conversations. They also updated the tokenizer, so some languages see up to 4x more compression, which translates into proportionally lower latency.
Visual Comprehension: Better at understanding and discussing images, such as translating menus, providing cultural insights, and answering questions. For example, you can snap a picture of a croissant and ask for its estimated nutritional content.
Improved UX/UI Experience: Prioritized some design changes to make interacting with ChatGPT that much easier.
Desktop App for macOS: So you can instantly ask questions with a simple keyboard shortcut (Option + Space), start voice interactions directly from the app by tapping the headphone icon, and take/discuss screenshots within the app.
Language Support: Now provides support in 50 different languages with improved speed and quality.
Planned updates also include real-time voice and video interactions, allowing for dynamic conversations and live explanations, such as asking ChatGPT to explain the rules of a game on TV. GPT-4o is already available to ChatGPT Plus and Team customers, with broader access for free users and enterprise clients to follow.
My Initial Thoughts: Everyone thought this was going to be an announcement for GPT-5, which it wasn’t.
If OpenAI had GPT-5, they would have shown it, so they don’t have it after 14 months of trying. The most important factor in weighing this launch is performance. So yes, GPT-4o seems incredibly fast in terms of efficiency, and that’s no small feat; the lag seems almost non-existent now. However, the performance shows that 4o is not that different from Turbo, which was also not that different from GPT-4.
This launch did show some promising advancements in data analytics, vision comprehension, and language translation. Being in the translation industry, I hear a lot of concern about AI, but this release made me realize that we’re actually moving toward making language more accessible for the everyday user.
But that everyday user most likely won’t be significantly impacted by mistakes. Look at Duolingo, for example. They recently laid off a significant number of their linguists in favor of GPT-4, and there have been errors, but at worst, a Duolingo user won’t know how to order a coffee in Scottish Gaelic or Spanish if the translation is wrong.
What that means is that the underlying mission of the language industry, providing highly precise services in environments where mistakes cannot be made, won’t go away any time soon. Linguists, definitely incorporate AI into your processes, though.
Advancement #2: Google’s I/O Event Brings 5 Groundbreaking Changes
Google's I/O 2024 event on Tuesday showcased the tech giant's continued innovation and commitment to pushing the boundaries of what's possible with AI. This year's event highlighted five major developments that promise to transform how we interact with technology.
1. Google Search Changes:
Google's reengineered, AI-infused search is set to launch in the US this week, providing summaries and synthesizing answers to users' queries. This new feature has caused tremendous concern that it will be “catastrophic” for many companies because even though the responses often contain links, users might not click through to actual websites. A Gartner survey predicts a 25% drop in search engine volume by 2026 due to generative AI tools.
2. Private Space on Android:
Google introduced Private Space, a new feature on Android that lets users create a secure "container" for sensitive information. This functionality is similar to Incognito Mode but applies to the entire mobile operating system, ensuring personal data and designated apps are isolated and protected.
3. Veo - AI Video Generation Tool:
Google unveiled Veo, an advanced AI-powered video-generation tool. Unlike existing tools like Synthesia, Veo promises a superior video-creation experience, and Google even collaborated with a Hollywood producer to make a short film entirely with Veo. However, it hasn’t actually been released yet.
4. Project Astra - Universal Assistant:
Project Astra is Google's new real-time, multimodal AI assistant. It can see its environment, recognize objects, find misplaced items, and assist with various tasks. An unedited demo showcased Astra identifying speaker parts, locating misplaced glasses, and reviewing code, all in real time and conversationally.
5. Gemini Model Updates: Google introduced the Gemini 1.5 Flash model, optimized for faster tasks like summarization and captioning. The context window for Gemini 1.5 Pro has also doubled to 2 million tokens, enhancing the model's ability to process information and follow instructions more effectively.
My Initial Thoughts: If you have been here with me from the beginning, or read my Multilingual article from December, you know I have been saying for a while that people will probably forgo navigating to websites altogether, which means the way we access information is going to change significantly. Web design might become entirely obsolete, websites that rely on traffic will suffer, and the only clue we have for optimizing in AI search environments is Generative Engine Optimization, which I covered in a previous video.
Project Astra was intriguing. Siri and Alexa never managed to be useful assistants but the demo shown by Google makes me believe this next generation might actually work.
Advancement #3: MITRE to Build Federal AI Sandbox Supercomputer with Nvidia
MITRE is establishing the Federal AI Sandbox, a new supercomputing environment designed for AI testing and prototyping in sensitive government contexts while significantly boosting compute power. The initiative is expected to be operational by the end of the year and will be powered by an NVIDIA DGX SuperPOD™. The Federal AI Sandbox will help federal agencies overcome barriers to AI adoption by providing the high-quality computing environment necessary for experimentation and prototyping in a secure setting.
The Federal AI Sandbox will support MITRE's work in various federal sectors, including national security, healthcare, transportation, and climate. Agencies can access these benefits through existing contracts with MITRE's federally funded research and development centers. The sandbox will enable the training of advanced AI applications, such as large language models and multimodal perception systems, and support reinforcement learning decision aids. NVIDIA's vice president of public sector emphasized that the DGX SuperPOD, which offers exaFLOP-level performance, will significantly enhance the federal government's ability to leverage AI by enabling large-scale training and deployment of custom AI solutions.
My Initial Thoughts: This is part of an ongoing, seemingly accelerating trend of large companies collaborating directly with the government, in a way that I’m sure happened before, but maybe not so publicly…? My biggest concern is the monopolization that is starting to occur, as these companies have increasingly banned others from using their technology for government purposes.
Advancement #4: Speed Round
Universal Basic Compute on the All In Podcast
Sam Altman proposed a new concept called "universal basic compute" (UBC) to support those facing financial difficulties on this week’s All In Podcast. The idea is that everyone would get access to a portion of GPT-7's computing power, which they could use, resell, or donate for purposes like cancer research. Altman believes that owning a share of a large language model's productivity could be more valuable than money as AI integrates more deeply into everyday life, and that access should be a standard right for citizens.
Altman has been a long-time advocate for universal basic income, the idea of providing recurring cash payments to all adults regardless of their financial status or employment. In 2016, he initiated a basic-income experiment that distributed up to $1,000 monthly to over 3,000 participants for three years, with the results to be announced soon.
Mistral AI Funding Round
Mistral AI has officially closed its Series A funding round at $415 million, putting the company’s valuation at $2 billion.
Apple-OpenAI Deal Nears Finalization
Apple is finalizing a deal with OpenAI to power some Generative AI features — like a chatbot and possibly its voice assistant — for the new iOS this year. Talks with Google to integrate Gemini are still ongoing but no deal has been reached.
My Initial Thoughts: The universal basic compute concept scared me more than anything. While it may appear to be a gift, it seems more like an opportunity for resource control and regulation, but who really knows. As for the Apple/OpenAI deal, my uncertainty centers on whether Apple will continue to honor its strict “on-device” policy or switch to cloud computing. Will Apple be able to ensure the data safety and privacy of Apple users from OpenAI? Unlikely…