The Most Recent AI Legal Updates Around the Globe in 4 Minutes
SearchGPT is a bust, understanding Llama 3.1 405B, and a speed round of all the AI legislation/legal updates around the globe (including EU AI Act, the US AI Innovation Act, and data privacy laws).
Watch the Video Debrief:
Well, I was being held hostage.
Well, guys. I have to start off with an apology for why I didn’t write anything the last few weeks, but the truth is… I was being held hostage… by the work monster.
I am not kidding - my phone tells me I have averaged 4.5 hours of sleep per night for the last month.
My startup (Metalinguist) seems to have found product-market fit in the language industry almost overnight after the launch of our MVP interpreting module. Initially, we offered just a translation client portal, but adding the interpreting side has made it an all-in-one solution that LSPs and in-house teams find incredibly valuable. My time has been stretched thin (to say the least).
While I was gone, I amassed almost 1k followers on YouTube, and I'm here to catch you up on the last couple of weeks. Don't worry - I've got your back. :-)
Advancement #1: OpenAI Launches SearchGPT
OpenAI dropped a new browser extension called SearchGPT, designed to add a layer of AI-powered assistance to how you surf the web.
When you search, the SearchGPT extension pops up a response alongside your regular search results, then lets you jump straight into a normal prompting session once you have your answer. The whole thing is designed to blend seamlessly into your browsing experience. On top of that, if you're using the Chrome extension, you can quickly access SearchGPT at any time through a handy popup window instead of navigating to the ChatGPT website.
However, there's a catch. For many users (including myself), the extension isn't functional yet, and it raises significant concerns about data privacy.
My Initial Thoughts on SearchGPT: A Complete Bust
Let's talk about the really scary part: data privacy. When you install SearchGPT, you're giving it permission to read and change all your data on every website you visit. Think about that for a second. What happens if you navigate to a secure app with sensitive information? The implications are horrifying. This is a major red flag and something everyone should be wary of before using this extension.
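To make that warning concrete, here's a hypothetical sketch of the kind of extension manifest that produces it. This is my own illustration (expressed as a Python dict for readability), not OpenAI's actual manifest, but the broad `host_permissions` pattern is what grants an extension access to every site you visit.

```python
# Hypothetical Chrome extension manifest (my illustration, NOT OpenAI's file),
# shown as a Python dict. The "<all_urls>" host permission is what triggers
# Chrome's "read and change all your data on all websites" warning.
manifest = {
    "manifest_version": 3,
    "name": "example-search-assistant",
    "host_permissions": ["<all_urls>"],         # access to every site you visit
    "permissions": ["scripting", "tabs"],       # inject scripts, read open tabs
    "action": {"default_popup": "popup.html"},  # the quick-access popup window
}
```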
OpenAI's release of SearchGPT was supposed to be groundbreaking, but it turned out to be a complete bust. This launch perfectly illustrates that even big companies can fall prey to bugs. Seriously, if you're going to release something, at least make sure it’s a decent MVP. Minimum requirements = it can load.
Advancement #2: Understanding Meta’s Llama 3.1 405B
This is the main update we missed on my off week. Meta released Llama 3.1 405B, a beast of an AI model with a whopping 405 billion parameters, making it arguably the largest open-source model available to date. (Side note: if you don't know what a parameter is, the rule of thumb is that the more parameters a model has, the more it can do and the better it performs.)
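If a concrete picture helps, here's a quick toy sketch (my own illustration, not Meta's code) of what a "parameter" actually is: a learnable weight you can literally count.

```python
# Counting parameters (learnable weights) in a single toy layer.
# Llama 3.1 405B has roughly 405 billion of these spread across all its layers.
import torch.nn as nn

layer = nn.Linear(in_features=4096, out_features=4096)  # one transformer-sized layer
n_params = sum(p.numel() for p in layer.parameters())   # weights + biases
print(f"{n_params:,} parameters")  # 16,781,312 - and that's just one layer
```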
Llama 3.1 was trained on over 15 trillion tokens (roughly 11 trillion words, if you use the common rule of thumb of about 0.75 words per token), and while Meta built on the base training set used for earlier models, they claim they have significantly refined their data curation pipelines and adopted more rigorous quality assurance and data filtering methods for this release.
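For the curious, the back-of-the-envelope conversion looks like this (the 0.75 figure is a rough heuristic for English text, not an exact number):

```python
# Rough token-to-word conversion using the ~0.75 words-per-token rule of thumb.
tokens = 15e12                                # 15 trillion training tokens
words = tokens * 0.75                         # heuristic conversion for English
print(f"~{words / 1e12:.0f} trillion words")  # ~11 trillion words
```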
Two really interesting aspects of their training process for Llama 3.1:
They opted for a decoder-only transformer architecture with minor adaptations, instead of a mixture-of-experts (MoE) model, claiming this maximized training stability. Without getting too technical: mixture-of-experts has recently been seen as the direction the field may be heading (Google uses it in a lot of their models), but Meta decided to go with the more traditional architecture. See the first sketch after this list for the difference.
They also used synthetic data to produce the vast majority of the supervised fine-tuning (SFT) examples. For those unfamiliar with SFT, it involves comparing the model's output against a predefined dataset of labeled, expected outputs and adjusting the model toward the correct values. This process helps increase accuracy, but hand-creating millions of these labeled examples would be extremely labor-intensive, so AI training usually leverages a method called semi-supervised learning, which combines labeled and unlabeled data to improve the model's performance efficiently. The second sketch below shows the basic SFT loop.
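First, a minimal sketch of dense vs. mixture-of-experts, with toy dimensions and made-up names - this is my illustration of the general technique, not Meta's or Google's actual code:

```python
# Dense FFN vs. mixture-of-experts (MoE): in a dense block, every token passes
# through the same feed-forward network; in an MoE block, a learned router
# sends each token to one (or a few) of several expert FFNs.
import torch
import torch.nn as nn

dim, n_experts = 64, 4  # toy sizes

# Dense route (what Meta chose for Llama 3.1): one FFN shared by all tokens.
dense_ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

# MoE route: several expert FFNs plus a router that scores them per token.
experts = nn.ModuleList(
    nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
    for _ in range(n_experts)
)
router = nn.Linear(dim, n_experts)

x = torch.randn(8, dim)   # a batch of 8 token embeddings

dense_out = dense_ffn(x)  # dense: every token, same weights

scores = router(x)               # MoE: score each expert for each token
chosen = scores.argmax(dim=-1)   # route each token to its top expert
moe_out = torch.stack([experts[int(e)](t) for t, e in zip(x, chosen)])

print(dense_out.shape, moe_out.shape)  # same output shape, very different compute
```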
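Second, a minimal sketch of a single SFT step, again with toy stand-ins rather than Meta's actual pipeline - the key idea is that the loss compares the model's prediction against a labeled "expected output":

```python
# One supervised fine-tuning (SFT) step: the model's prediction is compared
# against a labeled target, and the loss nudges the weights toward it. In
# Llama 3.1's case, most of these labeled targets were generated synthetically.
import torch
import torch.nn as nn

vocab_size, dim = 1000, 64  # toy sizes

# Stand-in for a pretrained LLM (a decoder-only transformer in reality).
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randint(0, vocab_size, (32,))   # input tokens
targets = torch.randint(0, vocab_size, (32,))  # labeled "expected output" tokens

for step in range(3):
    optimizer.zero_grad()
    logits = model(inputs)           # the model's prediction for each input
    loss = loss_fn(logits, targets)  # compare against the labeled answer
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.3f}")
```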
Meta researchers noted that Llama 3.1 405B was trained with more non-English data than earlier Llama models to improve its performance in other languages, an expanded range of mathematical and code data to strengthen its reasoning skills, and more up-to-date web data to improve its grasp of current events.
My Initial Thoughts:
For these companies, training data is the top-secret sauce kept behind closed doors, and it determines how effectively their AI models perform. It shapes their abilities, biases, and knowledge, making it a crucial component of model development.
A significant ethical debate currently surrounds the use of training data, particularly the choice between synthetic and copyrighted sources. Many major AI vendors, such as OpenAI and Anthropic, are actively exploring synthetic data to scale their AI training, but there are some big concerns. Because synthetic data is generated by an algorithm rather than collected from real-world sources, the fear is that it may amplify existing biases and fail to accurately represent the complexities of real-world data.
On the other hand, using copyrighted data poses its own set of ethical and legal challenges. A recent report from Reuters revealed that Meta has used copyrighted books for AI training despite warnings from its legal team. The company has also trained its models on private user data from Instagram and Facebook posts, including photos and captions, often without giving users a straightforward way to opt out. This controversial approach has fueled ongoing lawsuits accusing most of the major AI companies of unauthorized use of copyrighted content.
In all cases, this is a complex ongoing debate, and finding a responsible approach to training data remains a pressing challenge.
Advancement #3: Speed Round of AI Legal Updates (Mainly)
EU AI Act
The EU's AI Act is now in force. The act categorizes AI applications into risk tiers, imposing specific obligations and penalties for non-compliance. Developers of high-risk AI, including biometric and facial recognition systems, face stringent requirements, while general-purpose AI (GPAI) developers must adhere to transparency and copyright rules. The detailed compliance measures are still under discussion and are expected by April 2025.
USA Congressional Bills Recently Endorsed by OpenAI
OpenAI endorsed three congressional AI bills this week, including the Future of AI Innovation Act, which would formally authorize the US AI Safety Institute as a federal body that sets standards and guidelines for AI models. The two other endorsed bills, the NSF AI Education Act and the CREATE AI Act, aim to provide federal scholarships for AI research and establish AI educational resources.
More Countries Protect Users from Meta’s Data Usage
Brazil is the latest of many countries to prohibit Meta from training on user data, citing an "imminent risk of serious harm and irreparable damage to fundamental rights." Meta immediately suspended its AI assistant in Brazil following the ban.
X (Twitter) Data Privacy Concerns
A privacy watchdog expressed surprise at Elon Musk opting to utilize user data for Grok AI training, given his strong criticism of OpenAI, Meta, and others for using similar methods.
Apple Intelligence Beta Available
And lastly in this speed round, Apple Intelligence has officially launched in beta and is available to app developers, but as exciting as that is, be warned: beta launches are notoriously buggy.
My Initial Thoughts:
OpenAI’s support for these bills reflects its belief in the government's role in ensuring AI safety and accessibility, while also positioning itself favorably for future regulatory discussions. It’s interesting to note that big companies like OpenAI are often "pro" regulation. Why? Because by supporting these regulations, they can shape the rules in a way that benefits them and makes it harder for smaller competitors to keep up. This strategic move helps them maintain their dominant position in the market.