Apple Rolls Out Ferret: A New Breed of Open Source AI, No Cage Required (Updates for Week of 1.22.24)
This week in AI: Recognition is circulating for Apple's quietly released multimodal LLM, RWS Group launches groundbreaking linguistic AI solution, a new AI copyright protection tool poisons models...
Advancement #1: News spreading of Apple's quietly released multimodal LLM, Ferret
Apple (in conjunction with researchers at Cornell) quietly released an open-source LLM on GitHub named "Ferret" back in October, which was largely overlooked until last week. While Apple is traditionally known for being quite secretive, its decision to make the model open source (under a non-commercial license only) signals a potential shift toward greater transparency than in the past.
Apple is known for its vision cognitive services, so it’s no surprise that Ferret is being recognized for its performance, especially around user interaction with images combined with NLP (Natural Language Processing). Ferret, trained on eight A100 GPUs with 80GB of memory each, is reportedly better at recognizing and contextualizing smaller elements within a picture than many other models. For example, a user can draw a region on an image and ask about the relevant elements within it, and Ferret can even draw bounding boxes in response to queries.
Imagine this scenario: You point to an image of an animal and inquire with Ferret about its identity. Not only does Ferret accurately recognize the species, but it also comprehends if you're indicating a particular member of that species within a larger group.
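To make the region-referring idea concrete, here is a minimal sketch of how a question about a specific image region might be packaged for a grounding model. This is illustrative only: the helper name and the 0-999 coordinate convention are my assumptions, not Ferret's documented interface.

```python
def region_prompt(question: str, box: tuple, width: int, height: int) -> str:
    """Embed a bounding box [x1, y1, x2, y2] into a question, in the
    spirit of region-referring prompts (illustrative sketch only)."""
    x1, y1, x2, y2 = box
    # Normalize pixel coordinates to a 0-999 range, a common convention
    # in grounding models so prompts are resolution-independent.
    norm = [round(x1 / width * 999), round(y1 / height * 999),
            round(x2 / width * 999), round(y2 / height * 999)]
    return f"{question} [{norm[0]}, {norm[1]}, {norm[2]}, {norm[3]}]"

# A user draws a box around one animal in a 1000x1000 image and asks about it.
print(region_prompt("What animal is in this region?", (100, 200, 300, 400), 1000, 1000))
# -> What animal is in this region? [100, 200, 300, 400]
```

The model then answers with respect to that region only, which is what lets Ferret distinguish one member of a group from the group as a whole.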
If you want to read more about the technical aspects of Ferret, follow this link.
My Initial Thoughts: When I heard last week that Apple released a LLM named Ferret, I initially thought it was a joke. Seriously, who came up with that name?
Nonetheless, Apple has had AI for many years, mainly starting with its release of a weak AI, Siri (weak AI means a system trained for only a single task). I am happy to see that not only is Apple finally joining the public AI community, but it may also be loosening its traditional stance of being more secretive than the government programs I worked on (I should know… my brother worked for Apple on the Vision Pro team for four years, and I only learned exactly what he was working on at the release in Cupertino last June).
Putting aside the jokes, there is a good amount of chatter rising about Ferret, but I have yet to personally see any implementations of it, so most of the examples I have are unfortunately from the internet. I have a friend who spent a good chunk of last week downloading the whole package, so time will tell how it compares.
Advancement #2: RWS launches the first major advanced linguistic AI solution for the translation industry
Last week RWS publicly launched "Evolve," an advanced linguistic AI solution. Evolve represents a significant advancement in the translation industry, blending human expertise with AI to dramatically improve efficiency and quality in translations for global enterprises with large-scale translation needs.
The solution went through a healthy beta program with Dell and other industry leaders, and was fine-tuned using client feedback and the integration of RWS’s own translation management system. With over 1,750 in-house language specialists and domain experts behind it, RWS believes Evolve is a system that continuously learns and improves for optimal translation results.
Thomas Labarthe, President of Language & Content Technology at RWS, describes Evolve in the press release as a unique blend of human and AI intelligence, designed to help enterprises manage their extensive content translation needs.
Evolve utilizes a number of tools in the RWS ecosystem, including Trados Enterprise (a translation management system), Language Weaver (neural machine translation technology), specialist-trained quality estimation, and an extensive private large language model.
My Initial Thoughts: I was so excited when I saw this news. I'm admittedly a little biased: my startup Metalinguist has a close relationship with the Trados team and offers an integration with the Enterprise version, and I'm in touch with their Director of Linguistic AI Solutions (who is one of the most beautiful souls I've ever met). But I have seen firsthand, for over a year, the amazing work of the RWS team. They approach language performance in an almost philosophical way: wanting not just to improve AI language translation performance but to truly capture the beautiful nuance of human language.
While I have not personally seen Evolve in action, I can't wait to see some of the Metalinguist customers try it out with Trados Enterprise (two solutions working in tandem) instead of having to outsource to another third-party AI solution.
Advancement #3: The copyright protection tool for artists that poisons AI models, Nightshade
Nightshade, released this week and designed by computer scientists at the University of Chicago, gives artists the first real opportunity to protect their work from unauthorized use in an AI model’s training process.
Described as an “offensive” tool, Nightshade essentially poisons models, turning AI against itself when data scrapers use artwork without consent. Its primary mission is to penalize scrapers who have disregarded copyrights or opt-out lists without repercussions. When a model ingests a significant number of poisoned images, the precision of its output degrades. Because the goal with all AI models is to fine-tune their performance, degrading that output has significant consequences.
How does it work? Nightshade leverages PyTorch (a popular open-source machine learning framework) to analyze images and modify them at the pixel level, making it challenging for other AI programs to recognize the original content. The alteration is subtle enough that humans see the original image, while AI models see something completely different. For example, if the original image was a cow in a field, once the image is cloaked by Nightshade, a model could see something like a leather purse in a field.
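The key property of the pixel-level change is that it is bounded, so it stays imperceptible to humans. The toy sketch below shows only that bounding step; Nightshade's actual method optimizes the perturbation against a target concept in the model's feature space, which is not reproduced here.

```python
def apply_perturbation(pixels, delta, epsilon=8):
    """Add a precomputed perturbation to an image's channel values,
    clamping each change to +/- epsilon so the edit stays visually small.
    (Toy sketch: Nightshade optimizes delta against a target concept;
    here delta is simply given.)"""
    out = []
    for p, d in zip(pixels, delta):
        d = max(-epsilon, min(epsilon, d))   # bound the per-channel change
        out.append(max(0, min(255, p + d)))  # keep valid 8-bit pixel range
    return out

img = [120, 250, 3]                          # a few channel values
poisoned = apply_perturbation(img, [10, 10, -10], epsilon=8)
print(poisoned)  # -> [128, 255, 0]
```

Because every channel moves by at most epsilon, the poisoned image looks unchanged to a person, while the carefully chosen direction of the change is what misleads the model.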
The image below, taken from the paper published by the Nightshade team, shows how poisoning one concept (dog) also impedes the generation of related concepts (puppy, husky, wolf).
The tool has seen an incredible number of downloads (250,000) in just five days. While most traditional legal protection is hidden in an image's metadata or relies on possessing the original file, Nightshade is a step forward: it still protects the artist even if scrapers try to circumvent the poison through image alteration.
“You can crop it, resample it, compress it, smooth out pixels, or add noise, and the effects of the poison will remain. You can take screenshots, or even photos of an image displayed on a monitor, and the shade effects remain. Again, this is because it is not a watermark or hidden message (steganography)."
If you want to read more about the underlying equations and their actual statistical impact on models, you can read the paper here.
My Initial Thoughts: The copyright infringement question around AI model training is a complex legal argument that has yet to be settled in either direction by any major country. Because of this, I’m going to play devil’s advocate here a little bit.
Before software development, my background was in the realm of music copyright and infringement. I am that friend that never illegally downloaded a movie or song and am as straight as they come in terms of illegal usage. Working with artistic copyright like I did (at no less than Sony Music), I saw many lawsuits regarding infringement come across my department’s desk.
Just as LLMs use the same words in an infinite number of combinations, and other foundation models (like image generators) draw on similar visuals, songs share the same musical notes. In music, it typically became copyright infringement when two songs contained the exact same 5-7 notes in a row.
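The 5-7 consecutive-notes rule of thumb can be made concrete with a toy check for the longest shared run of notes between two melodies. (The threshold here is the article's heuristic, not a legal standard, and the note sequences are invented for illustration.)

```python
def longest_shared_run(a, b):
    """Length of the longest identical run of consecutive notes
    appearing in both melodies (longest-common-substring DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, 1):
            if x == y:
                cur[j] = prev[j - 1] + 1  # extend the matching run
                best = max(best, cur[j])
        prev = cur
    return best

song1 = ["C", "E", "G", "A", "F", "D", "C"]
song2 = ["B", "E", "G", "A", "F", "D", "E"]
run = longest_shared_run(song1, song2)   # shared run: E G A F D
print(run, "-> potential claim" if run >= 5 else "-> likely fine")
```

A claim, in this simplified picture, turns on whether that shared run crosses the threshold, which is exactly the kind of bright-line test GenAI output currently lacks.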
Since GenAI is itself a statistical model, it is very unlikely to reproduce original text word for word in a way that would constitute a clear case of copyright infringement. Just like humans learning from textbooks, LLMs (and image foundation models) absorb information and reiterate it in their own way. Should it also be considered copyright infringement when humans do the same?
While there is no formal ruling at this point regarding copyright and AI, I believe this music copyright example could be a useful reference when making copyright legislation for AI in the future. What I can agree is a clear example of infringement: explicitly utilizing a person’s likeness or style. Prompting an image model to “make me a painting in the style of Monet” should not be allowed. I am seeing solutions emerge that could generate royalties on such keyword prompts.
One caveat: the example I just provided concerned whether the final generated product constitutes copyright infringement. Nightshade, by contrast, also protects against unauthorized use in training, which I am even more unsure about. The data scrapers will tell you it’s protected under “fair use,” the legal doctrine in the U.S. that allows prior work to be used in a new work if it is transformed and used for a new purpose. Only time will tell.
HIDDEN HEADLINE
Advancement #4: IBM announces retirement of its neural machine translation service, Watson Language Translators
IBM announced that its Watson Language Translator service will be completely discontinued and unavailable to all IBM Cloud customers at the end of December 2024.
IBM Watson Language Translator is a service that identifies languages in text and translates them into other languages programmatically. The service supports 58 languages through its API, and users can currently utilize the provided translation models or create their own custom models. Additionally, the service can translate documents, including Microsoft Office files, PDFs, and other supported formats, while preserving the original formatting.
Currently, three major TMSs (translation management systems), including Smartling and Crowdin, support the integration.
My Initial Thoughts: IBM gave no reason for retiring its machine translation service, but I am sure the recent spike in LLM-based translation models has something to do with it. This is a major retirement in the translation sphere.