Many-Shot Jailbreaking in LLMs and Apple's ReALM
This week in AI: New Research by Anthropic Unveils Many-Shot Jailbreaking, Apple's ReALM, Google's VLOGGER, the US-UK AI Agreement, and You Can Now Edit DALL·E Images.
Advancement #1: Unveiling Many-Shot Jailbreaking in LLMs by Anthropic
Anthropic on Tuesday published an interesting (but somewhat concerning) study highlighting the vulnerability of large language models (LLMs) to "Many-Shot Jailbreaking", a technique that tricks an LLM into providing potentially harmful responses despite its safety training.
Many-shot jailbreaking targets a specific vulnerability in LLMs related to the context window. The context window is the maximum amount of text an LLM can consider at once during processing. Since early 2023, this capacity has expanded dramatically, from roughly the length of a long essay (~4,000 tokens) to the equivalent of several long novels (over 1,000,000 tokens). This expansion allows LLMs to process and integrate much larger amounts of information in a single query, but it also introduces vulnerabilities, such as the potential for "jailbreaking" the model's safety protocols.
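Context window sizes are measured in tokens rather than characters or words. As a rough illustration, here is a minimal sketch of counting tokens with the open-source tiktoken library (the choice of the cl100k_base encoding is an illustrative assumption, not tied to any model discussed here):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is one common tokenizer encoding; chosen purely for illustration.
enc = tiktoken.get_encoding("cl100k_base")

text = "The context window is measured in tokens, not characters."
tokens = enc.encode(text)

print(f"{len(text)} characters -> {len(tokens)} tokens")
# A ~4,000-token window holds roughly a long essay; a 1,000,000-token
# window holds the equivalent of several long novels.
```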
Many-shot jailbreaking works by inserting a large number of faux dialogues (up to 256 were tested) before a final, potentially dangerous query; at sufficient scale, this can override the model's safety training and produce harmful responses. Suggested mitigations include limiting the size of the context window and refining the model's ability to detect and reject jailbreaking attempts.
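To make the structure of the attack concrete, here is a minimal sketch of how such a prompt is assembled (the dialogue contents below are deliberately benign placeholders; the point is the format of many stacked faux user/assistant turns ahead of the real query, not actual attack content):

```python
def build_many_shot_prompt(faux_dialogues, final_query):
    """Stack faux user/assistant exchanges ahead of the real query,
    formatted as if the model had already complied many times."""
    parts = []
    for question, answer in faux_dialogues:
        parts.append(f"User: {question}")
        parts.append(f"Assistant: {answer}")
    parts.append(f"User: {final_query}")
    return "\n".join(parts)

# Benign placeholders standing in for the hundreds of faux dialogues
# (Anthropic tested up to 256) used in the actual attack.
dialogues = [(f"Question {i}?", f"Answer {i}.") for i in range(256)]
prompt = build_many_shot_prompt(dialogues, "Final query goes here.")
```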
However, these strategies present trade-offs: shrinking the context window reduces the model's usefulness, and other defenses merely delayed the jailbreak rather than preventing it. The paper reports more success with prompt-based interventions, which preprocess inputs to reduce the effectiveness of many-shot jailbreaking and significantly lowered the success rate of such attacks in some tests.
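Anthropic does not publish the exact intervention, but the general idea of screening inputs before they reach the model can be sketched as follows (the heuristic and the threshold here are my own illustrative assumptions, not the paper's method):

```python
import re

# Illustrative threshold: an unusually high number of embedded dialogue
# turns is one signal of a possible many-shot prompt. The value 10 is an
# assumption for this sketch, not a published figure.
MAX_EMBEDDED_TURNS = 10

def screen_prompt(prompt: str) -> str:
    """Reject prompts that embed many faux user/assistant exchanges."""
    turns = re.findall(r"(?im)^(?:user|assistant)\s*:", prompt)
    if len(turns) > MAX_EMBEDDED_TURNS:
        raise ValueError(
            f"Prompt embeds {len(turns)} dialogue turns; "
            "possible many-shot jailbreak attempt."
        )
    return prompt
```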
My Initial Thoughts: While Anthropic does not view today's LLMs as a source of extreme danger, future models could plausibly carry such risks. By sharing its research on many-shot jailbreaking, Anthropic is motivating developers and the wider academic community to explore safeguards against this specific breach, and against other vulnerabilities tied to expanded context windows, as a community-based effort.
Advancement #2: Apple's ReALM Enables Siri to Understand and React to the World Around It
Apple introduced ReALM, a compact language model aimed at enhancing voice assistants like Siri by enabling them to recall past conversations, comprehend on-screen content, and recognize environmental sounds, such as music playing in the background. ReALM stands out for its small size; it is designed not to replace other AI models but to add context to them.
ReALM works by converting the visual content on a phone's screen into a text-based representation, identifying and labeling on-screen entities and their locations. This process allows it to provide contextual clues to voice assistants for user requests, improving performance by bypassing the need for advanced image recognition.
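Apple's paper describes encoding on-screen entities and their layout as plain text; a minimal sketch of that idea might look like this (the entity format, labels, and sorting are simplified assumptions based on the description above, not Apple's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class ScreenEntity:
    label: str   # e.g. "business_name", "address", "phone_number"
    text: str    # the visible text of the entity
    x: float     # horizontal position on screen (0 = left edge)
    y: float     # vertical position on screen (0 = top edge)

def screen_to_text(entities: list[ScreenEntity]) -> str:
    """Render on-screen entities as labeled text, top-to-bottom and
    left-to-right, so a small language model can resolve references
    like "call them" against what is currently visible."""
    ordered = sorted(entities, key=lambda e: (e.y, e.x))
    return "\n".join(f"[{e.label}] {e.text}" for e in ordered)

screen = [
    ScreenEntity("business_name", "Joe's Pizza", 0.1, 0.1),
    ScreenEntity("address", "123 Main St", 0.1, 0.2),
    ScreenEntity("phone_number", "(555) 010-0199", 0.1, 0.3),
]
print(screen_to_text(screen))
# [business_name] Joe's Pizza
# [address] 123 Main St
# [phone_number] (555) 010-0199
```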
For example, if a user wants to call a business they're looking at on a website, ReALM enables Siri to understand the context, convert the image's contents to text, and execute the call by "seeing" the phone number labeled on the page. This approach makes ReALM smaller and more efficient than models like GPT-4, which rely more heavily on natural image parsing and advanced OCR.
The integration of ReALM could significantly enhance Siri's capabilities, enabling it to better interpret user commands related to apps and on-screen content, and facilitating more conversational interactions without needing to deploy an LLM like Gemini. The research paper comes ahead of the anticipated launch of iOS 18 at WWDC 2024 in June, though Apple has not confirmed that ReALM will actually be part of iOS yet.
My Initial Thoughts: Apple has been releasing a lot of research lately as it approaches the public launch of its AI initiatives at June's WWDC, concentrating thus far on methods to run AI models directly on devices.
A few weeks back, in our discussion of the Google Gemini partnership, I noted that only Google's Gemini Nano met Apple's privacy standards, suggesting Apple might need to rely on cloud processing. However, Apple's recent focus on on-device research indicates a stronger commitment to its privacy guidelines rather than a shift toward cloud reliance.
Overall, while ReALM is impressive for how much it accomplishes at such a tiny size, I am a bit underwhelmed by Apple's advancement in the space.
Advancement #3: Google Reveals the "VLOGGER" Project Focused on AI-Driven Video Animation
Demo videos of Google's VLOGGER, a research project that animates static photos into dynamic, photorealistic videos using AI, began circulating this week. Built on advanced generative diffusion models, VLOGGER takes a single photo plus text and audio inputs and produces talking-human videos.
While still in the research phase (mainly showcased through entertaining mockup videos), a product built on it could dramatically change the way we communicate on digital platforms such as Teams or Slack. Current competitors in the space include HeyGen, Synthesia, and Pika Labs.
My Initial Thoughts: I want to dive into some ethical questions raised by Google's VLOGGER project (and similar image-to-video projects that use a person's likeness).
The first concerns consent. Using an individual's likeness to create videos or a dynamic avatar raises questions about the right to control one's digital representation. I have raised this argument before in the context of copyright regulation, arguing that a person's likeness should be used exclusively by them unless prior authorization is obtained.
There's also the potential for misuse in creating deepfakes, contributing to misinformation or harming individuals' reputations. Lastly, privacy implications cannot be overlooked: collecting and processing personal images means handling sensitive personal data and biometrics, which could be retained for future training or other uses.
The demo videos of VLOGGER were a bit underwhelming to me, but since tools like this will inevitably mature and spread, we need to start setting clear ethical guidelines and regulatory measures now to safeguard individual human rights.
Advancement #4: US and UK Sign the First Bilateral AI Collaboration Agreement
On April 1, the United States and the United Kingdom signed a landmark agreement to collaborate on developing tests for cutting-edge artificial intelligence (AI) models. This agreement, effective immediately, involves sharing critical data on the capabilities and risks of AI systems, exchanging technical research on AI safety and security, and aligning their strategies for the safe deployment of AI technologies.
This partnership aims to establish safeguards amid the rapid growth of AI systems, which, despite offering significant opportunities, also pose threats to societal structures, including the spread of misinformation and the integrity of elections. Both nations will work together to refine their scientific methods and expedite the development of comprehensive evaluations for AI models, systems, and agents. The collaboration plans to create a unified approach to AI safety and conduct joint test exercises on publicly accessible models.
This initiative represents a proactive effort by both governments to address the fast-evolving risks associated with AI and promotes international cooperation on AI safety. It is the first bilateral agreement of its kind.
My Initial Thoughts: If it wasn't already clear from Anthropic's release of the many-shot jailbreaking research, AI safety (and even performance enhancement) will require collaboration. This marks a shift from the long-held mindset of safeguarding trade secrets from one another. The UK and US agreement to collaborate further emphasizes the message that two heads are better than one.
Advancement #5: You Can Now Edit Generated Images with DALL·E
The DALL·E editor, accessible by clicking on a DALL·E-generated image, allows you to modify an image either by selecting the parts you want to change or by describing the desired changes in the chat or conversation panel.
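The editor itself is a ChatGPT interface feature, but the same mask-based "inpainting" concept is also exposed through OpenAI's images edit API (served by DALL·E 2 at the time of writing). A minimal sketch, assuming an original.png plus a mask.png whose transparent region marks the area to redraw (both file names are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The transparent pixels in mask.png tell the model which region of
# original.png to redraw; everything else is left untouched.
result = client.images.edit(
    model="dall-e-2",
    image=open("original.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="Replace the selected area with a bouquet of flowers",
    n=1,
    size="1024x1024",
)

print(result.data[0].url)  # URL of the edited image
```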
My Initial Thoughts: I don't have much to expand on, but this is a game changer for me in terms of image editing and generation. Previously, you couldn't change specific parts of an image without fully re-generating it. Now, you can alter smaller components without touching the whole image.
Watch the debrief
title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>