TikTok's Symphony AI Suite (Photo-Realistic Digital Avatars and Translation)
Also this week in AI: Meta Stopped from Training on EU/UK User Data, Google DeepMind Launches V2A Technology (+ How it Works), and Runway Launches Gen-3 Alpha.
As always, watch the video. I totally did not do a TikTok dance to get in the spirit. I repeat, I do… not…
#1: TikTok Launches Incredible Symphony AI Suite
TikTok has unveiled photo-realistic, GenAI-powered digital avatars for content creators, built on their newly announced Symphony generative AI suite.
The digital avatars will come in two types: custom avatars and stock avatars. Custom avatars are designed to resemble a specific creator or brand spokesperson and can speak multiple languages, enabling accounts to reach foreign audiences while maintaining a consistent likeness. Stock avatars, on the other hand, are created using paid actors from various backgrounds, nationalities, and languages, allowing businesses to add a human touch to their content.
Creators have long wanted to retain their own identity while entering new language markets, and TikTok now enables this by providing multi-language support through a feature called Symphony AI Dubbing. This global translation tool automatically detects the language in a video, transcribes it, and translates the content to create a dubbed version in the viewer's preferred language, allowing creators and brands to communicate with global audiences more effectively.
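TikTok hasn't shared how Symphony AI Dubbing works under the hood, but the description above maps onto a fairly standard pipeline: detect the spoken language, transcribe, translate, then synthesize dubbed speech. Here's a minimal sketch in Python, assuming Whisper (a real open-source speech model) for detection and transcription; translate_text and synthesize_speech are hypothetical placeholders for whatever translation and text-to-speech services an actual implementation would use:

```python
import whisper  # openai-whisper: handles language detection and transcription


def translate_text(text: str, source: str, target: str) -> str:
    # Placeholder: call your preferred machine-translation service here.
    raise NotImplementedError


def synthesize_speech(text: str, language: str) -> bytes:
    # Placeholder: call your preferred text-to-speech (dubbing) service here.
    raise NotImplementedError


def dub_audio(audio_path: str, target_language: str) -> bytes:
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)      # detects the language and transcribes
    source_language = result["language"]
    transcript = result["text"]
    translated = translate_text(transcript, source=source_language, target=target_language)
    return synthesize_speech(translated, language=target_language)  # dubbed audio track
```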
The new TikTok Symphony suite also includes a creative AI assistant that helps users brainstorm, identify trends, and come up with new content throughout their creative journey, as well as a video generation tool that can create short clips from existing assets or product URLs.
This launch follows TikTok's recent revelation that 61% of users have made a purchase either directly on TikTok or after seeing an ad.
My Initial Thoughts: This latest development from TikTok may raise concerns for some, and I acknowledge those concerns, including the challenge of distinguishing between AI-generated and real content.
If you remember last year's Hollywood actors' strike, which highlighted various concerns about AI, particularly the fear of unauthorized use of actors' likenesses, the same issue is at play here. TikTok's approach with its new AI avatars actually acknowledges that concern and gives creators control over the use of their likenesses. Creators can set their own rates, licensing terms, and permissions for avatar use, providing a deeper level of control and protection in this evolving landscape of AI-generated content.
What is particularly interesting to me, though, is that it's really unclear which model this was built on; my gut feeling is that it may be a model proprietary to TikTok. I don't have a TikTok myself, but all your awesome dances might be fueling their Symphony model. Yay…?
#2: Meta Stopped from Training its AI Systems with EU and UK User Data
Meta has announced that it will pause its plans to train its AI systems using data from its users in the European Union and the U.K. This decision comes after pushback from regulatory bodies, including the Irish Data Protection Commission and the U.K.'s Information Commissioner's Office.
While Meta already uses user-generated content to train its AI in other markets, like the USA :-(, the stringent GDPR regulations in Europe have created obstacles for the company.
Meta notified users a few months ago of an upcoming change to its privacy policy that would allow it to use public content on Facebook and Instagram to train its AI. This would include content from comments, interactions with companies, status updates, photos, and their associated captions. Meta argued that using this data was necessary to reflect the diversity of languages, geography, and cultural references among European users.
Meta has been relying on a GDPR provision called "legitimate interests" to argue that its actions are compliant with regulations. However, the company has faced criticism for making it difficult for users to opt out of having their data used. While Meta claimed to have sent out more than 2 billion notifications about the upcoming changes, these notifications were listed among common notifications such as birthdays and general activity, making them easy to miss. Users were also not able to directly opt out but had to complete an objection form requesting it, ultimately leaving the decision to Meta's discretion.
Meta's global engagement director for privacy policy, Stefano Fratta, expressed disappointment with the situation, stating that "This is a step backwards for European innovation, competition in AI development, and further delays in bringing the benefits of AI to people in Europe."
My Initial Thoughts: If Fratta thinks this is a step backward, I'm going to say it's a step forward. Two good weeks in a row for privacy. I am so happy to see regulators standing up for user data rights, even if, according to Meta, it means a bit of a delay in AI development. Maybe this is the start of a trend where our privacy and data security come first, and AI advancement follows suit (as it should be!).
#3: Google DeepMind Announces Video-to-Audio (V2A) Technology
There are a lot of video generation models, but most can only produce video without sound. An essential next step will be generating soundtracks for these silent videos, and Google DeepMind announced significant progress in this area this week with its video-to-audio (V2A) technology.
Their V2A technology enables synchronized audiovisual generation: it uses video pixels and natural-language text prompts to create immersive soundscapes for on-screen action, and it can be paired with video generation models like Veo. V2A expands creative possibilities by generating soundtracks for a variety of footage types, adding dramatic music scores, sound effects, and dialogue between characters, and it can even add sound to existing archival material like silent films.
However, given that the audio output quality relies on the video input quality, artifacts or distortions in the video can lead to significant performance issues. Google is working on addressing this and improving lip synchronization technology for future releases.
How it works: Google experimented with autoregressive and diffusion approaches to discover the most scalable AI architecture. The diffusion-based approach for audio generation proved to be the most realistic and compelling for synchronizing video and audio information.
The V2A system begins by encoding video input into a compressed representation. Then, the diffusion model iteratively refines the audio from random noise, guided by the visual input and natural language prompts. This process generates synchronized, realistic audio closely aligned with the prompt. Finally, the audio output is decoded, converted into an audio waveform, and combined with the video data.
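DeepMind hasn't released code or architectural details, so here is a deliberately toy sketch of that loop, just to make the steps concrete: encode the video, iteratively denoise an audio latent conditioned on the video and the text prompt, then decode to a waveform. Every module name, shape, and update rule below is an illustrative stand-in, not the actual V2A system:

```python
import torch
import torch.nn as nn


class ToyV2A(nn.Module):
    """Illustrative stand-in for a diffusion-style video-to-audio generator."""

    def __init__(self, video_dim=512, text_dim=256, audio_dim=128):
        super().__init__()
        self.video_encoder = nn.Linear(video_dim, 256)   # compress the video input
        self.text_encoder = nn.Linear(text_dim, 256)     # encode the text prompt
        # Denoiser: predicts noise from the noisy audio latent plus conditioning
        self.denoiser = nn.Linear(audio_dim + 256 + 256, audio_dim)
        self.decoder = nn.Linear(audio_dim, 16000)       # audio latent -> waveform samples

    @torch.no_grad()
    def generate(self, video_feats, text_feats, steps=50):
        v = self.video_encoder(video_feats)              # compressed video representation
        t = self.text_encoder(text_feats)                # prompt conditioning
        audio = torch.randn(video_feats.shape[0], 128)   # start from pure random noise
        for _ in range(steps):
            # Iteratively refine the audio latent, guided by the video and the prompt
            pred_noise = self.denoiser(torch.cat([audio, v, t], dim=-1))
            audio = audio - 0.1 * pred_noise             # toy update rule, not real diffusion math
        return self.decoder(audio)                       # decode to ~1 second of 16 kHz audio


model = ToyV2A()
waveform = model.generate(torch.randn(1, 512), torch.randn(1, 256))
print(waveform.shape)  # torch.Size([1, 16000])
```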
To enhance audio quality and enable guidance for specific sounds, additional information was included in the training process. This included AI-generated annotations with detailed descriptions of sound and transcripts of spoken dialogue. Training on video, audio, and these additional annotations allows the technology to associate specific audio events with various visual scenes, responding to the provided information in the annotations or transcripts.
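To make that concrete, here's a hypothetical picture of what one training example might look like under that description; the field names and shapes are my own illustration, not DeepMind's actual data format:

```python
# Hypothetical structure of a single training example, inferred from the
# description above; field names and shapes are illustrative only.
training_example = {
    "video_frames": "tensor of shape (num_frames, height, width, 3)",
    "audio_waveform": "tensor of shape (num_samples,)",
    "sound_annotation": "AI-generated description, e.g. 'waves crash while seagulls call'",
    "dialogue_transcript": "transcript of any spoken dialogue in the clip",
}
```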
My Initial Thoughts: I think this update is pretty cut and dried, so I don't have much to say about it. Similar to Runway's announcement, this is a critical area of improvement for producing full-fledged films, so it's good to be aware of the current challenges we're dealing with to accomplish that.
#4: Runway Launches Gen-3 Alpha Video Model
Runway introduced Gen-3 Alpha on Monday, its latest AI model that can create video clips from text descriptions and still images. The new model provides advanced control over video structure, style, and motion, while excelling at creating human characters with a wide range of actions, gestures, and emotions. Its limitations include a maximum video length of 10 seconds, difficulty handling complex interactions between objects, and footage that doesn't always adhere to the laws of physics.
Runway didn't specify where the training data for Gen-3 Alpha came from, only mentioning that their in-house research team uses curated internal datasets. Runway somewhat addressed the copyright issue, stating that they consulted with artists while developing the model, although they did not specify who they actually consulted.
Runway faces competition from several companies. Adobe is developing a video-generating model using its Adobe Stock media library, OpenAI's Sora is being tested with marketing agencies and film directors, and Google’s model Veo is being trialed by select creators. Runway's Gen-3 will be available soon to subscribers, including enterprise customers and creative partners.
My Initial Thoughts: Video generation tools like Runway's Gen-3 Alpha are expected to significantly impact the filmmaking industry. A 2024 study by the Animation Guild revealed that 75% of film production companies using AI have reduced or eliminated jobs. The study estimates that by 2026, over 100,000 U.S. entertainment jobs will be disrupted by generative AI.
Regarding copyright, remember my usual stance: whether AI-generated content (and its training data) qualifies as copyright infringement or is protected under fair use in the U.S. remains uncertain. However, creating AI content that imitates someone's likeness or style is increasingly being recognized as a clear case of copyright infringement, for which you as the user could be legally liable (so no asking these video generators to create a film in the style of James Cameron).