Google Launches Gemini Omni Flash Video Generation Model

Google has introduced Gemini Omni Flash, a multimodal AI model for generating and editing videos from text, images, audio, and video inputs. The launch expands Gemini into AI-native video creation and conversational editing workflows.

By Daniel Mercer Edited by Maria Konash Published:

Google has unveiled Gemini Omni, a new family of multimodal generative AI models designed for video creation and editing, launching the first version called Gemini Omni Flash.

The company said Omni combines Gemini’s reasoning capabilities with native multimodal generation, allowing users to create and edit videos using combinations of text, images, video clips, and audio references. While the current release focuses primarily on video output, Google said image and audio generation capabilities will be added later.

Gemini Omni Flash is rolling out globally through the Gemini app, Google Flow, and YouTube Shorts. The model is currently available to Google AI Plus, Pro, and Ultra subscribers, while YouTube Shorts creators will gain free access starting this week. API access for developers and enterprise customers is expected in the coming weeks.

Google described Omni as a system capable of conversational video editing, where users can iteratively refine scenes, motion, characters, and visual styles through natural language prompts without restarting projects from scratch.

The company demonstrated examples involving scene transformations, realistic physics simulations, animated explainers, cinematic effects, and stylized edits generated from mixed media inputs.

Gemini Expands Beyond Text and Images

The launch marks Google’s most ambitious push yet into AI-native video generation as competition intensifies across multimodal AI platforms.

Unlike earlier AI video systems focused mainly on text-to-video generation, Omni is designed to combine multiple input formats simultaneously. Users can provide reference images, existing video footage, soundtracks, voice clips, or text prompts to guide the generated output.

Google said the model uses Gemini’s broader world knowledge and reasoning systems to generate scenes that better reflect physical behavior, historical context, and semantic relationships rather than relying purely on pattern matching.

The company highlighted improvements in simulated physics, including gravity, fluid motion, and object interactions. It also showcased educational and explanatory videos generated from short prompts, including claymation-style scientific explainers and fast-paced animated sequences synchronized to music.

In these conversational editing workflows, each user instruction builds incrementally on previous edits while preserving scene continuity and visual consistency.

AI Video Competition Accelerates

Google is also integrating Omni across multiple consumer creator platforms rather than limiting it to standalone developer tools. The rollout to YouTube Shorts signals the company’s broader strategy of embedding generative AI directly into creator ecosystems at scale.

The company said Omni includes SynthID watermarking technology to identify AI-generated content and is initially limiting certain features involving voice and identity manipulation while it evaluates safety and abuse risks.

Google additionally introduced Avatar support, allowing users to generate videos featuring AI-generated versions of themselves using their own voice. The company said broader speech-editing and voice modification features are still undergoing testing before wider release.

The launch follows Google’s recent introduction of the Gemini 3.5 family, including Gemini 3.5 Flash, its new AI model optimized for coding and agentic workflows.
