Google Prepares New Gemini Omni AI Video Generation Model

Google appears to be preparing a new AI video generation model called Gemini Omni inside Gemini. Early tests show improved text rendering, conversational editing, and more realistic generated scenes.

By Daniel Mercer Edited by Maria Konash
Google prepares Gemini Omni video model with conversational editing and improved scene realism inside Gemini. Image: BoliviaInteligente / Unsplash

Google appears to be preparing a new AI video generation model called Gemini Omni, according to early user reports and interface screenshots shared online. The feature surfaced inside Gemini with prompts inviting users to “Create with Gemini Omni,” suggesting Google may unveil the system more broadly at Google I/O 2026.

Google reportedly describes Omni as a “new video generation model” that supports video remixing, conversational editing, templates, and direct scene generation through chat prompts. While the company has not officially announced the model, metadata reportedly suggests Omni is connected to Google’s existing Veo video generation technology.

Early demonstrations indicate the system focuses on improving realism and consistency in generated video. One test generated a scene of a professor explaining trigonometric equations on a chalkboard while maintaining relatively coherent mathematical notation throughout the sequence. Text rendering remains one of the more difficult challenges for AI video systems because letters and equations often distort across moving frames.

Another example recreated the widely used “spaghetti test” benchmark, which many AI developers informally use to evaluate hand movement, object interaction, and eating realism in generated video. The generated clip showed two men seated at an outdoor restaurant eating spaghetti and holding a natural conversation, with fewer visual inconsistencies than earlier-generation AI video models.

The leaked interface also included a dedicated usage tracker for video generation. One tester said two complex prompts consumed roughly 86% of a daily AI Pro usage allowance, indicating that video generation workloads may remain heavily restricted because of their high compute requirements.

Google Pushes Gemini Deeper Into Video Creation

The apparent Omni integration suggests Google is moving video generation directly into Gemini instead of keeping Veo as a separate experimental product. The addition of conversational editing tools points toward a workflow where users can iteratively modify videos through chat rather than repeatedly generating clips from scratch.

That approach could make Gemini more competitive as an end-to-end creative platform combining text, image, audio, and video generation in a single interface. The reported template support also suggests Google may target marketing, education, and content production workflows rather than only experimental consumer use cases.

The stronger handling of written text and scene continuity is particularly notable because those remain major weaknesses across many current AI video systems.

Video AI Competition Accelerates Ahead Of I/O

The leaks arrive as AI companies increasingly compete in generative video infrastructure and creative tools. Video generation has become one of the fastest-growing areas of multimodal AI, though it also remains one of the most computationally expensive.

Google has continued investing heavily in video generation technology through Veo while expanding Gemini into a broader multimodal platform. The timing of the Omni leak, shortly before Google I/O 2026, suggests the company may be preparing a larger announcement around integrated AI media creation.

If launched publicly, Gemini Omni would place Google in more direct competition with other AI video platforms targeting professional content generation, conversational editing, and multimodal creative workflows.

Thinking Machines Introduces AI Models for Live Multimodal Collaboration

Thinking Machines Labs introduced a research preview of “interaction models” designed for continuous real-time collaboration across audio, video, and text. The system combines live multimodal interaction with asynchronous reasoning and tool use.

By Daniel Mercer Edited by Maria Konash

Thinking Machines Labs introduced a research preview of what it calls “interaction models,” a new class of AI systems designed to collaborate with users continuously across audio, video, and text rather than through traditional turn-based prompts.

The company said the models are trained from scratch to support real-time interaction, allowing users and AI systems to speak, interrupt, observe, respond, and work simultaneously. The architecture is built around “micro-turns” that process roughly 200 milliseconds of input and output at a time, enabling continuous two-way interaction instead of waiting for users to finish speaking or typing before responding.

According to Thinking Machines, the system combines a real-time interaction model with a separate asynchronous background model responsible for longer reasoning tasks, tool use, browsing, and workflow execution. The interaction layer remains active throughout the process while integrating results from the background model as they arrive.
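The split described above, a fast interaction loop paced in ~200 ms micro-turns plus a slower background model whose results are folded back in, can be sketched as a toy simulation. Everything here is an illustrative assumption (the `ask:` prefix, the two-micro-turn task duration, the function names), not Thinking Machines' implementation:

```python
from collections import deque

MICRO_TURN_MS = 200  # per-chunk granularity reported for the architecture

def run_session(chunks):
    """Toy simulation of the two-model split: the interaction loop answers
    every ~200 ms chunk immediately, while a simulated background model
    works on slower requests and surfaces results a few micro-turns later."""
    pending = deque()   # (task, micro-turns of work remaining)
    transcript = []
    for chunk in chunks:
        # Hand slow requests ("ask:...") to the background model; everything
        # else gets an immediate acknowledgement, so the loop never blocks.
        if chunk.startswith("ask:"):
            pending.append((chunk[4:], 2))  # assume each task takes 2 micro-turns
            transcript.append(f"working-on:{chunk[4:]}")
        else:
            transcript.append(f"ack:{chunk}")
        # Advance background work by one micro-turn; fold finished results
        # back into the live conversation as they arrive.
        if pending:
            task, left = pending.popleft()
            if left > 1:
                pending.appendleft((task, left - 1))
            else:
                transcript.append(f"result:{task}")
    return transcript
```

Running `run_session(["hi", "ask:weather", "still here", "bye"])` produces an acknowledgement for every chunk plus a `result:weather` entry two micro-turns after the request, rather than one blocking answer at the end, which is the behavioral difference the company is claiming over turn-based systems.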

The company argued that current AI systems create a “collaboration bottleneck” because most models operate through rigid turn-taking interfaces that limit human involvement during reasoning and execution. Thinking Machines said its approach aims to make AI collaboration function more like natural human conversation.

The research preview demonstrates several capabilities that are difficult to achieve in standard voice assistants or multimodal chat systems. These include simultaneous speech between user and model, proactive verbal and visual interjections, continuous visual monitoring, real-time translation, concurrent tool use during conversations, and direct awareness of elapsed time.

For example, the company showed scenarios where the model corrected spoken language mistakes while users continued speaking, counted physical exercises through live video streams, reacted to coding errors as they appeared onscreen, and performed live multilingual translation without pausing conversations.

Interaction Becomes A Core AI Capability

The announcement reflects a broader shift in AI development toward systems optimized for continuous collaboration rather than isolated prompt-response exchanges.

Most current real-time AI products rely on external orchestration layers such as voice activity detection systems and separate dialogue managers to simulate interactivity. Thinking Machines argues those approaches create limitations because the intelligence governing interruptions, timing, and conversational flow exists outside the model itself.

Instead, the company embedded interaction directly into model training and architecture. That allows responsiveness, interruption handling, simultaneous speaking, and multimodal awareness to improve alongside overall model capability as systems scale.

The architecture also differs from many multimodal systems by minimizing reliance on large standalone audio or video encoders. Audio, video, and text are processed together through shared transformer infrastructure using lightweight embedding layers and early fusion techniques.

Benchmarks Highlight Speed And Responsiveness

Thinking Machines said its TML-Interaction-Small model achieved stronger combined responsiveness and interaction quality than several existing commercial real-time AI systems across internal and public benchmarks.

The company highlighted improvements in latency, interruption handling, simultaneous conversation, proactive responses, and continuous multimodal awareness. Internal evaluations also tested capabilities that many current voice models cannot reliably perform, including reacting to visual changes without explicit prompts and speaking concurrently with users during live tasks.

The released model is currently a 276-billion-parameter mixture-of-experts system with 12 billion active parameters at runtime. Thinking Machines said larger interaction models are already pretrained but remain too computationally expensive for low-latency deployment today.
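The gap between 276 billion total and 12 billion active parameters follows from how mixture-of-experts routing works: per token, a small router selects only a few expert subnetworks to run. A minimal sketch of top-k routing (names and values illustrative, not the TML architecture):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, k=2):
    """Pick the top-k experts by router probability for one token."""
    probs = softmax(router_logits)
    return sorted(range(len(probs)), key=lambda i: -probs[i])[:k]

# With many experts and a small k, only the selected experts' feed-forward
# weights execute per token -- which is how a model's total parameter count
# (276B reported here) can dwarf its active count (12B reported).
```

Because only the routed experts' weights are exercised per token, runtime compute scales with the active parameter count rather than the total, which is why low-latency deployment constrains how large these models can get.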

The company added that future work will focus on longer session memory management, infrastructure optimization, safety research for real-time multimodal interaction, and deeper coordination between interactive and background reasoning systems.

The announcement also follows a recently expanded partnership between NVIDIA and Thinking Machines Labs to deploy next-generation Vera Rubin AI systems for frontier model training.


OpenAI Co-Founder Says Sam Altman Showed ‘Pattern of Lying’

Former OpenAI chief scientist Ilya Sutskever testified that he spent about a year collecting evidence that Sam Altman displayed a “consistent pattern of lying.” The testimony came during the ongoing OpenAI and Elon Musk trial in California.

By Samantha Reed Edited by Maria Konash
Ilya Sutskever says Sam Altman showed a “consistent pattern of lying” during the OpenAI leadership dispute and Musk trial. Image: Wesley Tingey / Unsplash

Ilya Sutskever testified in court that he spent roughly a year gathering evidence that Sam Altman displayed a “consistent pattern of lying” before voting to remove him as OpenAI CEO in November 2023.

The testimony came during the third week of the high-profile legal battle between Elon Musk and OpenAI in California federal court. Sutskever confirmed that he had been considering action against Altman for at least a year prior to the board’s decision to temporarily oust him.

According to Sutskever, OpenAI’s board asked him to prepare a document detailing concerns about Altman’s conduct. He testified that the material eventually reached 52 pages and included examples of dishonesty as well as behavior that allegedly involved “undermining and pitting executives against one another.”

Sutskever said he had discussed the possibility of removing Altman with former OpenAI chief technology officer Mira Murati after the two spoke extensively about Altman’s leadership style and internal management.

“His conduct was not conducive to any grand goal,” Sutskever said in court, referring specifically to OpenAI’s mission around safe artificial general intelligence.

Sutskever played a central role in Altman’s brief removal from OpenAI in 2023 while serving on the board. However, he later reversed course and supported Altman’s reinstatement after concerns emerged that the company could fracture or collapse during the leadership crisis.

The testimony also revealed new details about OpenAI’s internal turmoil during that period. Sutskever confirmed that remaining board members discussed a potential merger with rival AI company Anthropic after Altman’s removal. Under the proposal, Anthropic leadership would reportedly have taken control of OpenAI. Sutskever said he was “not excited” about the idea.

He additionally disclosed that his personal stake in OpenAI was valued at approximately $5 billion in November 2025 and around $7 billion currently.

Trial Exposes Internal OpenAI Power Struggles

The testimony provides the clearest public account so far of the internal breakdown that led to Altman’s temporary firing and rapid reinstatement. While the board initially cited communication concerns at the time, Sutskever’s statements suggest the conflict involved longer-running disputes over management style, executive relationships, and governance.

The case has also exposed tensions between OpenAI’s nonprofit governance structure and the enormous commercial value generated by its AI business. OpenAI has raised tens of billions of dollars in investment while simultaneously operating under a nonprofit-controlled structure originally designed to prioritize AI safety and public benefit.

Musk, who co-founded OpenAI before leaving in 2018, argues the company abandoned those principles as it evolved into a highly commercial AI organization closely aligned with Microsoft.

OpenAI Leadership And Governance Face Renewed Scrutiny

The trial has become one of the most consequential legal disputes in the AI industry because it could reshape OpenAI’s governance, ownership structure, and leadership.

Musk is seeking $150 billion in damages to be directed to OpenAI’s nonprofit entity and has asked the court to remove Altman and OpenAI president Greg Brockman from leadership roles.

Earlier in the proceedings, Microsoft CEO Satya Nadella described Microsoft’s investment in OpenAI as a “calculated risk,” emphasizing that the partnership delivered major strategic and marketing advantages.

Sutskever, who left OpenAI in 2024 and later founded Safe Superintelligence, is expected to remain a key figure in the case as the court examines whether OpenAI’s transformation into a commercial AI powerhouse violated commitments made during its founding.


OpenAI Launches $4 Billion Enterprise AI Deployment Venture

OpenAI is creating a new enterprise deployment company backed by more than $4 billion in initial funding and acquiring AI consulting firm Tomoro.

By Maria Konash
OpenAI forms a $4B AI deployment firm and acquires Tomoro to expand enterprise implementation services. Image: Levart_Photographer / Unsplash

OpenAI is creating a new enterprise-focused business called OpenAI Deployment Company with more than $4 billion in initial committed investment. The company also announced the acquisition of Tomoro, a consulting firm specializing in enterprise AI implementation, as it accelerates efforts to expand adoption of ChatGPT and other OpenAI systems inside large organizations.

According to OpenAI, the new unit will help companies build and deploy AI systems by embedding specialized engineers and deployment teams directly within customer organizations. These teams will work alongside corporate departments to identify operational areas where AI systems can automate workflows, improve productivity, or support decision-making.

The acquisition of Tomoro will immediately add around 150 AI engineers and deployment specialists to the business. Tomoro was established in 2023 through a partnership aligned with OpenAI and has worked with companies including Mattel, Red Bull, Tesco, and Virgin Atlantic.

OpenAI said the deployment venture is structured as a multi-year partnership between OpenAI and 19 investment firms. The initiative is led by TPG, with Advent International, Bain Capital, and Brookfield Asset Management acting as co-lead founding partners.

The launch comes as OpenAI intensifies its enterprise expansion efforts following widespread consumer adoption of ChatGPT. The company has increasingly focused on securing long-term corporate contracts and integrating AI systems into business operations at scale.

OpenAI Moves Beyond Software Licensing

The creation of OpenAI Deployment Company signals a broader shift in how frontier AI firms are approaching enterprise adoption. Instead of only selling access to AI models through APIs or subscriptions, OpenAI is building a dedicated implementation business designed to help customers operationalize AI across complex organizations.

The strategy reflects a growing reality in enterprise AI: deploying advanced models often requires extensive customization, workflow integration, governance planning, and technical support. Many companies lack internal expertise to manage those processes independently.

By embedding engineers directly inside client organizations, OpenAI is adopting an approach closer to enterprise consulting and systems integration firms than conventional software vendors. The model resembles how companies such as Palantir Technologies work with customers to integrate AI and data systems into operational workflows.

Enterprise AI Services Become A Competitive Battleground

The announcement also highlights increasing competition between leading AI companies for enterprise market share. OpenAI’s expansion comes as Anthropic continues gaining traction with its Claude models across corporate customers.

As previously reported, both OpenAI and Anthropic were exploring acquisitions of AI deployment and consulting firms as part of broader enterprise strategies. The sector has become strategically important because large organizations often require ongoing implementation support rather than standalone model access.

The new venture also gives OpenAI a larger operational footprint inside customer environments, potentially strengthening long-term relationships and increasing dependence on its infrastructure and models.

SoftBank Explores $100B AI Data Center Investment in France

SoftBank is reportedly evaluating a multibillion-dollar AI data center project in France following talks with President Emmanuel Macron.

By Olivia Grant Edited by Maria Konash
SoftBank weighs major AI data center investment in France after talks between Masayoshi Son and Macron. Image: Anthony Choren / Unsplash

SoftBank CEO Masayoshi Son has reportedly held discussions with French President Emmanuel Macron regarding a large-scale AI data center initiative in France.

According to a Bloomberg report, the project could involve a multibillion-dollar investment aimed at expanding France’s artificial intelligence infrastructure capabilities. While earlier discussions reportedly referenced potential investments of up to $100 billion, sources indicated the final amount may be lower depending on other capital commitments.

The proposal was initially raised by Macron during a meeting with Son in Tokyo, highlighting France’s efforts to attract major AI infrastructure investments and strengthen Europe’s position in the global AI race. An official announcement could reportedly come within weeks.

The discussions reflect intensifying competition among governments to secure AI infrastructure projects, particularly large-scale data centers that support model training and cloud computing. Europe has increasingly emphasized technological sovereignty and domestic AI capacity as reliance on foreign cloud providers grows.

For SoftBank, the potential investment would further expand its role in global AI infrastructure following a series of large-scale bets on semiconductors, data centers, and AI companies. The company has been actively positioning itself to benefit from rising demand for computing power and next-generation AI systems.

The project also signals how AI infrastructure is becoming a strategic geopolitical priority, with governments directly engaging technology investors to accelerate domestic capabilities and attract long-term capital into the sector.

The discussions come as SoftBank reportedly prepares a new AI and robotics venture focused on automating data center construction, aiming to capture growing global demand for AI infrastructure ahead of a potential $100 billion IPO.


OpenAI Releases Three New Voice Models for AI Agents and Translation

OpenAI has released three new realtime audio models for developers, including GPT-Realtime-2 for conversational AI agents, a live translation system supporting more than 70 languages, and a streaming speech-to-text model.

By Daniel Mercer Edited by Maria Konash
OpenAI launches GPT-Realtime-2, live translation, and streaming speech-to-text models for AI voice apps. Image: OpenAI

OpenAI has introduced three new realtime audio models through its API platform, expanding its push into conversational AI agents and voice-based software interfaces. The release includes GPT-Realtime-2, a new voice model with GPT-5-level reasoning capabilities, GPT-Realtime-Translate for live multilingual speech translation, and GPT-Realtime-Whisper for low-latency streaming transcription.

The company said the models are designed to support a new generation of voice applications capable of reasoning through requests, using external tools during conversations, translating speech live, and handling continuous spoken interaction in real time.

GPT-Realtime-2 is positioned as OpenAI’s most advanced voice interaction model so far. The system supports live conversational workflows where the AI can process interruptions, maintain long context windows, call tools in parallel, and continue conversations naturally while tasks are being completed in the background.

OpenAI expanded the model’s context window from 32,000 to 128,000 tokens and introduced adjustable reasoning levels ranging from minimal to “xhigh,” allowing developers to balance latency against reasoning depth. The company said GPT-Realtime-2 scored 96.6% on the Big Bench Audio Intelligence benchmark, compared with 81.4% for GPT-Realtime-1.5.

The company also introduced GPT-Realtime-Translate, a live speech translation model supporting more than 70 input languages and 13 output languages. OpenAI said the model is designed for customer support, international business communication, events, education, and multilingual voice interfaces where conversations need to continue naturally across languages without noticeable delays.

GPT-Realtime-Whisper, meanwhile, focuses on streaming speech recognition. The model transcribes spoken audio as conversations happen, allowing developers to build live captioning systems, meeting assistants, support tools, and voice-driven enterprise workflows with lower latency.

OpenAI said companies including Zillow, Intercom, Priceline, Deutsche Telekom, and Vimeo have already tested the new models in production-oriented voice systems.

“What stood out about GPT-Realtime-2 was the intelligence and tool-calling reliability it brings to complex voice interactions,” said Zillow SVP and Head of AI Josh Weisberg, adding that the model improved call success rates during adversarial testing.

OpenAI Pushes Beyond Text-Based Interfaces

The release reflects OpenAI’s broader strategy of moving AI interaction away from chat windows and toward continuous voice-based systems integrated directly into software products and workflows.

Rather than functioning as simple speech interfaces layered on top of chatbots, the new models are designed to operate as realtime agents capable of reasoning, retrieving information, executing actions, and maintaining conversational continuity simultaneously.

OpenAI described three emerging categories for voice AI systems:

  • voice-to-action workflows where agents complete tasks directly from spoken instructions,
  • systems-to-voice interfaces where software proactively communicates updates through speech,
  • voice-to-voice interactions involving live multilingual translation between users.

The company highlighted examples such as AI travel assistants capable of managing itinerary changes conversationally and multilingual customer service systems that translate discussions in real time while preserving natural speech flow.

OpenAI also emphasized production safeguards around the Realtime API, including active classifiers that can interrupt sessions violating safety policies and support for additional developer-defined guardrails through the Agents SDK.

The models are available immediately through OpenAI’s Realtime API. GPT-Realtime-2 is priced at $32 per million audio input tokens and $64 per million output tokens, while GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper costs $0.017 per minute.
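Using the quoted prices, a back-of-the-envelope cost estimate is straightforward; the token counts and session lengths in the comments below are illustrative, not figures from OpenAI:

```python
# Prices quoted in the article (USD).
REALTIME2_INPUT_PER_M = 32.0    # per million audio input tokens
REALTIME2_OUTPUT_PER_M = 64.0   # per million audio output tokens
TRANSLATE_PER_MIN = 0.034       # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = 0.017         # GPT-Realtime-Whisper, per minute

def realtime2_cost(input_tokens, output_tokens):
    """Estimated GPT-Realtime-2 session cost in USD."""
    return (input_tokens / 1e6) * REALTIME2_INPUT_PER_M \
         + (output_tokens / 1e6) * REALTIME2_OUTPUT_PER_M

def per_minute_cost(minutes, rate_per_min):
    """Cost of a per-minute-priced model for a session of given length."""
    return minutes * rate_per_min

# A hypothetical session consuming 500k input and 250k output tokens costs
# 0.5 * $32 + 0.25 * $64 = $32.00; an hour of live translation costs
# 60 * $0.034 = $2.04, and an hour of streaming transcription $1.02.
```

The roughly 2x price gap between translation and transcription mirrors the difference in workload: the translation model must generate speech output, while the transcription model only emits text.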
