Nvidia has introduced Kimodo, a new artificial intelligence model designed to generate high-quality 3D motion for humans and robots using text prompts and kinematic constraints. The system represents a step forward in motion synthesis, an area increasingly important for robotics, simulation, and digital content creation.
The model, trained on approximately 700 hours of optical motion capture data, reflects a broader push to scale training datasets in order to improve realism and control. Publicly available motion capture datasets have historically been limited in size, constraining the performance of earlier generative models.
Kimodo builds on this larger training corpus by generating motion directly from natural language descriptions. Users can type a prompt to produce an animation of human movement, reducing the need for manual animation or motion capture sessions. The system also accounts for the kinematics of robotic platforms, including the Unitree G1 humanoid robot, allowing developers to generate motion commands for machines without relying on human operators.
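As a rough illustration of what prompt-driven generation with embodiment selection could look like, here is a minimal sketch. Nvidia has not published a programming interface for Kimodo, so every name in it (KimodoModel, generate, the embodiment argument) is a hypothetical stand-in, not a real API.

```python
# Hypothetical interface sketch; not a real Kimodo API.

class KimodoModel:
    """Stand-in for a text-conditioned motion generator."""

    def generate(self, prompt: str, embodiment: str = "human",
                 num_frames: int = 120) -> dict:
        # A real model would return per-frame joint rotations; this stub
        # only echoes the request to show the shape of the interface.
        return {"prompt": prompt, "embodiment": embodiment,
                "frames": num_frames}

model = KimodoModel()

# Human motion from a plain-language description.
human_clip = model.generate("a person walks forward and waves")

# The same call, targeting a robot platform such as the Unitree G1.
robot_clip = model.generate("walk forward and wave",
                            embodiment="unitree_g1")
```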
Flexible Control Through Text and Constraints
In addition to text prompts, Kimodo supports a range of kinematic constraints: full-body keyframes, joint-level position and rotation targets, and two-dimensional waypoints and motion paths.
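To make those constraint types concrete, the sketch below shows one plausible way to represent them in code. The dataclasses and field names are assumptions made for illustration; the article does not describe Kimodo's actual input format.

```python
# Illustrative constraint structures; hypothetical, not Kimodo's real schema.
from dataclasses import dataclass, field

@dataclass
class JointConstraint:
    """Pin a single joint's position and/or rotation at a given frame."""
    frame: int
    joint: str                      # e.g. "left_wrist"
    position: tuple | None = None   # (x, y, z) in meters
    rotation: tuple | None = None   # quaternion (w, x, y, z)

@dataclass
class Keyframe:
    """Full-body pose the generated motion must pass through."""
    frame: int
    pose: dict = field(default_factory=dict)  # joint name -> rotation

@dataclass
class MotionRequest:
    prompt: str
    keyframes: list[Keyframe] = field(default_factory=list)
    joint_constraints: list[JointConstraint] = field(default_factory=list)
    waypoints: list[tuple] = field(default_factory=list)  # (x, y) ground path

request = MotionRequest(
    prompt="jog along the path, then reach toward a shelf",
    joint_constraints=[JointConstraint(frame=180, joint="right_wrist",
                                       position=(0.4, 1.5, 0.3))],
    waypoints=[(0.0, 0.0), (2.0, 1.0), (4.0, 1.0)],
)
```

Constraints at different granularities could then be mixed in a single request, letting a developer pin a hand position at one frame while only sketching the overall walking path.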
This flexibility allows developers to guide motion generation at different levels of detail, from general behavioral descriptions to precise physical positioning. The model’s architecture uses a two-stage denoising process that separates the root trajectory (the body’s global path through space) from local body movement, which helps reduce artifacts and improve consistency.
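The two-stage split can be pictured as two chained denoising loops: the root trajectory is cleaned up first, and body pose is then denoised conditioned on the fixed root. The sketch below assumes a generic diffusion-style loop with toy placeholder networks; the article does not detail Kimodo's actual formulation, so treat every function here as illustrative.

```python
# Conceptual two-stage denoising sketch, assuming a standard diffusion-style
# loop. The split mirrors the description above: root trajectory first, then
# body pose conditioned on it. All networks are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
NUM_FRAMES, NUM_JOINTS, STEPS = 120, 24, 50

def denoise_root(noisy_root, step, text_embedding):
    # Placeholder for a learned network predicting a cleaner root trajectory
    # (global translation + heading per frame) from the current noisy one.
    return noisy_root * 0.9  # toy update, not a real model

def denoise_body(noisy_pose, root, step, text_embedding):
    # Placeholder for a learned network refining per-joint rotations,
    # conditioned on the already-denoised root trajectory.
    return noisy_pose * 0.9  # toy update, not a real model

text_embedding = rng.normal(size=512)  # stand-in for an encoded prompt

# Stage 1: denoise the root trajectory (x, z, heading per frame).
root = rng.normal(size=(NUM_FRAMES, 3))
for step in reversed(range(STEPS)):
    root = denoise_root(root, step, text_embedding)

# Stage 2: denoise body pose conditioned on the fixed root, keeping global
# travel and local articulation consistent with each other.
pose = rng.normal(size=(NUM_FRAMES, NUM_JOINTS, 6))  # e.g. 6D rotation per joint
for step in reversed(range(STEPS)):
    pose = denoise_body(pose, root, step, text_embedding)
```

Fixing the root before refining limb motion means global travel and foot placement cannot drift apart during sampling, which is one way such a split can reduce artifacts like foot sliding.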
The system’s motion representation is designed to handle diverse input types, enabling it to adapt across use cases in both digital and physical environments. Nvidia said its experiments show that scaling both dataset size and model complexity leads to measurable improvements in motion quality and control accuracy.
Applications Across Robotics and Media
High-quality motion generation has applications across robotics, gaming, film production, and simulation. In robotics, it can accelerate training and deployment by providing synthetic motion data and control instructions. In media, it can streamline animation workflows and reduce production costs.
Kimodo’s ability to generate both human-like motion and robot-specific movement highlights the convergence between AI-driven simulation and real-world automation. By bridging these domains, the model could support more advanced human-robot interaction and autonomous systems.
Nvidia has made a demo of Kimodo available through a public interface, though access may be limited due to demand. The release underscores the company’s continued investment in applying generative AI to physical systems, extending beyond text and images into movement and control.
