Google researchers have introduced a new architecture called Titans and a supporting theoretical framework known as MIRAS. Together, they aim to overcome current scaling limits in AI sequence modeling by allowing models to learn and update long-term memory during inference. The innovation targets scenarios such as full-document understanding, genomic data processing, and time-series forecasting, all of which demand the ability to process and retain information across extremely long sequences.
Transformer-based models reshaped the industry with attention mechanisms, but their computational cost scales quadratically with context length, making it hard to stay efficient as inputs grow into the millions of tokens. Alternative architectures such as linear recurrent neural networks and state space models scale linearly, yet they compress past information into small fixed-size states, which limits accuracy.
Titans combines the strengths of both approaches. It introduces a neural long-term memory module built as a multi-layer perceptron, giving the model a more expressive way to summarize past data. Instead of storing only small mathematical representations of previous tokens, the system synthesizes broader conceptual meaning from earlier content.
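As a rough picture of what such a module could look like, the sketch below defines a small MLP that maps a key vector to a stored value vector. This is a minimal illustration, not the paper's implementation: the class name, layer sizes, and activation are assumptions.

```python
import torch
import torch.nn as nn

class LongTermMemory(nn.Module):
    """Illustrative MLP memory: reading maps a key vector to a value vector.
    Sizes and depth are placeholders, not the paper's configuration."""
    def __init__(self, dim: int = 64, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, key: torch.Tensor) -> torch.Tensor:
        # Reading is an ordinary forward pass; "writing" happens by updating
        # the weights of self.net at inference time (see the later sketches).
        return self.net(key)
```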
Real-Time Memorization Through Surprise-Driven Learning
A core feature of Titans is the use of a “surprise metric” to determine what information should be remembered. When new input differs strongly from the model’s expectations, the system identifies it as significant and stores it in long-term memory. This allows selective retention without slowing computation.
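One minimal way to picture this, assuming the surprise signal is simply the gradient of a reconstruction loss on the incoming token, is a single inference-time gradient step on the memory's weights. The function below is an illustrative sketch; the loss choice, learning rate, and function name are assumptions rather than the published formulation.

```python
import torch
import torch.nn.functional as F

def surprise_step(memory, key, value, lr: float = 0.01) -> float:
    """Treat the gradient of a reconstruction loss as the surprise signal
    and take one inference-time gradient step on the memory's weights."""
    pred = memory(key)
    loss = F.mse_loss(pred, value)                    # large loss => surprising input
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, g in zip(memory.parameters(), grads):
            p.add_(g, alpha=-lr)                      # write the surprise into memory
    return loss.item()
```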
Additional mechanisms support stability and adaptability:
- Momentum incorporates both immediate and recent novelty, ensuring closely related context is learned together.
- Adaptive forgetting removes outdated details to manage memory capacity over extremely long sequences.
These capabilities contribute to what the researchers call test-time memorization: the model keeps learning from the input it is actively processing, without offline retraining.
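A hedged sketch of how momentum and forgetting might combine in a single write step, assuming an update of the form S_t = momentum·S_{t-1} − step·grad followed by W_t = (1 − forget)·W_{t-1} + S_t; the fixed coefficients here are a simplification of the adaptive behavior the researchers describe.

```python
import torch

@torch.no_grad()
def memory_update(memory, grads, state, momentum=0.9, step=0.01, forget=0.001):
    """One illustrative write step: momentum over recent surprise plus gradual
    forgetting of old content. Constant coefficients are an assumption; the
    article describes forgetting as adaptive."""
    for p, g in zip(memory.parameters(), grads):
        s = state.setdefault(id(p), torch.zeros_like(p))
        s.mul_(momentum).add_(g, alpha=-step)   # S_t = momentum*S_{t-1} - step*grad
        p.mul_(1.0 - forget).add_(s)            # W_t = (1-forget)*W_{t-1} + S_t
    return state
```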
Built on top of Titans, the MIRAS framework provides a unified view of sequence modeling. It categorizes architectures by key components such as memory structure, prioritization method, retention strategy, and update algorithm. MIRAS enables exploration beyond common objectives like mean squared error to improve robustness and reduce sensitivity to outlier inputs.
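To illustrate the idea of swapping the memory objective, the sketch below contrasts a plain mean-squared-error loss with a Huber loss as one possible outlier-resistant alternative. Using Huber here is an assumption for illustration only; MIRAS defines its own family of objectives.

```python
import torch.nn.functional as F

def memory_objective(pred, target, kind: str = "huber"):
    """Swap the loss used for memory updates. MSE is the common default;
    Huber is shown as one outlier-resistant alternative (an illustrative
    assumption, not necessarily the objective MIRAS uses)."""
    if kind == "mse":
        return F.mse_loss(pred, target)      # sensitive to outlier tokens
    if kind == "huber":
        return F.huber_loss(pred, target)    # dampens the effect of outliers
    raise ValueError(f"unknown objective: {kind}")
```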
The research includes three attention-free MIRAS variants: YAAD, MONETA, and MEMORA. Each investigates a different mathematical formulation for memory stability and error handling.
Strong Performance in Long-Context Benchmarks
Google evaluated Titans and MIRAS models against Transformer++, Mamba-2, and other recurrent baselines across language modeling and reasoning tasks. Results showed higher accuracy and lower perplexity at similar parameter scales, and deeper memory modules proved especially important, holding up better as sequence lengths increased.
The architecture excelled in extreme long-context recall. On the BABILong benchmark, Titans surpassed competing systems including significantly larger models. It also scaled beyond two million tokens while maintaining fast, linear inference.
The work reflects Google’s broader effort to expand AI capabilities alongside infrastructure and consumer-facing features. The company recently said it must double its AI compute supply every six months to keep pace with model demand. Google is also testing a native AI mode in Search that integrates generative responses directly into results pages. These developments highlight how improvements in core architecture, scaling, and product integration are advancing together.