AI Agents Can Now Hire Humans to Finish Tasks They Cannot
A new platform called Rent a Human allows AI agents to outsource tasks to real people when automation falls short, highlighting an unusual hybrid model of human-in-the-loop labor.
TAU-bench (Tool-Agent-User benchmark) is an evaluation framework designed to test how effectively AI agents can interact with users and external tools in realistic, multi-turn scenarios. Developed by researchers at Sierra, it measures an agent’s ability to complete structured tasks, such as changing flight bookings or managing online orders, while following domain rules, calling APIs, and reasoning across multiple steps. A key innovation in TAU-bench is its pass^k metric, which evaluates not just whether an agent succeeds once, but whether it succeeds on every one of k independent trials of the same task. The benchmark highlights that even advanced AI models often struggle with reliability, consistency, and proper tool use over extended interactions. TAU-bench has quickly become a widely used reference point for assessing the real-world robustness of autonomous and conversational AI systems.
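A consistency metric of this kind can be sketched in a few lines. The version below assumes the "all k trials succeed" formulation: for a task with c successes out of n trials, the chance that k trials drawn without replacement all succeed is C(c, k) / C(n, k), and the score is the average over tasks. The function name `pass_hat_k` and the sample results are illustrative assumptions, not part of any official TAU-bench tooling.

```python
from math import comb


def pass_hat_k(successes_per_task, n_trials, k):
    """Average, over tasks, of the chance that k sampled trials all succeed.

    successes_per_task: number of successful trials for each task
    n_trials: trials run per task (assumed equal for every task)
    k: how many trials must all succeed
    """
    if not 1 <= k <= n_trials:
        raise ValueError("k must be between 1 and n_trials")
    if not successes_per_task:
        raise ValueError("need at least one task")
    total = 0.0
    for c in successes_per_task:
        if not 0 <= c <= n_trials:
            raise ValueError("success count must be between 0 and n_trials")
        # comb(c, k) is 0 when c < k, so tasks with too few successes score 0.
        total += comb(c, k) / comb(n_trials, k)
    return total / len(successes_per_task)


# Hypothetical results: 3 tasks, 4 trials each, with 4, 2, and 0 successes.
results = [4, 2, 0]
for k in (1, 2, 4):
    print(k, round(pass_hat_k(results, n_trials=4, k=k), 3))
```

With these made-up numbers the score is 0.5 at k=1 and drops to about 0.333 at k=4: only the task that succeeded on every trial still counts, which is how a metric like this separates reliable agents from ones that succeed only some of the time.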
OpenAI has introduced the Codex app for macOS, a new desktop interface designed to help developers manage multiple AI agents in parallel and supervise long-running software projects more effectively.
OpenAI has unveiled ChatGPT Atlas, a new web browser with ChatGPT integrated directly into its interface, blending AI assistance, search, and memory into everyday browsing.
Oracle has introduced an AI Agent Marketplace within its Fusion Applications suite, expanding AI Agent Studio and integrating top LLMs from OpenAI, Anthropic, and others to accelerate enterprise AI transformation.
Google has unveiled Gemini Enterprise, a next-generation AI platform designed for business productivity and collaboration, positioning itself directly against Microsoft’s Copilot and OpenAI’s enterprise solutions in the race for workplace AI dominance.
Nvidia and Fujitsu have entered into a strategic collaboration to co-develop AI robotics and foundational AI infrastructure in Japan, aiming for deployment by 2030 and a ‘human-centric’ model of innovation.
OpenAI and Databricks have forged a multiyear $100 million partnership to embed powerful AI agents into enterprise data platforms, enabling organizations to build agents using their own data.