TAU-bench

TAU-bench is a benchmark that tests how well AI agents interact with users and tools in realistic, multi-step scenarios, measuring not just success but reliability across repeated trials.
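To make "reliability across repeated trials" concrete, here is a minimal sketch. It assumes reliability is scored as the chance that k independent trials of the same task all succeed, the pass^k-style metric associated with TAU-bench, estimated per task from n recorded trials and then averaged over tasks. The function names and the sample results are hypothetical and are not taken from the benchmark's own code.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate the chance that k trials of one task all succeed.

    Assumption: the estimate is C(c, k) / C(n, k), where n is the number
    of recorded trials and c is how many of them succeeded.
    """
    if not 0 < k <= num_trials:
        raise ValueError("k must satisfy 0 < k <= num_trials")
    if not 0 <= num_successes <= num_trials:
        raise ValueError("num_successes must lie between 0 and num_trials")
    # math.comb returns 0 when k > num_successes, so the score drops to 0
    # as soon as fewer than k trials succeeded.
    return comb(num_successes, k) / comb(num_trials, k)


def benchmark_pass_hat_k(results: dict[str, list[bool]], k: int) -> float:
    """Average the per-task estimate over every task in `results`."""
    scores = [
        pass_hat_k(len(trials), sum(trials), k)
        for trials in results.values()
    ]
    return sum(scores) / len(scores)


# Hypothetical outcomes: four trials each for two tasks.
example_results = {
    "task_a": [True, True, True, True],    # succeeds every time
    "task_b": [True, False, True, False],  # succeeds half the time
}
```

With k = 1 the score reduces to the plain average success rate. As k grows, a task that succeeds only some of the time contributes less and less, which is how this kind of metric separates an agent that succeeds consistently from one that succeeds occasionally.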