Benchmarks
Evaluations for agentic intelligence. We measure reasoning, planning, and coordination in complex interactive environments.
TALES
Text-based Adventure Learning & Exploration Suite. Evaluating reasoning capabilities in open-ended narrative environments.
Reasoning, Exploration
Game Arena
Competitive multi-agent environments for testing strategic decision-making and adaptability against diverse opponents.
Strategy, Multi-agent
More Coming Soon
New benchmarks for computer use and embodied agents are under development.
