A library of intelligent behavior.
A curated collection of interactive game environments. Explore how models learn, adapt, and discover novel strategies through play.
Arcade

Flappy Bird
LLM-friendly version of the viral mobile hit with text-scaffolded physics. Moves are turn-based—agent receives current position, velocity, and upcoming pipe gaps, then decides TAP or do nothing. Physics constants are explicit so the model can project trajectories ahead.
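
The kind of look-ahead this enables can be sketched as below. The constants `GRAVITY` and `TAP_VELOCITY`, the downward-increasing `y` axis, and the update order are all assumptions for illustration; the environment states its own values.

```python
# Hypothetical constants; the environment publishes its own values.
GRAVITY = 1.0         # added to vertical velocity on a turn with no tap (assumed)
TAP_VELOCITY = -3.0   # vertical velocity right after a TAP (assumed; negative = up)


def project(y, vy, actions):
    """Return the bird's height after each action, with y increasing downward.

    actions is a sequence of booleans: True for TAP, False for doing nothing.
    """
    heights = []
    for tap in actions:
        vy = TAP_VELOCITY if tap else vy + GRAVITY
        y += vy
        heights.append(y)
    return heights


def fits_gap(y, gap_top, gap_bottom):
    """True if height y lies strictly inside a pipe gap (gap_top < gap_bottom)."""
    return gap_top < y < gap_bottom
```

An agent could call `project` on a few candidate action sequences and keep one whose height at the pipe's turn passes `fits_gap`.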

Snake
Classic Snake on an 8×8 grid, but the agent plans full paths to each apple rather than single moves. The challenge: reasoning about when body segments will clear before the head arrives. An HP system discourages guessing—reliable pathfinding beats trial-and-error.
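
One way to reason about body segments clearing is a breadth-first search that records the step at which each occupied cell becomes free. This is a sketch under assumptions, not the environment's own logic: the function name `plan_path`, the `(row, col)` coordinates, and the rule that the head may enter a cell on the same step the tail leaves it are all hypothetical. Because it visits each cell once, it can miss paths that would need to loop around to wait.

```python
from collections import deque


def plan_path(snake, apple, size=8):
    """Find a shortest path for the head to the apple, or None.

    snake is a list of (row, col) cells, head first. Returns the cells the
    head steps onto, in order, ending at the apple.
    """
    length = len(snake)
    # Segment i (0 = head) vacates its cell after length - i moves (assumed rule).
    free_at = {cell: length - i for i, cell in enumerate(snake)}
    head = snake[0]
    queue = deque([(head, [])])
    seen = {head}
    while queue:
        cell, path = queue.popleft()
        if cell == apple:
            return path
        step = len(path) + 1  # move number on which the head would enter a neighbor
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
                continue
            # Skip cells already reached, or still covered by the body at this step.
            if nxt in seen or free_at.get(nxt, 0) > step:
                continue
            seen.add(nxt)
            queue.append((nxt, path + [nxt]))
    return None
```

For a snake lying along the top row with the apple just past its tail, the search detours rather than running into its own body.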

Pacman
LLM-friendly Pacman on a 10×10 grid. Agent sees the full board—walls, dots, power pellets, and ghost positions—then picks a direction each turn. Tests pursuit-avoidance under pressure, with power-up timing adding strategic depth.

Gridworld

Lava Gap
A 7×7 grid with deadly lava blocking the path to the goal—except for one safe gap. Agent outputs sequences of rotate/forward commands (up to 20/turn) using coordinate-based observations. One wrong step into lava and it's game over with zero reward.
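
A minimal sketch of how one turn of such a command sequence might be executed. The command names (`left`, `right`, `forward`), the `(x, y)` coordinates, the lava layout, and the choice to ignore commands past the cap are assumptions for illustration only.

```python
MAX_COMMANDS = 20  # per-turn cap from the description

# Facing vectors in clockwise order: east, south, west, north (x right, y down).
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def run_turn(commands, pos, facing, lava, goal, size=7):
    """Apply up to MAX_COMMANDS commands and return (pos, facing, status).

    status is "dead" on lava, "goal" on reaching the goal, else "running".
    Commands beyond the cap are ignored here (an assumption).
    """
    for cmd in commands[:MAX_COMMANDS]:
        if cmd == "left":
            facing = (facing - 1) % 4
        elif cmd == "right":
            facing = (facing + 1) % 4
        elif cmd == "forward":
            dx, dy = DIRS[facing]
            nxt = (pos[0] + dx, pos[1] + dy)
            # Moves off the grid are treated as no-ops.
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size:
                pos = nxt
        else:
            raise ValueError(f"unknown command: {cmd!r}")
        if pos in lava:
            return pos, facing, "dead"
        if pos == goal:
            return pos, facing, "goal"
    return pos, facing, "running"


# Hypothetical layout: a lava column at x = 3 with a single gap at y = 2.
LAVA = {(3, y) for y in range(7) if y != 2}
```

Walking straight east from `(1, 1)` ends in lava on the second step, while a route that first drops to row 2 passes through the gap.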

Obstructed Maze
MiniGrid maze where obstacles block the obvious paths. Agent navigates by rotating and moving forward, receiving its position, facing direction, and nearby objects each turn. Forces actual pathfinding—no shortcuts, just efficient search and backtracking.

Locked Room
Key-door puzzles in MiniGrid. Find the right colored key, unlock the matching door, reach the goal. Catch: you can only hold one item at a time, so planning the pickup/drop sequence matters as much as navigation.

Toy Text

Blackjack
The classic card game, text-scaffolded for LLMs. Agent sees its hand total and the dealer's face-up card, then chooses HIT or STICK. Dealer plays by fixed rules (hits on 16 or below). Pure probabilistic reasoning—when to push your luck vs. stand pat.
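
The probabilistic side of that decision can be sketched as follows. It assumes cards are drawn with replacement from a standard 13-rank distribution and that the player's total is hard (any ace counts as 1); neither detail is stated above, and the function names are hypothetical.

```python
from fractions import Fraction

# One value per rank: A, 2-9, 10, J, Q, K. The ace is counted as 1 here.
CARD_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def bust_probability(hard_total):
    """Chance that a single HIT pushes a hard total above 21."""
    busts = sum(1 for value in CARD_VALUES if hard_total + value > 21)
    return Fraction(busts, len(CARD_VALUES))


def dealer_hits(dealer_total):
    """Fixed dealer rule from the description: hit on 16 or below."""
    return dealer_total <= 16
```

Under these assumptions, hitting on a hard 12 busts with probability 4/13, while hitting on 16 busts with probability 8/13.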

Cliff Walking
Navigate a 4×12 grid from start to goal, but the entire bottom edge is a cliff. Fall off and you reset to the beginning. The greedy path hugs the cliff, so one slip sends you back to the start. Tests whether models learn caution over speed.

Frozen Lake
Grid navigation where the ice is slippery. Pick a direction, but you might slide perpendicular instead. Agent sees the full grid—start, goal, holes, and frozen tiles—but can't trust deterministic plans. Tests reasoning under genuine uncertainty.
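
To make the uncertainty concrete, the sketch below computes where one move can land. It assumes the intended direction and each of the two perpendicular directions are taken with probability 1/3, and that moving into the edge leaves the agent in place; the actual slip probability is not stated above, and the names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

MOVES = {"LEFT": (0, -1), "DOWN": (1, 0), "RIGHT": (0, 1), "UP": (-1, 0)}
PERPENDICULAR = {
    "LEFT": ("UP", "DOWN"),
    "RIGHT": ("UP", "DOWN"),
    "UP": ("LEFT", "RIGHT"),
    "DOWN": ("LEFT", "RIGHT"),
}


def next_cell_distribution(pos, action, rows, cols):
    """Map each reachable (row, col) to its probability after one move."""
    dist = defaultdict(Fraction)
    for actual in (action, *PERPENDICULAR[action]):
        dr, dc = MOVES[actual]
        # Clamp to the grid so that bumping into an edge means staying put.
        r = min(max(pos[0] + dr, 0), rows - 1)
        c = min(max(pos[1] + dc, 0), cols - 1)
        dist[(r, c)] += Fraction(1, 3)  # assumed uniform slip model
    return dict(dist)
```

Chaining these distributions over a candidate route gives the chance of reaching the goal before hitting a hole, which is what a plan has to be judged on when individual moves cannot be trusted.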

Taxi
A 5×5 city grid with walls and four pickup/dropoff spots. Agent drives to the passenger, picks them up, navigates around walls, and drops them at the destination. Multi-step logistics where planning the full route matters.

Boardgame
Locked

Among Us
Live social-deduction arena stressing deception detection, theory-of-mind reasoning, coalition building, and natural-language negotiation.

BridgeNet
Physics-grounded construction sim where agents design load-bearing bridges while learning from human constraints, failures, and rationales.

Baba Is You
Rule-rewriting puzzle world that teaches symbolic manipulation and out-of-distribution reasoning by letting players edit the game’s laws.

Mini Metro
Dynamic transit-planning sandbox about shaping and revising networks under growing demand, congestion, and tight budgets.

Plants vs Zombies
Lane-based tower-defense arena centered on tempo management, resource allocation, and adaptive counter-building against evolving waves.

Sokoban
Box-pushing puzzle world that trains irreversible-move planning, deadlock avoidance, and search over tight spatial constraints.

Design Your Own
Custom human-in-the-loop simulations co-designed for your domain and instrumented to capture rich behavior and reasoning data.
