Replay — Game Environments for LLM RL

REPLAY

A library of intelligent behavior.

A curated collection of interactive game environments. Explore how models learn, adapt, and discover novel strategies through play.

Arcade

Flappy Bird

LLM-friendly version of the viral mobile hit with text-scaffolded physics. Play is turn-based: each turn the agent receives its current position, velocity, and upcoming pipe gaps, then chooses to TAP or do nothing. Physics constants are explicit so the model can project trajectories ahead.

Endless
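As a rough illustration of what projecting a trajectory from explicit physics constants could look like, here is a minimal sketch. The constant values, the update order, the NOOP action name, and the convention that y grows upward are all assumptions made for the example, not the environment's actual parameters.

```python
# Hypothetical constants for illustration; the real environment states its own.
GRAVITY = -1.0       # change in velocity per turn (y grows upward)
TAP_VELOCITY = 3.0   # velocity the bird is set to when it taps

def project(y, velocity, actions):
    """Return the bird's height after each action in `actions`."""
    heights = []
    for action in actions:
        if action == "TAP":
            velocity = TAP_VELOCITY   # a tap overrides the current velocity
        else:
            velocity += GRAVITY       # otherwise gravity accelerates the fall
        y += velocity
        heights.append(y)
    return heights

def clears_gap(height, gap_low, gap_high):
    """True if `height` lies strictly inside the pipe gap."""
    return gap_low < height < gap_high
```

With these assumed values, a model could compare candidate action sequences by calling project for each one and checking the height at the turn the bird reaches the next pipe.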

Snake

Classic Snake on an 8×8 grid, but the agent plans full paths to each apple rather than single moves. The challenge: reasoning about when body segments will clear before the head arrives. An HP system discourages guessing—reliable pathfinding beats trial-and-error.

Spatial
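The timing problem described above, whether a body segment has cleared by the time the head arrives, can be sketched as a path check that simulates the snake step by step. The move names, the (x, y) coordinate layout, and the rule that the tail vacates its cell on the same tick the head moves are assumptions for this example; the sketch also ignores growth from apples eaten mid-path.

```python
from collections import deque

GRID = 8  # 8x8 board, as in the description
# Hypothetical move names; y grows downward in this sketch.
MOVES = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}

def path_is_safe(body, path):
    """Simulate `path` for a snake occupying `body` (head first, tail last)."""
    snake = deque(body)
    for move in path:
        dx, dy = MOVES[move]
        head = (snake[0][0] + dx, snake[0][1] + dy)
        if not (0 <= head[0] < GRID and 0 <= head[1] < GRID):
            return False          # left the board
        snake.pop()               # tail advances, freeing its cell
        if head in snake:
            return False          # that segment has not cleared yet
        snake.appendleft(head)
    return True
```

For a snake curled at [(2, 2), (2, 3), (3, 3), (3, 2)], moving RIGHT into the tail's cell passes under these assumptions because the tail leaves on the same tick, while moving DOWN into the neck fails.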

Pacman

LLM-friendly Pacman on a 10×10 grid. Agent sees the full board—walls, dots, power pellets, and ghost positions—then picks a direction each turn. Tests pursuit-avoidance under pressure, with power-up timing adding strategic depth.

Maze Action

Gridworld

Lava Gap

A 7×7 grid with deadly lava blocking the path to the goal—except for one safe gap. Agent outputs sequences of rotate/forward commands (up to 20/turn) using coordinate-based observations. One wrong step into lava and it's game over with zero reward.

Exploration
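To make the command-sequence format concrete, the following sketch executes a batch of rotate/forward commands on a 7×7 grid and stops as soon as the agent enters lava. The command names, the heading encoding, the (x, y) coordinates, and the choice to treat a move into the outer wall as a no-op are assumptions for illustration only.

```python
SIZE = 7            # 7x7 grid, as in the description
MAX_COMMANDS = 20   # per-turn limit, as in the description
# Hypothetical heading encoding: 0=east, 1=south, 2=west, 3=north (y grows downward).
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def run_commands(pos, heading, commands, lava):
    """Apply `commands` in order; return (pos, heading, dead)."""
    if len(commands) > MAX_COMMANDS:
        raise ValueError("too many commands in one turn")
    for cmd in commands:
        if cmd == "left":
            heading = (heading - 1) % 4
        elif cmd == "right":
            heading = (heading + 1) % 4
        elif cmd == "forward":
            dx, dy = HEADINGS[heading]
            nxt = (pos[0] + dx, pos[1] + dy)
            if 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE:
                pos = nxt                  # moves into the outer wall are ignored
            if pos in lava:
                return pos, heading, True  # episode ends immediately
        else:
            raise ValueError(f"unknown command: {cmd}")
    return pos, heading, False
```

A lava column at x = 3 with a single gap at y = 2 shows the contrast: walking straight east from (1, 3) ends the episode, while stepping north first and then east passes through the gap.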

Obstructed Maze

MiniGrid maze where obstacles block the obvious paths. Agent navigates by rotating and moving forward, receiving its position, facing direction, and nearby objects each turn. Forces actual pathfinding—no shortcuts, just efficient search and backtracking.

Pathfinding

Locked Room

Key-door puzzles in MiniGrid. Find the right colored key, unlock the matching door, reach the goal. Catch: you can only hold one item at a time, so planning the pickup/drop sequence matters as much as navigation.

Puzzle, Sequential

Toy Text

Blackjack

The classic card game, text-scaffolded for LLMs. Agent sees its hand total and the dealer's face-up card, then chooses HIT or STICK. Dealer plays by fixed rules (hits on 16 or below). Pure probabilistic reasoning—when to push your luck vs. stand pat.

Probabilistic
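The HIT-or-STICK trade-off can be made concrete with a small calculation. This sketch assumes an infinite deck (every draw is independent, with J, Q, and K counted as 10) and a hard total in which any ace counts as 1; neither assumption is confirmed by the description above. The dealer rule mirrors the one stated: hit on 16 or below.

```python
from fractions import Fraction

# Infinite-deck assumption: 13 equally likely ranks, face cards worth 10.
CARD_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]

def bust_probability(hard_total):
    """Chance that one more card pushes a hard total above 21."""
    busting = sum(1 for value in CARD_VALUES if hard_total + value > 21)
    return Fraction(busting, len(CARD_VALUES))

def dealer_should_hit(dealer_total):
    """Fixed dealer rule from the description: hit on 16 or below."""
    return dealer_total <= 16
```

Under these assumptions, hitting on a hard 12 busts with probability 4/13, while hitting on a hard 16 busts with probability 8/13.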

Cliff Walking

Navigate a 4×12 grid from start to goal, but the entire bottom edge is a cliff. Fall off and you reset to the beginning. The greedy path hugs the cliff, so a single slip sends you back to the start. Tests whether models learn caution over speed.

Risk Awareness

Frozen Lake

Grid navigation where the ice is slippery. Pick a direction, but you might slide perpendicular instead. Agent sees the full grid—start, goal, holes, and frozen tiles—but can't trust deterministic plans. Tests reasoning under genuine uncertainty.

Partial Information
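One way to see why deterministic plans fail here is to write out the distribution over next cells for a single move. The sketch below assumes a 4×4 grid, a slip probability of 1/3 toward each perpendicular direction, and that moves into the border leave the agent in place; these values are illustrative assumptions, since the description does not state the environment's actual parameters.

```python
from fractions import Fraction

DELTAS = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
SIDEWAYS = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
            "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

def next_cell(pos, direction, rows, cols):
    """Move one step as (row, col), clamping at the grid border."""
    dr, dc = DELTAS[direction]
    row = min(max(pos[0] + dr, 0), rows - 1)
    col = min(max(pos[1] + dc, 0), cols - 1)
    return (row, col)

def outcome_distribution(pos, action, rows=4, cols=4, slip=Fraction(1, 3)):
    """Map each reachable cell to its probability for one chosen action."""
    dist = {}
    # Intended direction gets the remaining mass; each side gets `slip`.
    weighted = [(action, 1 - 2 * slip)] + [(d, slip) for d in SIDEWAYS[action]]
    for direction, prob in weighted:
        cell = next_cell(pos, direction, rows, cols)
        dist[cell] = dist.get(cell, 0) + prob
    return dist
```

Chaining these distributions across a planned route gives the probability of reaching the goal before hitting a hole, which is the quantity a cautious plan would try to maximize.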

Taxi

A 5×5 city grid with walls and four pickup/dropoff spots. Agent drives to the passenger, picks them up, navigates around walls, and drops them at the destination. Multi-step logistics where planning the full route matters.

Route Optimization

Boardgame

Catan

1v1 Settlers of Catan, first to 7 Victory Points wins. Agent manages resources, trades, builds settlements and roads. Long-horizon strategy where early positioning shapes late-game options.

Long Horizon

Locked

Among Us

Live social-deduction arena stressing deception detection, theory-of-mind reasoning, coalition building, and natural-language negotiation.

Social Reasoning, Multi-Agent

BridgeNet

Physics-grounded construction sim where agents design load-bearing bridges while learning from human constraints, failures, and rationales.

Physics, Human Data

Baba Is You

Rule-rewriting puzzle world that teaches symbolic manipulation and out-of-distribution reasoning by letting players edit the game’s laws.

Strategy, Human Data

Mini Metro

Dynamic transit-planning sandbox about shaping and revising networks under growing demand, congestion, and tight budgets.

Strategy, Human Data

Plants vs Zombies

Lane-based tower-defense arena centered on tempo management, resource allocation, and adaptive counter-building against evolving waves.

Strategy, Human Data

Sokoban

Box-pushing puzzle world that trains irreversible-move planning, deadlock avoidance, and search over tight spatial constraints.

Planning, Human Data

Design Your Own

Custom human-in-the-loop simulations co-designed for your domain and instrumented to capture rich behavior and reasoning data.

Human Data, Custom