Cliff Walking
Sutton & Barto's cliff-walking gridworld — three policies, one dangerous edge.
A textbook reinforcement-learning gridworld. The board is a 4×12 grid of cells wired together by north/south/east/west references, with a cliff along the bottom edge and a goal in the corner. Walkers move under a chosen policy (Safe, Cautious or Risky), each encoded as a set of direction probabilities. Stepping off the cliff incurs a large negative reward and resets the walker; every ordinary step costs one.
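The reference-wired board can be sketched in a few lines of Python. This is an illustrative sketch, not the dataset's actual schema: the `Cell` class, `build_grid` and `step` are hypothetical names, the cliff penalty of -100 is the value from Sutton & Barto's book (the description here only says "large negative"), and treating a move into a wall as a stay-in-place step is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

ROWS, COLS = 4, 12
STEP_REWARD = -1      # from the description: every ordinary step costs one
CLIFF_REWARD = -100   # assumption: the textbook value, not stated on this page


@dataclass(eq=False)  # identity comparison; avoids recursing through neighbours
class Cell:
    row: int
    col: int
    kind: str = "floor"  # "floor", "cliff", "start" or "goal"
    north: Optional["Cell"] = field(default=None, repr=False)
    south: Optional["Cell"] = field(default=None, repr=False)
    east: Optional["Cell"] = field(default=None, repr=False)
    west: Optional["Cell"] = field(default=None, repr=False)


def build_grid():
    """Create the 4x12 board and wire each cell to its neighbours."""
    cells = [[Cell(r, c) for c in range(COLS)] for r in range(ROWS)]
    for r in range(ROWS):
        for c in range(COLS):
            cell = cells[r][c]
            cell.north = cells[r - 1][c] if r > 0 else None
            cell.south = cells[r + 1][c] if r < ROWS - 1 else None
            cell.west = cells[r][c - 1] if c > 0 else None
            cell.east = cells[r][c + 1] if c < COLS - 1 else None
    bottom = cells[ROWS - 1]
    bottom[0].kind = "start"
    bottom[-1].kind = "goal"
    for cell in bottom[1:-1]:
        cell.kind = "cliff"
    return cells


def step(cell, direction, start):
    """Follow one neighbour reference; return (next_cell, reward, done)."""
    target = getattr(cell, direction) or cell  # assumption: walls keep the walker in place
    if target.kind == "cliff":
        return start, CLIFF_REWARD, False      # fall: pay the penalty and reset
    return target, STEP_REWARD, target.kind == "goal"
```

Navigation never computes coordinates after the board is built: a move is a single attribute lookup on the current cell, which is the "indirect, reference-based" style the description refers to.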
A compact benchmark for policy comparison: the same world run under three risk appetites produces three distinct reward profiles you can read straight from the exported walker rows. Demonstrates indirect, reference-based grid navigation and reproducible per-step reward accounting.
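A rough way to picture the comparison is the simulation below. Everything numeric in it is an assumption made for illustration: the probability tables in `POLICIES`, the -100 cliff penalty (the textbook value), the 500-step cap and the start cell at bottom-left are not taken from the dataset. For brevity this sketch tracks plain (row, column) coordinates rather than wired cell references.

```python
import random

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)  # assumption: textbook layout, start bottom-left
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}

# Hypothetical per-policy direction probabilities (each row sums to 1).
POLICIES = {
    "Safe":     {"north": 0.35, "south": 0.15, "east": 0.40, "west": 0.10},
    "Cautious": {"north": 0.25, "south": 0.25, "east": 0.40, "west": 0.10},
    "Risky":    {"north": 0.15, "south": 0.35, "east": 0.45, "west": 0.05},
}


def run_episode(probs, rng, max_steps=500):
    """Walk until the goal is reached or max_steps is used up."""
    names = list(probs)
    weights = [probs[n] for n in names]
    pos, total, falls = START, 0, 0
    for _ in range(max_steps):
        dr, dc = MOVES[rng.choices(names, weights=weights)[0]]
        r = min(max(pos[0] + dr, 0), ROWS - 1)  # clamp at the walls
        c = min(max(pos[1] + dc, 0), COLS - 1)
        if r == ROWS - 1 and 0 < c < COLS - 1:  # stepped onto the cliff
            total += -100
            falls += 1
            pos = START
        else:
            total += -1
            pos = (r, c)
            if pos == GOAL:
                break
    return {"reward": total, "falls": falls, "reached_goal": pos == GOAL}


def compare(episodes=200, seed=0):
    """Run every policy with the same seed and summarise its reward profile."""
    summary = {}
    for name, probs in POLICIES.items():
        rng = random.Random(seed)  # fixed seed keeps runs reproducible
        runs = [run_episode(probs, rng) for _ in range(episodes)]
        summary[name] = {
            "mean_reward": sum(r["reward"] for r in runs) / episodes,
            "mean_falls": sum(r["falls"] for r in runs) / episodes,
            "goal_rate": sum(r["reached_goal"] for r in runs) / episodes,
        }
    return summary


if __name__ == "__main__":
    for name, stats in compare().items():
        print(name, stats)
```

Because each policy gets its own seeded generator, repeated calls to `compare` with the same arguments return identical summaries, which mirrors the reproducible per-step accounting described above.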
Linked tables with guaranteed referential integrity.
Generated REST endpoints. Also exposed as MCP tools.
OSI-compatible definition, emitted with the dataset.
# cliff-walking.osi.yaml — emitted automatically
semantic_model:
  name: "cliff-walking"
  source: "duckdb://cliff-walking.db"
  entities:
    - name: walker
      primary_key: id
      dimensions:
        - name: state
          type: categorical
        - name: t
          type: time
      measures:
        - name: row_count
          agg: count
        - name: active
          agg: sum
          filter: "state = 'ACTIVE'"
More worlds.
Game of Life
Conway's automaton as a perfectly observable, deterministic grid world.
London Underground
A live tube graph — eleven lines, hundreds of trains, platforms held as a mutex.
Pac-Man
A self-playing arcade game — ghosts chase a flood-filled distance field.