Cliff Walking

Sutton & Barto's cliff-walking gridworld — three policies, one dangerous edge.

Async, Tick, Spatial, Observations, FSM


SC-014 / SCHEMATIC · Reinforcement Learning

Clock: 120 Hz tick

Update: Asynchronous

Grid: 4 × 12

Policies: Safe / Cautious / Risky

Observability: Per-step

OVERVIEW

A textbook reinforcement-learning gridworld. The board is a 4×12 grid of cells linked by north/south/east/west references, with a cliff along the bottom edge and a goal in the corner. Walkers move under a chosen policy (Safe, Cautious or Risky), each encoded as a set of per-direction probabilities. Stepping off the cliff incurs a large negative reward and resets the walker; every ordinary step carries a reward of -1.
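The mechanics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scenario's actual implementation: the cliff reward of -100 and the bottom-left start cell follow Sutton & Barto's textbook setup, the direction probabilities in POLICIES are invented for the example, and the behaviour on hitting a wall (stay put, still pay the step cost) is a guess.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional

ROWS, COLS = 4, 12            # grid size stated by the scenario
STEP_REWARD = -1              # every ordinary step costs one
CLIFF_REWARD = -100           # assumption: textbook value, not given on this page
START_ID = (ROWS - 1) * COLS  # assumption: walkers start bottom-left

# Hypothetical per-policy direction probabilities (each row sums to 1).
POLICIES = {
    "Safe":     {"north": 0.40, "east": 0.40, "south": 0.10, "west": 0.10},
    "Cautious": {"north": 0.25, "east": 0.50, "south": 0.15, "west": 0.10},
    "Risky":    {"north": 0.10, "east": 0.60, "south": 0.25, "west": 0.05},
}


@dataclass
class Cell:
    id: int
    kind: str                      # "normal", "cliff" or "goal"
    reward: int
    north_id: Optional[int] = None
    south_id: Optional[int] = None
    east_id: Optional[int] = None
    west_id: Optional[int] = None


def build_grid() -> Dict[int, Cell]:
    """Create the cells, then wire each one to its neighbours by id."""
    cells: Dict[int, Cell] = {}
    for r in range(ROWS):
        for c in range(COLS):
            if r == ROWS - 1 and 0 < c < COLS - 1:
                kind, reward = "cliff", CLIFF_REWARD
            elif r == ROWS - 1 and c == COLS - 1:
                kind, reward = "goal", STEP_REWARD
            else:
                kind, reward = "normal", STEP_REWARD
            cells[r * COLS + c] = Cell(r * COLS + c, kind, reward)
    for r in range(ROWS):
        for c in range(COLS):
            cell = cells[r * COLS + c]
            cell.north_id = (r - 1) * COLS + c if r > 0 else None
            cell.south_id = (r + 1) * COLS + c if r < ROWS - 1 else None
            cell.east_id = r * COLS + c + 1 if c < COLS - 1 else None
            cell.west_id = r * COLS + c - 1 if c > 0 else None
    return cells


def step(cells, cell_id, policy, rng):
    """Sample a direction, follow the neighbour reference, return (new_id, reward)."""
    probs = POLICIES[policy]
    direction = rng.choices(list(probs), weights=list(probs.values()))[0]
    target = getattr(cells[cell_id], f"{direction}_id")
    if target is None:                 # wall: stay put, still pay the step cost
        return cell_id, STEP_REWARD
    dest = cells[target]
    if dest.kind == "cliff":           # fall: take the penalty, reset to start
        return START_ID, dest.reward
    return dest.id, dest.reward


def run_walker(policy, seed, max_steps=500):
    """Walk until the goal or the step budget; return a walker-like row."""
    rng = random.Random(seed)
    cells = build_grid()
    cell_id, steps, total = START_ID, 0, 0
    while steps < max_steps and cells[cell_id].kind != "goal":
        cell_id, reward = step(cells, cell_id, policy, rng)
        steps += 1
        total += reward
    return {"policy": policy, "steps": steps,
            "total_reward": total, "current_cell_id": cell_id}
```

Navigation goes through the stored neighbour ids rather than coordinate arithmetic, which mirrors the reference-based layout the scenario describes. Seeding the generator makes each run repeatable.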

A compact benchmark for policy comparison: the same world run under three risk appetites produces three distinct reward profiles you can read straight from the exported walker rows. Demonstrates indirect, reference-based grid navigation and reproducible per-step reward accounting.
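One way to read a reward profile from the exported walker rows is to group them by policy. The sketch below assumes the rows have already been loaded as dictionaries keyed by the walker column names (policy, steps, total_reward); the function name and the choice of summary statistics are illustrative, not part of the product.

```python
from collections import defaultdict
from statistics import mean


def reward_profile(walker_rows):
    """Summarise walker rows per policy: count, mean reward, reward per step."""
    by_policy = defaultdict(list)
    for row in walker_rows:
        by_policy[row["policy"]].append(row)

    profile = {}
    for policy, rows in by_policy.items():
        total_steps = sum(r["steps"] for r in rows)
        total_reward = sum(r["total_reward"] for r in rows)
        profile[policy] = {
            "walkers": len(rows),
            "mean_total_reward": mean(r["total_reward"] for r in rows),
            # Guard against walkers that have not moved yet.
            "reward_per_step": total_reward / total_steps if total_steps else 0.0,
        }
    return profile
```

A reward per step close to -1 means the walkers rarely fell; values well below -1 indicate cliff penalties dominating the total.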

TRAITS

Async: Independent, event-driven timelines

Tick: Discrete fixed-step time

Spatial: Entities have position & topology

Observations: Emits structured agent observations

FSM: Entities are finite-state machines

SCHEMA

Linked tables with guaranteed referential integrity.

TABLE | COLUMNS | DESCRIPTION
walker | ID, policy, steps, total_reward, current_cell_id, current_state | One row per walker: its policy, steps taken, cumulative reward and current cell.
gridcell | ID, kind, reward, north_id, south_id, east_id, west_id, current_state | One row per cell: kind (normal/cliff/goal), step reward and the four neighbour references.
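To make the referential-integrity claim concrete, the following sketch checks an exported dataset for dangling references. It assumes rows are available as dictionaries with lowercase keys (id, north_id, and so on) and that edge cells store a null neighbour rather than a sentinel id; both are assumptions, and dangling_references is a hypothetical helper name.

```python
NEIGHBOUR_COLUMNS = ("north_id", "south_id", "east_id", "west_id")


def dangling_references(gridcells, walkers):
    """Return (table, row_id, column, missing_id) for every broken reference."""
    cell_ids = {cell["id"] for cell in gridcells}
    problems = []

    # Each non-null neighbour reference must point at an existing cell.
    for cell in gridcells:
        for column in NEIGHBOUR_COLUMNS:
            ref = cell.get(column)
            if ref is not None and ref not in cell_ids:
                problems.append(("gridcell", cell["id"], column, ref))

    # Each walker must stand on an existing cell.
    for walker in walkers:
        ref = walker.get("current_cell_id")
        if ref is not None and ref not in cell_ids:
            problems.append(("walker", walker["id"], "current_cell_id", ref))

    return problems
```

An empty list means every neighbour reference and every walker position resolves to a real gridcell row.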

LIVE API

Generated REST endpoints. Also exposed as MCP tools.

POST /scenarios/cliff-walking/experiments Seed a new gridworld

POST /scenarios/cliff-walking/experiments/{eid}/run Advance N turns, or request a move

GET /scenarios/cliff-walking/experiments/{eid}/entities/walker Read walker positions and rewards

GET /scenarios/cliff-walking/experiments/{eid}/events Append-only event log

GET /scenarios/cliff-walking/experiments/{eid}/dataset Download the exported dataset
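A client would mostly need the method and path for each call. The sketch below only builds those pairs from the list above; the host, authentication and request or response bodies are not documented on this page, so they are deliberately left out. The short route names (seed, run, walkers, events, dataset) are labels made up for this example.

```python
from urllib.parse import quote

BASE = "/scenarios/cliff-walking/experiments"

# Short, made-up names mapped to the documented method and path suffix.
ROUTES = {
    "seed":    ("POST", ""),
    "run":     ("POST", "/{eid}/run"),
    "walkers": ("GET",  "/{eid}/entities/walker"),
    "events":  ("GET",  "/{eid}/events"),
    "dataset": ("GET",  "/{eid}/dataset"),
}


def route(name, eid=None):
    """Return (method, path) for a named endpoint, filling in the experiment id."""
    method, template = ROUTES[name]
    if "{eid}" in template:
        if eid is None:
            raise ValueError(f"route {name!r} needs an experiment id")
        # Percent-encode the id so it is safe as a single path segment.
        template = template.replace("{eid}", quote(str(eid), safe=""))
    return method, BASE + template
```

A typical session would seed an experiment, call run one or more times, then read walkers or download the dataset, passing the experiment id returned by the seed call.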

SEMANTIC LAYER

OSI-compatible definition, emitted with the dataset.

# cliff-walking.osi.yaml — emitted automatically
semantic_model:
  name: "cliff-walking"
  source: "duckdb://cliff-walking.db"
  entities:
    - name: walker
      primary_key: id
  dimensions:
    - name: state
      type: categorical
    - name: t
      type: time
  measures:
    - name: row_count
      agg: count
    - name: active
      agg: sum
      filter: "state = 'ACTIVE'"