Dyna Maze

class rlforge.environments.dyna_maze.DynaMaze(render_mode=None)

Grid-world environment for testing planning-based reinforcement learning agents.

The Dyna Maze is a 6x9 grid (6 rows, 9 columns) with obstacles and a fixed start state. The agent must navigate from the start position to the terminal goal in the top-right corner of the grid. Obstacles block some paths, so the agent has to explore and plan to reach the goal efficiently.

Features

  • Discrete state space: each cell in the grid corresponds to a unique state.

  • Discrete action space: four possible moves (UP, RIGHT, DOWN, LEFT).

  • Obstacles: specific grid cells are blocked and cannot be entered.

  • Terminal state: reaching cell (0, 8) ends the episode with reward 1.

  • All other transitions yield reward 0.

  • Compatible with the Gymnasium API.
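The relationship between grid cells and integer states can be sketched as follows. This is only an illustration: it assumes row-major indexing (state = row * 9 + col), which the documentation does not confirm, and the helper names cell_to_state and state_to_cell are hypothetical rather than part of rlforge.

```python
N_ROWS, N_COLS = 6, 9  # grid size stated in the description above


def cell_to_state(row: int, col: int) -> int:
    """Map a (row, col) cell to a state index, ASSUMING row-major order."""
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        raise ValueError(f"cell ({row}, {col}) is outside the {N_ROWS}x{N_COLS} grid")
    return row * N_COLS + col


def state_to_cell(state: int) -> tuple[int, int]:
    """Inverse of cell_to_state under the same row-major assumption."""
    if not 0 <= state < N_ROWS * N_COLS:
        raise ValueError(f"state {state} is outside 0..{N_ROWS * N_COLS - 1}")
    return divmod(state, N_COLS)


# The goal cell (0, 8) from the feature list maps to index 8 under this assumption.
goal_state = cell_to_state(0, 8)
print(goal_state, state_to_cell(goal_state))
```

If the environment numbers its states differently (for example column-major), only these two helpers would need to change.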

Notes

  • Transitions are deterministic: every transition has probability 1.0.

  • The environment is designed to illustrate the benefits of planning algorithms such as Dyna-Q.
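To make the planning benefit concrete, here is a minimal tabular Dyna-Q sketch on a self-contained stand-in maze. It does not use rlforge: toy_step is a hypothetical transition function, and the start cell and obstacle layout are assumptions borrowed from the classic Dyna maze in Sutton and Barto's textbook, so they may differ from this environment's actual layout. Only the grid size, goal cell, rewards, action order, and determinism come from the description above.

```python
import random

N_ROWS, N_COLS = 6, 9
GOAL = (0, 8)    # from the feature list above
START = (2, 0)   # ASSUMED start cell (textbook layout)
WALLS = {(1, 2), (2, 2), (3, 2), (4, 5), (0, 7), (1, 7), (2, 7)}  # ASSUMED obstacles
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # 0=UP, 1=RIGHT, 2=DOWN, 3=LEFT


def toy_step(cell, action):
    """Deterministic stand-in transition; blocked or off-grid moves keep the agent in place."""
    row, col = cell[0] + MOVES[action][0], cell[1] + MOVES[action][1]
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS) or (row, col) in WALLS:
        row, col = cell
    done = (row, col) == GOAL
    return (row, col), (1.0 if done else 0.0), done


def dyna_q(episodes=30, planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Run tabular Dyna-Q and return the number of real steps taken in each episode."""
    rng = random.Random(seed)
    q = {}      # (cell, action) -> value estimate, missing entries count as 0.0
    model = {}  # (cell, action) -> (next_cell, reward, done), filled from real experience
    steps_per_episode = []

    def greedy(cell):
        values = [q.get((cell, a), 0.0) for a in range(4)]
        best = max(values)
        return rng.choice([a for a, v in enumerate(values) if v == best])  # random tie-break

    def update(cell, action, nxt, reward, done):
        bootstrap = 0.0 if done else gamma * max(q.get((nxt, a), 0.0) for a in range(4))
        old = q.get((cell, action), 0.0)
        q[(cell, action)] = old + alpha * (reward + bootstrap - old)

    for _ in range(episodes):
        cell, done, steps = START, False, 0
        while not done:
            action = rng.randrange(4) if rng.random() < epsilon else greedy(cell)
            nxt, reward, done = toy_step(cell, action)
            update(cell, action, nxt, reward, done)       # direct RL from the real step
            model[(cell, action)] = (nxt, reward, done)   # deterministic model: just remember it
            for _ in range(planning_steps):               # planning: replay remembered transitions
                (s, a), (s2, r, d) = rng.choice(list(model.items()))
                update(s, a, s2, r, d)
            cell, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode


if __name__ == "__main__":
    print("with planning:   ", dyna_q(planning_steps=10)[-5:])
    print("without planning:", dyna_q(planning_steps=0)[-5:])
```

Because transitions are deterministic, the model can store a single outcome per state-action pair. Setting planning_steps=0 reduces the sketch to plain one-step Q-learning, which gives a baseline for comparing episode lengths.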

reset(*, seed: int | None = None, options: dict | None = None)

Reset the environment to its initial state.

Parameters

seed : int, optional

Random seed for reproducibility.

options : dict, optional

Additional options (unused).

Returns

observation : int

The starting state index.

info : dict

Additional information, including the probability of the starting state.
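A small helper can illustrate how reset is typically called across episodes. The sketch below works with any object exposing the reset signature documented here; collect_start_states is a hypothetical name, and seeding only the first reset follows the usual Gymnasium convention rather than anything specific to this class.

```python
def collect_start_states(env, episodes, seed=None):
    """Reset `env` once per episode and return the observed start states.

    The seed is passed on the first reset only; later resets use seed=None,
    following the common Gymnasium convention of seeding an environment once.
    """
    starts = []
    for episode in range(episodes):
        observation, info = env.reset(seed=seed if episode == 0 else None)
        starts.append(int(observation))
    return starts
```

Since the start state is fixed, calling collect_start_states(DynaMaze(), 3, seed=42) should return the same index three times, assuming the environment is importable from the path in the class signature above.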

step(a)

Execute one step in the environment.

Parameters

a : int

Action index (0=UP, 1=RIGHT, 2=DOWN, 3=LEFT).

Returns

observation : int

The new state index.

reward : float

Reward obtained from the transition.

terminated : bool

Whether the episode has ended.

truncated : bool

Always False (no time limit).

info : dict

Additional information, including the transition probability.
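Because truncated is always False, a loop that waits only for the environment to end the episode can run forever under a poor policy. The sketch below adds a caller-side step cap. It relies only on the reset and step return values documented here; run_episode, policy, and max_steps are hypothetical names introduced for illustration.

```python
def run_episode(env, policy, max_steps=1000, seed=None):
    """Run one episode with a caller-side step cap.

    `policy` is any callable mapping a state index to an action index (0..3).
    Returns (total_reward, steps_taken, terminated).
    """
    observation, _ = env.reset(seed=seed)
    total_reward, steps, terminated = 0.0, 0, False
    while steps < max_steps and not terminated:
        action = policy(observation)
        observation, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        steps += 1
        if truncated:  # never True per the docs above, checked for API completeness
            break
    return total_reward, steps, terminated
```

A returned terminated value of False means the cap was hit before the goal was reached, so the caller can tell the two outcomes apart.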