Dyna Maze

class rlforge.environments.dyna_maze.DynaMaze(render_mode=None)

Grid-world environment for testing planning-based reinforcement learning agents.

The Dyna Maze is a 6x9 grid (6 rows, 9 columns) with obstacles and a fixed start state. The agent must navigate from the start position to the terminal goal at the top-right corner of the grid. Obstacles block certain paths, so the agent has to explore and plan to reach the goal efficiently.

Features

Discrete state space: each cell in the grid corresponds to a unique state.
Discrete action space: four possible moves (UP, RIGHT, DOWN, LEFT).
Obstacles: specific grid cells are blocked and cannot be entered.
Terminal state: reaching cell (0, 8) ends the episode with reward 1.
Rewards: every transition that does not reach the terminal state yields reward 0.
Compatible with the Gymnasium API.
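
The features listed above can be sketched as a small deterministic transition function. This is a minimal illustration, not the library's implementation. It assumes row-major state indexing (index = row * 9 + column), assumes that a move into a wall or an obstacle leaves the agent where it is, and uses a hypothetical obstacle set; none of these three details is stated on this page.

```python
N_ROWS, N_COLS = 6, 9
GOAL = (0, 8)  # (row, column) of the terminal cell, top-right corner

# (d_row, d_col) for each action index: 0=UP, 1=RIGHT, 2=DOWN, 3=LEFT
DELTAS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

# Hypothetical obstacle cells, for illustration only.
OBSTACLES = frozenset({(1, 2), (2, 2), (3, 2), (4, 5), (0, 7), (1, 7), (2, 7)})


def to_index(row, col):
    """Map a (row, column) cell to a state index (assumed row-major)."""
    return row * N_COLS + col


def to_cell(index):
    """Inverse of to_index: state index -> (row, column)."""
    return divmod(index, N_COLS)


def transition(state, action, obstacles=OBSTACLES):
    """Return (next_state, reward, terminated) for one deterministic move."""
    row, col = to_cell(state)
    d_row, d_col = DELTAS[action]
    new_row, new_col = row + d_row, col + d_col

    off_grid = not (0 <= new_row < N_ROWS and 0 <= new_col < N_COLS)
    if off_grid or (new_row, new_col) in obstacles:
        # Assumption: a blocked move keeps the agent in its current cell.
        new_row, new_col = row, col

    terminated = (new_row, new_col) == GOAL
    reward = 1.0 if terminated else 0.0
    return to_index(new_row, new_col), reward, terminated
```

Under these assumptions the grid has 54 state indices, and moving UP from cell (1, 8) reaches the goal with reward 1.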

Notes

Transition probabilities are deterministic (always 1.0).
The environment is designed to illustrate the benefits of planning algorithms such as Dyna-Q.
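
To make the planning idea concrete, the following is a minimal sketch of tabular Dyna-Q with a deterministic learned model, which fits the deterministic transitions noted above. The function name `dyna_q` and all hyperparameter defaults are illustrative assumptions, not part of rlforge. The only thing it relies on from the environment is the `reset` and `step` interface documented on this page.

```python
import random
from collections import defaultdict


def dyna_q(env, n_actions=4, episodes=30, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=2000, seed=0):
    """Tabular Dyna-Q sketch: direct Q-learning plus model-based replay."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * n_actions)  # q[state][action]
    model = {}  # (state, action) -> (reward, next_state, terminated)

    def greedy(state):
        # Break ties at random so early exploration is not biased.
        best = max(q[state])
        return rng.choice([a for a in range(n_actions) if q[state][a] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup; no bootstrapping from terminal states.
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        state, _info = env.reset()
        for _ in range(max_steps):  # manual cap, since truncated is always False
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = greedy(state)
            next_state, reward, terminated, truncated, _info = env.step(action)

            # Learn from the real transition and record it in the model.
            update(state, action, reward, next_state, terminated)
            model[(state, action)] = (reward, next_state, terminated)

            # Planning: replay transitions sampled from the learned model.
            seen = list(model)
            for _ in range(planning_steps):
                s, a = rng.choice(seen)
                r, s2, done = model[(s, a)]
                update(s, a, r, s2, done)

            state = next_state
            if terminated or truncated:
                break
    return q
```

Setting `planning_steps=0` reduces the sketch to plain Q-learning, which makes it easy to compare how many episodes each variant needs before the learned values point toward the goal.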

reset(*, seed: int | None = None, options: dict | None = None)

Reset the environment to its initial state.

Parameters

seed (int, optional): Random seed for reproducibility.
options (dict, optional): Additional options (unused).

Returns

observation (int): The starting state index.
info (dict): Additional information, including the probability of the starting state.

step(a)

Execute one step in the environment.

Parameters

a (int): Action index (0=UP, 1=RIGHT, 2=DOWN, 3=LEFT).

Returns

observation (int): The new state index.
reward (float): Reward obtained from the transition.
terminated (bool): Whether the episode has ended.
truncated (bool): Always False (no time limit).
info (dict): Additional information, including the transition probability.
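
Because `truncated` is always False, an episode only ends when the goal is reached, so a caller that wants bounded episodes has to cap the step count itself. The sketch below shows one way to do that against the `reset` and `step` signatures documented above. `run_episode` and `StubEnv` are hypothetical names introduced for illustration; `StubEnv` is a tiny stand-in with the same interface, not the real maze.

```python
def run_episode(env, policy, max_steps=1000, seed=None):
    """Run one episode; stop at episode end or after max_steps.

    Returns (total_reward, steps_taken, ended), where ended is False
    when the step cap was hit first.
    """
    state, _info = env.reset(seed=seed)
    total_reward = 0.0
    for step_count in range(1, max_steps + 1):
        state, reward, terminated, truncated, _info = env.step(policy(state))
        total_reward += reward
        if terminated or truncated:
            return total_reward, step_count, True
    return total_reward, max_steps, False


class StubEnv:
    """Hypothetical stand-in: terminates with reward 1 after three RIGHT moves."""

    def reset(self, *, seed=None, options=None):
        self.pos = 0
        return self.pos, {}

    def step(self, a):
        if a == 1:  # 1=RIGHT, matching the action encoding above
            self.pos += 1
        terminated = self.pos == 3
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, False, {}


# A policy that always moves RIGHT finishes the stub episode in three steps.
result = run_episode(StubEnv(), policy=lambda state: 1)
```

Passing a `DynaMaze` instance in place of `StubEnv` should work the same way, assuming the class behaves as documented here.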