Dyna Maze
- class rlforge.environments.dyna_maze.DynaMaze(render_mode=None)
Grid-world environment for testing planning-based reinforcement learning agents.
The Dyna Maze is a 6x9 grid (6 rows, 9 columns) with obstacles and a fixed start state. The agent must navigate from the start position to the terminal goal in the top-right corner of the grid, cell (0, 8). Obstacles block some paths, so the agent has to explore and plan to find a route to the goal.
Features
Discrete state space: each cell in the grid corresponds to a unique state.
Discrete action space: four possible moves (UP, RIGHT, DOWN, LEFT).
Obstacles: specific grid cells are blocked and cannot be entered.
Terminal state: reaching cell (0, 8) ends the episode with reward 1.
All other transitions yield reward 0.
Compatible with the Gymnasium API.
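The features above can be sketched as a small deterministic transition function. This is an illustration only, not the rlforge implementation. It assumes row-major state indexing, that a blocked or off-grid move leaves the agent in place, and the obstacle layout of the classic Dyna Maze; none of these details are stated in this reference, and all names below are hypothetical.

```python
# Illustrative sketch only; not the rlforge implementation.
N_ROWS, N_COLS = 6, 9
GOAL = (0, 8)  # top-right corner, as documented above

# ASSUMPTION: obstacle cells follow the classic Dyna Maze layout.
OBSTACLES = {(1, 2), (2, 2), (3, 2), (4, 5), (0, 7), (1, 7), (2, 7)}

# Action index -> (row delta, column delta); row 0 is the top row.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def to_index(cell):
    """Map (row, col) to a state index. ASSUMPTION: row-major order."""
    row, col = cell
    return row * N_COLS + col


def to_cell(index):
    """Inverse of to_index: state index -> (row, col)."""
    return divmod(index, N_COLS)


def transition(state, action):
    """Return (next_state, reward, terminated) for one deterministic move."""
    row, col = to_cell(state)
    d_row, d_col = MOVES[action]
    nxt = (row + d_row, col + d_col)
    inside = 0 <= nxt[0] < N_ROWS and 0 <= nxt[1] < N_COLS
    if not inside or nxt in OBSTACLES:
        nxt = (row, col)  # ASSUMPTION: blocked moves leave the agent in place
    terminated = nxt == GOAL
    return to_index(nxt), (1.0 if terminated else 0.0), terminated
```

Under these assumptions, moving UP from cell (1, 8) reaches the goal and yields reward 1, while moving RIGHT from cell (2, 1) hits an obstacle and leaves the state unchanged with reward 0.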
Notes
Transitions are deterministic: every transition occurs with probability 1.0.
The environment is designed to illustrate the benefits of planning algorithms such as Dyna-Q.
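To make the planning idea concrete, here is a minimal tabular Dyna-Q sketch that works with any environment exposing the Gymnasium-style reset() and step() methods documented on this page. It is a generic textbook-style version, not code shipped with rlforge. The function name, the hyperparameter defaults, and the max_steps safety cap (added because the environment never truncates) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def dyna_q(env, n_actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=10_000, seed=0):
    """Tabular Dyna-Q for a deterministic Gymnasium-style environment."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * n_actions)  # state -> action values
    model = {}  # (state, action) -> (reward, next_state, terminated)

    def greedy(state):
        best = max(q[state])
        # Break ties randomly so early behaviour is not biased to action 0.
        return rng.choice([a for a in range(n_actions) if q[state][a] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup; no bootstrapping from terminal states.
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        state, _info = env.reset()
        for _ in range(max_steps):  # ASSUMPTION: cap added for safety
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = greedy(state)
            next_state, reward, terminated, truncated, _info = env.step(action)
            # Direct RL: learn from the real transition.
            update(state, action, reward, next_state, terminated)
            # Model learning: remember what this (state, action) led to.
            model[(state, action)] = (reward, next_state, terminated)
            # Planning: replay transitions sampled from the learned model.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            state = next_state
            if terminated or truncated:
                break
    return q
```

Because the maze is deterministic, the model can store a single outcome per state-action pair. Setting planning_steps=0 reduces the sketch to plain Q-learning, which makes it easy to compare learning speed with and without planning.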
- reset(*, seed: int | None = None, options: dict | None = None)
Reset the environment to its initial state.
Parameters
- seed : int, optional
Random seed for reproducibility.
- options : dict, optional
Additional options (unused).
Returns
- observation : int
The starting state index.
- info : dict
Additional information, including the probability of the starting state.
- step(a)
Execute one step in the environment.
Parameters
- a : int
Action index (0=UP, 1=RIGHT, 2=DOWN, 3=LEFT).
Returns
- observation : int
The new state index.
- reward : float
Reward obtained from the transition.
- terminated : bool
Whether the episode has ended.
- truncated : bool
Always False (no time limit).
- info : dict
Additional information, including the transition probability.
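The reset() and step() signatures above fit the usual Gymnasium interaction loop. The helper below is a minimal sketch of that loop; run_episode and its max_steps cap are hypothetical names introduced here, and the cap is included because truncated is always False for this environment.

```python
def run_episode(env, policy, max_steps=1_000, seed=None):
    """Roll out one episode and return (total_reward, steps_taken).

    `policy` is any callable mapping a state index to an action index.
    """
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    for step_count in range(1, max_steps + 1):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            return total_reward, step_count
    # The cap was hit before the goal was reached.
    return total_reward, max_steps
```

Assuming rlforge is installed, a uniformly random policy could be tried with something like run_episode(DynaMaze(), lambda s: random.randrange(4)), after importing random and DynaMaze from rlforge.environments.dyna_maze.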