Short Corridor

class rlforge.environments.short_corridor.ShortCorridor(observation_type: str = 'tabular', render_mode: str | None = None)

The Short Corridor Gridworld (Sutton & Barto Example 13.1).

The environment has 4 states (0, 1, 2, 3), where 3 is the terminal state.

  • Start state is S=0.

  • Reward is -1 per step until the terminal state is reached. (Note: The example's description is quoted here as “reward is 1 per step, as usual”, but the goal is to minimize the number of steps, which is conventionally encoded as R=-1 per step. We use R=-1, the standard shortest-path/episodic formulation.)

  • Actions are LEFT (0) and RIGHT (1).

  • Transitions:

    • State 0: LEFT -> 0 (stay), RIGHT -> 1

    • State 1: LEFT -> 2, RIGHT -> 0 (Reversed!)

    • State 2: LEFT -> 1, RIGHT -> 3 (Terminal)
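The dynamics listed above can be sketched as a small lookup table. This is a standalone illustration, not the rlforge implementation: the names TRANSITIONS, transition and rollout are hypothetical, and state 1 is assumed to follow the reversed convention of Sutton & Barto Example 13.1 (LEFT moves right, RIGHT moves left).

```python
# Hypothetical standalone sketch of the corridor dynamics; not rlforge code.
LEFT, RIGHT = 0, 1
TERMINAL = 3

# TRANSITIONS[state][action] -> next state.
# Assumption: in state 1 the actions are reversed, as in the book's example.
TRANSITIONS = {
    0: {LEFT: 0, RIGHT: 1},
    1: {LEFT: 2, RIGHT: 0},
    2: {LEFT: 1, RIGHT: 3},
}


def transition(state, action):
    """Return (next_state, reward, terminated) for one move."""
    next_state = TRANSITIONS[state][action]
    return next_state, -1.0, next_state == TERMINAL


def rollout(actions, start=0):
    """Apply actions in order, stopping early if the terminal state is hit."""
    state, total = start, 0.0
    for action in actions:
        state, reward, done = transition(state, action)
        total += reward
        if done:
            break
    return state, total
```

Under these assumptions the shortest path from the start state is RIGHT, LEFT, RIGHT, which reaches the terminal state with a return of -3.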

The observation returned is controlled by observation_type.

property actions

Helper property for action indices.

close()

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.

render()

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment’s metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are available through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Note:

As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

  • None (default): no render is computed.

  • “human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.

  • “rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.

  • “ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).

  • “rgb_array_list” and “ansi_list”: List-based versions of the render modes (except “human”) are available through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.

Note:

Make sure that your class’s metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function no longer accepts parameters; these should instead be specified when the environment is initialised, e.g., gymnasium.make("CartPole-v1", render_mode="human")

reset(*, seed: int | None = None, options: dict | None = None) -> Tuple[int | ndarray, Dict[str, Any]]

Reset the environment to its initial state (s=0).

step(a: int) -> Tuple[int | ndarray, float, bool, bool, Dict[str, Any]]

Performs a single step in the environment.

Note: The observation returned by step is based on the action taken to get to the current state, as described in the problem’s feature definition: x(s, right) = [1, 0], x(s, left) = [0, 1], for all s.
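The feature definition in the note above can be illustrated with a minimal sketch. The helper name action_features is hypothetical and is not part of the documented API; it only shows that the feature vector depends on the action and not on the state.

```python
# Hypothetical helper illustrating x(s, a); not part of the rlforge API.
LEFT, RIGHT = 0, 1


def action_features(action):
    """Return x(s, a), which is the same for every state s."""
    if action == RIGHT:
        return [1, 0]
    if action == LEFT:
        return [0, 1]
    raise ValueError(f"unknown action: {action!r}")
```

Because the features ignore the state, an agent using them cannot tell the three non-terminal states apart, which is what makes a stochastic policy preferable in this example.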