Short Corridor
- class rlforge.environments.short_corridor.ShortCorridor(observation_type: str = 'tabular', render_mode: str | None = None)
The Short Corridor Gridworld (Sutton & Barto Example 13.1).
The environment has 4 states (0, 1, 2, 3), where 3 is the terminal state.
Start state is S=0.
Reward is -1 per step until the terminal state is reached, so maximizing the return means reaching the terminal state in as few steps as possible. This matches the book's description (“the reward is −1 per step, as usual”) and the standard shortest-path episodic formulation.
Actions are LEFT (0) and RIGHT (1).
Transitions:
State 0: LEFT -> 0 (stay), RIGHT -> 1
State 1: LEFT -> 2, RIGHT -> 0 (Reversed!)
State 2: LEFT -> 1, RIGHT -> 3 (Terminal)
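The dynamics above can be sketched as a small lookup table. This is a minimal standalone illustration, not the rlforge implementation: the names NEXT_STATE, transition, and rollout are hypothetical, and the table assumes the action reversal in state 1 from Sutton & Barto's Example 13.1 (LEFT moves right and RIGHT moves left there).

```python
LEFT, RIGHT = 0, 1
TERMINAL = 3

# NEXT_STATE[state][action] -> next state.
# Assumption: in state 1 the effects of LEFT and RIGHT are swapped,
# as in Sutton & Barto's Example 13.1.
NEXT_STATE = {
    0: {LEFT: 0, RIGHT: 1},
    1: {LEFT: 2, RIGHT: 0},
    2: {LEFT: 1, RIGHT: 3},
}


def transition(state, action):
    """Return (next_state, reward, terminated) for a single step."""
    next_state = NEXT_STATE[state][action]
    return next_state, -1.0, next_state == TERMINAL


def rollout(actions, start=0):
    """Apply a fixed action sequence; return (final_state, total_reward)."""
    state, total = start, 0.0
    for action in actions:
        state, reward, terminated = transition(state, action)
        total += reward
        if terminated:
            break
    return state, total
```

Under these assumed dynamics, rollout([RIGHT, LEFT, RIGHT]) reaches the terminal state in three steps, whereas always choosing RIGHT bounces between states 0 and 1 and never terminates.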
The observation returned is controlled by observation_type.
- property actions
Helper property for action indices.
- close()
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.
- render()
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment’s metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are available through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
- Note:
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
None (default): no render is computed.
“human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step(), and render() doesn’t need to be called. Returns None.
“rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
“ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
“rgb_array_list” and “ansi_list”: List-based versions of the render modes are possible (except “human”) through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.
- Note:
Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; instead, these parameters should be specified when the environment is initialised, e.g.
gymnasium.make("CartPole-v1", render_mode="human")
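As a rough illustration of what an “ansi”-style frame could look like for a four-state corridor, the sketch below builds a one-line text rendering. The function name render_ansi and the cell symbols are hypothetical; the documentation above does not state which render modes ShortCorridor supports or how it draws them.

```python
def render_ansi(state, n_states=4, terminal=3):
    """Return a one-line text frame: 'A' marks the agent, 'G' the terminal state."""
    cells = []
    for s in range(n_states):
        if s == state:
            cells.append("[A]")   # agent position (drawn over the goal if it is there)
        elif s == terminal:
            cells.append("[G]")   # terminal state
        else:
            cells.append("[ ]")   # empty corridor cell
    return "".join(cells)


print(render_ansi(0))  # [A][ ][ ][G]
```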
- reset(*, seed: int | None = None, options: dict | None = None) -> Tuple[int | ndarray, Dict[str, Any]]
Reset the environment to its initial state (s=0).
- step(a: int) -> Tuple[int | ndarray, float, bool, bool, Dict[str, Any]]
Performs a single step in the environment.
Note: The observation returned by step is based on the action taken to get to the current state, as described in the problem’s feature definition: x(s, right) = [1, 0], x(s, left) = [0, 1], for all s.
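The feature definition in the note can be written out directly. The sketch below is a standalone illustration, not rlforge code: features and softmax_policy are hypothetical names, and the softmax-in-action-preferences policy is an assumption taken from the usual treatment of this example. Because x(s, a) ignores s, any policy built on these features selects actions with the same probabilities in every state.

```python
import math

LEFT, RIGHT = 0, 1


def features(state, action):
    """x(s, a): depends only on the action, never on the state."""
    return [1.0, 0.0] if action == RIGHT else [0.0, 1.0]


def softmax_policy(theta, state):
    """Return [P(LEFT), P(RIGHT)] for linear preferences theta . x(s, a)."""
    prefs = [
        sum(t * x for t, x in zip(theta, features(state, action)))
        for action in (LEFT, RIGHT)
    ]
    top = max(prefs)                      # subtract the max for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


# The same theta yields identical action probabilities in every state.
theta = [1.0, 0.0]
assert softmax_policy(theta, 0) == softmax_policy(theta, 2)
```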