pendulum

class rlforge.environments.pendulum.Pendulum(continuous=False, g=9.81, m=0.3333333333333333, l=1.5, dt=0.05)

Simplified pendulum environment for reinforcement learning experiments.

The Pendulum environment simulates the dynamics of a pendulum with controllable torque. The agent’s objective is to keep the pendulum upright (theta close to 0). The environment supports both discrete and continuous action spaces:

  • Discrete mode: three possible torques (-1, 0, +1).

  • Continuous mode: torque values in the range [-2, 2].

Features

  • State space: two-dimensional vector [theta, theta_dot] where theta is the pendulum angle (wrapped to [-π, π]) and theta_dot is the angular velocity.

  • Action space: discrete or continuous torques depending on initialization.

  • Reward: negative absolute angle (-abs(theta)), encouraging the agent to keep the pendulum upright.

  • Deterministic dynamics with simple Euler integration.
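The angle wrapping, reward, and Euler update listed above can be sketched in a few lines. This is a minimal illustration, not the rlforge implementation. The equation of motion (a point-mass pendulum with theta = 0 upright), the order of the two Euler updates, and the helper names wrap_angle, reward and euler_step are all assumptions, since the source does not state them. Note that this wrap_angle returns values in [-π, π), so +π maps to -π.

```python
import math


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians into the half-open interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def reward(theta: float) -> float:
    """Negative absolute angle: 0 when upright, -pi when hanging down."""
    return -abs(theta)


def euler_step(theta, theta_dot, torque, g=9.81, m=1.0 / 3.0, l=1.5, dt=0.05):
    """One explicit Euler step; defaults mirror the constructor signature.

    ASSUMPTION: point-mass pendulum with theta = 0 upright, so gravity
    pushes the angle away from 0. The library's actual equation of
    motion may differ.
    """
    theta_ddot = (g / l) * math.sin(theta) + torque / (m * l * l)
    # Explicit Euler: both updates use the values from the start of the step.
    new_theta_dot = theta_dot + dt * theta_ddot
    new_theta = wrap_angle(theta + dt * theta_dot)
    return new_theta, new_theta_dot
```

Starting from the downward rest position with zero torque, this sketch leaves the state essentially unchanged, which matches the expectation that the pendulum hangs still until torque is applied.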

Notes

  • reset() places the pendulum in the downward position (theta = -π, theta_dot = 0).

  • Episodes do not terminate naturally (terminated is always False).

  • This environment is lighter and simpler than the standard Gymnasium Pendulum-v1.

reset()

Reset the pendulum to its initial state.

Returns

observation (tuple)

A 5-element tuple (state, reward, terminated, truncated, info):

  • state (numpy.ndarray): initial state [-π, 0].

  • reward (float): always 0 at reset.

  • terminated (bool): always False.

  • truncated (bool): always False.

  • info (dict or None): unused, set to None.

step(action)

Advance the pendulum dynamics by one time step.

Parameters

action (int or float)

  • If discrete mode: index of the action (0, 1, 2) corresponding to torques -1, 0, +1.

  • If continuous mode: torque value in range [-2, 2].
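The mapping from action to applied torque described above might look like the following sketch. The function name action_to_torque and the constants are hypothetical. Clipping out-of-range continuous values is also an assumption: the source gives the valid range but does not say how values outside it are handled.

```python
DISCRETE_TORQUES = (-1.0, 0.0, 1.0)  # torques for action indices 0, 1, 2
MAX_TORQUE = 2.0                     # continuous torques lie in [-2, 2]


def action_to_torque(action, continuous=False):
    """Translate an agent action into a torque value."""
    if continuous:
        # ASSUMPTION: out-of-range torques are clipped rather than rejected.
        return max(-MAX_TORQUE, min(MAX_TORQUE, float(action)))
    if action not in (0, 1, 2):
        raise ValueError(f"discrete action must be 0, 1 or 2, got {action!r}")
    return DISCRETE_TORQUES[int(action)]
```

For example, action_to_torque(0) gives -1.0, while action_to_torque(3.5, continuous=True) gives 2.0 under the clipping assumption.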

Returns

observation (tuple)

A 5-element tuple (state, reward, terminated, truncated, info):

  • state (numpy.ndarray): [theta, theta_dot] after the step.

  • reward (float): negative absolute angle.

  • terminated (bool): always False (no terminal state).

  • truncated (bool): always False (no time limit).

  • info (dict or None): unused, set to None.

Notes

  • The angle theta is wrapped to [-π, π].

  • If angular velocity exceeds the allowed range, the pendulum resets to the downward position.
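Because terminated and truncated are always False, a training or evaluation loop has to impose its own step limit. The sketch below shows one way to do that against the 5-tuple interface documented above. The rollout helper and the StubPendulum class are hypothetical: the stub only imitates the documented return shapes (using a plain list rather than a numpy.ndarray) so the example runs without rlforge installed, and it does not simulate any dynamics.

```python
import math


def rollout(env, policy, max_steps=200):
    """Run one capped episode and return the summed reward."""
    # reset() returns the same 5-tuple shape as step().
    state, _, _, _, _ = env.reset()
    total = 0.0
    for _ in range(max_steps):
        state, reward, terminated, truncated, _ = env.step(policy(state))
        total += reward
        if terminated or truncated:  # never true here, kept for generality
            break
    return total


class StubPendulum:
    """Hypothetical stand-in that stays at the downward position."""

    def reset(self):
        self.state = [-math.pi, 0.0]
        return self.state, 0.0, False, False, None

    def step(self, action):
        # No dynamics: the state is returned unchanged on every step.
        return self.state, -abs(self.state[0]), False, False, None


total = rollout(StubPendulum(), policy=lambda state: 1, max_steps=10)
```

With the real environment, the same rollout function should work unchanged as long as reset() and step() return the tuples described above.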