pendulum
- class rlforge.environments.pendulum.Pendulum(continuous=False, g=9.81, m=0.3333333333333333, l=1.5, dt=0.05)
Simplified pendulum environment for reinforcement learning experiments.
The Pendulum environment simulates the dynamics of a pendulum with controllable torque. The agent’s objective is to keep the pendulum upright (theta close to 0). The environment supports both discrete and continuous action spaces:
Discrete mode: three possible torques (-1, 0, +1).
Continuous mode: torque values in the range [-2, 2].
Features
State space: two-dimensional vector [theta, theta_dot], where theta is the pendulum angle (wrapped to [-π, π]) and theta_dot is the angular velocity.
Action space: discrete or continuous torques depending on initialization.
Reward: negative absolute angle (-abs(theta)), encouraging the agent to keep the pendulum upright.
Deterministic dynamics with simple Euler integration.
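The features above can be illustrated with a minimal sketch of one Euler update. The source does not state the equation of motion, so the point-mass dynamics below (with theta = 0 upright) are an assumption, and the names wrap_angle and euler_step are hypothetical helpers rather than part of rlforge. Only the default parameters, the angle wrapping, and the reward come from the documentation.

```python
import math


def wrap_angle(theta):
    """Wrap an angle in radians to [-pi, pi); +pi maps to -pi."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def euler_step(theta, theta_dot, torque, g=9.81, m=1 / 3, l=1.5, dt=0.05):
    """Advance (theta, theta_dot) by one explicit Euler step.

    ASSUMPTION: point-mass pendulum with theta = 0 upright, so gravity
    pushes the pendulum away from the upright position. The library's
    actual equation of motion may differ.
    """
    theta_ddot = (g / l) * math.sin(theta) + torque / (m * l ** 2)

    # Explicit Euler: both updates use the values from the start of the step.
    new_theta = wrap_angle(theta + dt * theta_dot)
    new_theta_dot = theta_dot + dt * theta_ddot

    # Documented reward: negative absolute angle of the new state.
    reward = -abs(new_theta)
    return new_theta, new_theta_dot, reward
```

Because the update has no random terms, calling euler_step twice with the same inputs gives the same result, which matches the "deterministic dynamics" claim.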
Notes
The pendulum resets to the downward position (theta = -π, theta_dot = 0).
Episodes do not terminate naturally (terminated is always False).
This environment is lighter and simpler than the standard Gymnasium Pendulum-v1.
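Since neither terminated nor truncated is ever set, the caller has to bound the episode length. The sketch below shows one way to do that. It relies only on the documented reset() and step(action) 5-tuple interface; the function name rollout, the policy callable, and the max_steps default are hypothetical choices made for illustration.

```python
def rollout(env, policy, max_steps=200):
    """Run one fixed-length episode and return the summed reward.

    env    : any object whose reset() and step(action) return
             (state, reward, terminated, truncated, info)
    policy : callable mapping a state to an action (hypothetical helper)
    """
    state, reward, terminated, truncated, info = env.reset()
    total_reward = 0.0

    # The step cap is the only stopping condition that can fire for an
    # environment that never reports termination or truncation.
    for _ in range(max_steps):
        action = policy(state)
        state, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:  # kept for compatibility with other envs
            break

    return total_reward
```

With the class documented here, a call might look like rollout(Pendulum(), lambda state: 1, max_steps=200), where action index 1 applies zero torque in discrete mode.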
- reset()
Reset the pendulum to its initial state.
Returns
- observation : tuple
  A 5-element tuple (state, reward, terminated, truncated, info):
  - state (numpy.ndarray): initial state [-π, 0].
  - reward (float): always 0 at reset.
  - terminated (bool): always False.
  - truncated (bool): always False.
  - info (dict or None): unused, set to None.
- step(action)
Advance the pendulum dynamics by one time step.
Parameters
- action : int or float
If discrete mode: index of the action (0, 1, 2) corresponding to torques -1, 0, +1.
If continuous mode: torque value in range [-2, 2].
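The two action conventions can be sketched as a single conversion function. The index-to-torque table and the [-2, 2] range come from the parameter description above; the function name action_to_torque is hypothetical, and clipping out-of-range continuous values is an assumption, since the source does not say how such values are handled.

```python
DISCRETE_TORQUES = (-1.0, 0.0, 1.0)  # documented torques for indices 0, 1, 2
MAX_TORQUE = 2.0                     # documented bound in continuous mode


def action_to_torque(action, continuous=False):
    """Convert an agent action into a torque value."""
    if continuous:
        # ASSUMPTION: out-of-range torques are clipped to [-2, 2].
        return max(-MAX_TORQUE, min(MAX_TORQUE, float(action)))

    # Reject anything that is not one of the three documented indices,
    # including negative indices that Python would otherwise accept.
    if action not in (0, 1, 2):
        raise ValueError(f"discrete action must be 0, 1, or 2, got {action!r}")
    return DISCRETE_TORQUES[int(action)]
```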
Returns
- observation : tuple
  A 5-element tuple (state, reward, terminated, truncated, info):
  - state (numpy.ndarray): [theta, theta_dot] after the step.
  - reward (float): negative absolute angle.
  - terminated (bool): always False (no terminal state).
  - truncated (bool): always False (no time limit).
  - info (dict or None): unused, set to None.
Notes
The angle theta is wrapped to [-π, π].
If angular velocity exceeds the allowed range, the pendulum resets to the downward position.
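The two post-step rules in these notes might be combined as in the sketch below. The source does not give the allowed velocity range, so the max_speed default of 2π is a placeholder assumption, and apply_limits is a hypothetical helper name. The downward reset state (-π, 0) is taken from the class notes.

```python
import math


def apply_limits(theta, theta_dot, max_speed=2.0 * math.pi):
    """Apply the velocity check and angle wrapping to a raw state.

    ASSUMPTION: max_speed is a placeholder; the documentation does not
    state the actual bound on angular velocity.
    """
    if abs(theta_dot) > max_speed:
        # Velocity left the allowed range: return to the downward position.
        return -math.pi, 0.0

    # Wrap the angle into [-pi, pi); +pi maps to -pi.
    wrapped = (theta + math.pi) % (2.0 * math.pi) - math.pi
    return wrapped, theta_dot
```

Note that the reset happens inside an ongoing episode and is not signalled through terminated or truncated, so an agent only observes it as a jump in the state.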