Obstacle Avoidance
- class rlforge.environments.obstacle_avoidance.ObstacleAvoidance(x_range=(-2, 2), y_range=(-2, 2), initial_state=(0, 0, 0), target=(1, 1, 0.1), obstacles=[(0.5, 0.5, 0.2)], dt=0.01)
Simplified mobile robot environment with obstacle avoidance.
The ObstacleAvoidance environment models a robot navigating in a bounded 2D workspace with static circular obstacles. The agent’s goal is to reach a target position while avoiding collisions and staying within the workspace limits.
Features
State space: five-dimensional vector
[x, y, theta, error, heading_error]. *x, y: robot position in the plane. *theta: robot orientation (wrapped to [-π, π]). *error: Euclidean distance to the target. *heading_error: difference between robot orientation and target heading.Action space: discrete set of velocity commands. * Forward motion. * Rotate counterclockwise. * Rotate clockwise.
Reward: * Negative distance and heading error (encourages approaching the target). * Large penalty if leaving the workspace. * Penalty if colliding with an obstacle. * Zero reward when reaching the target.
Terminal condition: reaching the target within a tolerance radius.
Compatible with Gymnasium API.
Notes
Obstacles are defined as circles with center coordinates and radius.
The environment uses simple kinematic equations with Euler integration.
- reset()
Reset the robot to its initial state.
Returns
- observationtuple
A 5-element tuple
(state, reward, terminated, truncated, info): - state (numpy.ndarray): [x, y, theta, error, heading_error]. - reward (float): negative distance and heading error at reset. - terminated (bool): always False. - truncated (bool): always False. - info (dict or None): unused, set to None.
- step(action)
Advance the robot dynamics by one time step.
Parameters
- actionint
Index of the chosen action: - 0: forward motion - 1: rotate counterclockwise - 2: rotate clockwise
Returns
- observationtuple
A 5-element tuple
(state, reward, terminated, truncated, info): - state (numpy.ndarray): [x, y, theta, error, heading_error]. - reward (float): reward based on distance, heading error, and penalties. - terminated (bool): True if the target is reached. - truncated (bool): always False (no time limit). - info (dict or None): unused, set to None.
Notes
Position is updated using simple kinematics with orientation.
Orientation is wrapped to [-π, π].
Leaving the workspace yields a large penalty and resets to the previous state.
Colliding with an obstacle yields a penalty and resets to the previous state.
Reaching the target within the tolerance radius ends the episode.