Experiment Runner

class rlforge.experiments.experiment_runner.ExperimentRunner(env, agent)

A unified class to run reinforcement learning experiments.

This runner supports both episodic and continuous settings across multiple runs and environments. It manages agent resets, environment interactions, trajectory storage, and provides built-in functionality for summarizing and plotting results.

Parameters

env (object): The environment instance (standard or vectorized) following the Gym API.
agent (object): The agent instance implementing the RL interface (start, step, end, reset).
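The parameter description names four agent methods (start, step, end, reset) but not their signatures. A minimal sketch of an object satisfying that interface follows; the class name RandomAgent, the argument lists, and the return values are assumptions made for illustration, not the documented rlforge API.

```python
import random


class RandomAgent:
    """Toy agent exposing the four methods named above.

    The signatures are assumptions for illustration; the real rlforge
    agent interface may differ.
    """

    def __init__(self, num_actions, seed=0):
        self.num_actions = num_actions
        self.seed = seed
        self.reset()

    def reset(self):
        # Restore the agent to its initial state before a new run.
        self.rng = random.Random(self.seed)
        self.total_reward = 0.0

    def start(self, observation):
        # Receives the first observation of an episode, returns an action.
        return self.rng.randrange(self.num_actions)

    def step(self, reward, observation):
        # Receives the latest transition, returns the next action.
        self.total_reward += reward
        return self.rng.randrange(self.num_actions)

    def end(self, reward):
        # Receives the final reward when the episode terminates.
        self.total_reward += reward


agent = RandomAgent(num_actions=2)
first_action = agent.start(observation=0)
next_action = agent.step(reward=1.0, observation=1)
agent.end(reward=0.5)
```

An object shaped like this could then be passed as the agent argument, provided its signatures match what the runner actually calls.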

load_results(filepath)

Loads experiment results from a pickle file and assigns them to self.results.

Parameters

filepath (str): Path to the pickle file.

Returns

dict: The loaded results dictionary.

plot_results(metric='reward', window_size=50, max_reward=None)

Plot experiment results with smoothing and error bands.

Generates a learning curve or episode length curve depending on the selected metric. Results are averaged across runs, smoothed using a moving average, and displayed with a shaded error band representing the standard deviation.

Parameters

metric (str, optional): Metric to plot. Default is “reward”. Options:
- “reward”: plots mean total reward across runs.
- “step”: plots mean episode length (episodic only).
window_size (int, optional): Window size for moving average smoothing (default=50).
max_reward (float, optional): Maximum reward value to draw as a reference line (default=None).

Notes

Uses NaN-safe mean and standard deviation calculations to handle padded episodic results.
Smooths only the mean curve; error bands use raw standard deviation.
Supports both episodic and continuous experiment types.
Plots include grid, legend, and tight layout for readability.

Returns

None: Displays a matplotlib plot of the selected metric.
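The averaging and smoothing steps described in the notes can be sketched with NumPy alone. This is an illustration under stated assumptions, not the library's code: the function name smooth_mean_with_band is hypothetical, and the choice of a "valid" convolution with the standard deviation trimmed to match is one plausible way to align the band with the smoothed mean.

```python
import numpy as np


def smooth_mean_with_band(values, window_size=50):
    """Return (smoothed_mean, lower, upper) for a (num_points, num_runs) array.

    NaN entries are treated as padding and ignored (assumed layout).
    """
    mean = np.nanmean(values, axis=1)  # NaN-safe mean across runs
    std = np.nanstd(values, axis=1)    # NaN-safe std across runs, left unsmoothed
    window = min(window_size, len(mean))
    kernel = np.ones(window) / window
    # "valid" keeps only the positions where the full window fits.
    smoothed = np.convolve(mean, kernel, mode="valid")
    # Trim the raw std so the band lines up with the smoothed mean.
    std = std[window - 1:]
    return smoothed, smoothed - std, smoothed + std


values = np.array([
    [1.0, 3.0],
    [2.0, np.nan],  # second run has no value here (padding)
    [3.0, 5.0],
    [4.0, 6.0],
])
smoothed, lower, upper = smooth_mean_with_band(values, window_size=2)
```

The three returned arrays are what a plotting call such as matplotlib's fill_between would consume to draw the curve and its shaded band.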

run_continuous(num_runs, num_steps)

Run the experiment in a continuous setting.

Executes multiple runs of continuous training for a fixed number of steps, storing rewards and full trajectories.

Parameters

num_runs (int): Number of independent runs to execute.
num_steps (int): Number of steps per run.

Returns

dict: Results dictionary containing:
- type : str, “continuous”
- rewards : np.ndarray, shape (num_steps, num_runs)
- trajectories : list of dicts per run
- runtime_per_run : list of floats
- mean_rewards : np.ndarray, mean reward per step across runs
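To make the array layout concrete, the sketch below fills a rewards array of shape (num_steps, num_runs) and derives mean_rewards from it. The random draws stand in for environment rewards and are purely illustrative; only the keys and shapes come from the description above.

```python
import numpy as np

num_steps, num_runs = 5, 3
rng = np.random.default_rng(0)

# One column per run, one row per step.
rewards = np.zeros((num_steps, num_runs))
for run in range(num_runs):
    for step in range(num_steps):
        rewards[step, run] = rng.normal()  # stand-in for an environment reward

results = {
    "type": "continuous",
    "rewards": rewards,
    # Averaging over axis 1 collapses the runs, leaving one value per step.
    "mean_rewards": rewards.mean(axis=1),
}
```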

run_episodic(num_runs, num_episodes, max_steps_per_episode=None)

Run the experiment in an episodic setting.

Executes multiple runs of episodic training, storing rewards, steps per episode, and full trajectories.

Parameters

num_runs (int): Number of independent runs to execute.
num_episodes (int): Number of episodes per run.
max_steps_per_episode (int, optional): Maximum steps allowed per episode. If None, episodes run until environment termination.

Returns

dict: Results dictionary containing:
- type : str, “episodic”
- rewards : np.ndarray, shape (num_episodes, num_runs)
- steps : np.ndarray, shape (num_episodes, num_runs)
- mean_rewards : np.ndarray, mean reward per episode across runs
- std_rewards : np.ndarray, std dev of reward per episode across runs
- mean_steps : np.ndarray, mean steps per episode across runs
- std_steps : np.ndarray, std dev of steps per episode across runs
- runtime_per_run : np.ndarray, duration of each run in seconds
- total_runtime : float, total duration of the experiment
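The control flow of an episodic experiment can be sketched as a pair of nested loops over runs and episodes. Everything below is an assumption made for illustration: CountdownEnv and ConstantAgent are hypothetical toy classes, the environment follows the five-value step convention of Gymnasium, and the agent method signatures are guesses. It is not the runner's actual implementation.

```python
import numpy as np


class CountdownEnv:
    """Toy environment: reward 1 per step, terminates after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


class ConstantAgent:
    """Toy agent with start/step/end/reset; signatures are assumed."""

    def reset(self): pass
    def start(self, obs): return 0
    def step(self, reward, obs): return 0
    def end(self, reward): pass


def run_episodic_sketch(env, agent, num_runs, num_episodes,
                        max_steps_per_episode=None):
    rewards = np.zeros((num_episodes, num_runs))
    steps = np.zeros((num_episodes, num_runs))
    for run in range(num_runs):
        agent.reset()  # fresh agent for every independent run
        for ep in range(num_episodes):
            obs, _ = env.reset()
            action = agent.start(obs)
            total, n, done = 0.0, 0, False
            while not done:
                obs, reward, terminated, truncated, _ = env.step(action)
                total += reward
                n += 1
                hit_cap = (max_steps_per_episode is not None
                           and n >= max_steps_per_episode)
                done = terminated or truncated or hit_cap
                if done:
                    agent.end(reward)
                else:
                    action = agent.step(reward, obs)
            rewards[ep, run] = total
            steps[ep, run] = n
    return {"type": "episodic", "rewards": rewards, "steps": steps,
            "mean_rewards": rewards.mean(axis=1),
            "mean_steps": steps.mean(axis=1)}


results = run_episodic_sketch(CountdownEnv(length=3), ConstantAgent(),
                              num_runs=2, num_episodes=4)
capped = run_episodic_sketch(CountdownEnv(length=3), ConstantAgent(),
                             num_runs=2, num_episodes=4,
                             max_steps_per_episode=2)
```

In this sketch, the second call shows the effect of max_steps_per_episode: episodes are cut off at two steps even though the toy environment would otherwise run for three.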

run_episodic_batch(num_runs, num_episodes, max_steps_per_episode=None)

Run the experiment in an episodic setting using a vectorized environment.

Executes multiple runs of episodic training with parallel environments, storing rewards, steps per episode, and full trajectories. The agent’s end_batch method is called for each environment when its episode terminates, and once more at the end of the run for on-policy agents (e.g., PPO).

Parameters

num_runs (int): Number of independent runs to execute.
num_episodes (int): Number of episodes per run.
max_steps_per_episode (int, optional): Maximum steps allowed per episode. If None, episodes run until environment termination.

Returns

dict: Results dictionary containing:
- type : str, “episodic”
- rewards : np.ndarray, shape (max_episodes, num_runs)
- steps : np.ndarray, shape (max_episodes, num_runs)
- runtime_per_run : list of floats
- total_runtime : float
- mean_rewards : np.ndarray, mean reward per episode across runs
- std_rewards : np.ndarray, std dev of reward per episode across runs
- mean_steps : np.ndarray, mean steps per episode across runs
- std_steps : np.ndarray, std dev of steps per episode across runs

Notes

Supports vectorized environments with multiple parallel episodes.
Handles per-environment termination and resets trackers correctly.
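The result arrays here are shaped (max_episodes, num_runs) rather than (num_episodes, num_runs), and other notes on this page mention padded episodic results. One plausible reading, which is an assumption and not stated in the source, is that runs can finish different numbers of episodes and shorter runs are padded with NaN. A sketch of that padding, with a hypothetical helper pad_runs:

```python
import numpy as np


def pad_runs(per_run_values):
    """Stack variable-length per-run lists into a NaN-padded 2-D array."""
    max_episodes = max(len(values) for values in per_run_values)
    padded = np.full((max_episodes, len(per_run_values)), np.nan)
    for run, values in enumerate(per_run_values):
        padded[:len(values), run] = values  # rows past the end stay NaN
    return padded


# Two runs that completed three and two episodes respectively.
per_run_rewards = [[1.0, 2.0, 3.0], [2.0, 4.0]]
rewards = pad_runs(per_run_rewards)

# NaN-safe statistics ignore the padding instead of propagating it.
mean_rewards = np.nanmean(rewards, axis=1)
std_rewards = np.nanstd(rewards, axis=1)
```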

save_results(filepath)

Saves the experiment results dictionary to a file using pickle.

Parameters

filepath (str): Path to the file where results should be saved (e.g., ‘results.pkl’).
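Since both save_results and load_results are described as pickle-based, the round trip they perform presumably resembles the standard-library pattern below. The dictionary contents are made-up placeholder values, and the temporary directory is used only to keep the sketch self-contained.

```python
import os
import pickle
import tempfile

# Placeholder results; a real dictionary would also hold NumPy arrays.
results = {
    "type": "episodic",
    "total_runtime": 1.25,
    "runtime_per_run": [0.5, 0.75],
}

with tempfile.TemporaryDirectory() as tmpdir:
    filepath = os.path.join(tmpdir, "results.pkl")
    # Pickle files must be opened in binary mode for writing and reading.
    with open(filepath, "wb") as f:
        pickle.dump(results, f)
    with open(filepath, "rb") as f:
        loaded = pickle.load(f)
```

Because unpickling can execute arbitrary code, result files should only be loaded from trusted sources.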

summary(last_n=10)

Print a summary of experiment results.

Displays key statistics for episodic or continuous experiments, including mean rewards, steps, and runtime information.

Parameters

last_n (int, optional): Number of episodes or steps to include in the “last N” summary (default=10).

Notes

Episodic summary includes first, last, overall, and last-N mean rewards and steps.
Continuous summary includes first, last, overall, and last-N mean rewards.
Uses NaN-safe operations to handle padded episodic results.
Prints results directly to stdout.
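The four statistics listed in the notes (first, last, overall, and last-N) can be computed from a mean curve as sketched below. The function name summarize and the returned dictionary are hypothetical; the method itself is described as printing to stdout rather than returning values.

```python
import numpy as np


def summarize(mean_curve, last_n=10):
    """Compute first, last, overall, and last-N means of a 1-D curve."""
    last_n = min(last_n, len(mean_curve))  # guard against short curves
    return {
        "first": float(mean_curve[0]),
        "last": float(mean_curve[-1]),
        "overall": float(np.nanmean(mean_curve)),      # NaN-safe
        "last_n": float(np.nanmean(mean_curve[-last_n:])),
    }


mean_rewards = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
stats = summarize(mean_rewards, last_n=3)
```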