Experiment Runner

class rlforge.experiments.experiment_runner.ExperimentRunner(env, agent)

A unified class to run reinforcement learning experiments.

This runner supports both episodic and continuous settings across multiple runs and environments. It manages agent resets, environment interactions, trajectory storage, and provides built-in functionality for summarizing and plotting results.

Parameters

env : object

The environment instance (standard or vectorized) following the Gym API.

agent : object

The agent instance implementing the RL interface (start, step, end, reset).
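As a rough illustration of the four-method interface named above, the sketch below defines a toy agent. The method signatures (what each call receives and returns) are assumptions modelled on the common RL-Glue convention; they are not confirmed by this page, and `RandomAgent` is a hypothetical name.

```python
import random


class RandomAgent:
    """Toy agent exposing start, step, end, and reset (signatures are assumed)."""

    def __init__(self, num_actions, seed=0):
        self.num_actions = num_actions
        self.seed = seed
        self.rng = random.Random(seed)

    def reset(self):
        # Restore the initial state so that each run starts fresh.
        self.rng = random.Random(self.seed)

    def start(self, observation):
        # Called with the first observation of an episode; returns an action.
        return self.rng.randrange(self.num_actions)

    def step(self, reward, observation):
        # Called on every later transition; returns the next action.
        return self.rng.randrange(self.num_actions)

    def end(self, reward):
        # Called once with the final reward when the episode terminates.
        pass


agent = RandomAgent(num_actions=2)
agent.reset()
first_action = agent.start(observation=0)
next_action = agent.step(reward=1.0, observation=1)
agent.end(reward=0.0)
```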

load_results(filepath)

Loads experiment results from a pickle file and assigns them to self.results.

Parameters

filepath : str

Path to the pickle file.

Returns

dict

The loaded results dictionary.

plot_results(metric='reward', window_size=50, max_reward=None)

Plot experiment results with smoothing and error bands.

Generates a learning curve or episode length curve depending on the selected metric. Results are averaged across runs, smoothed using a moving average, and displayed with a shaded error band representing the standard deviation.

Parameters

metric : str, optional

Metric to plot (default: “reward”). Options:

  • “reward”: plots mean total reward across runs.

  • “step”: plots mean episode length (episodic only).

window_size : int, optional

Window size for moving average smoothing (default=50).

max_reward : float, optional

Optional maximum reward reference line to plot (default=None).

Notes

  • Uses NaN-safe mean and standard deviation calculations to handle padded episodic results.

  • Smooths only the mean curve; error bands use raw standard deviation.

  • Supports both episodic and continuous experiment types.

  • Plots include grid, legend, and tight layout for readability.

Returns

None

Displays a matplotlib plot of the selected metric.
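The aggregation described in the notes for plot_results can be sketched with NumPy alone. This is a minimal sketch, not the library's implementation: the `moving_average` helper and its "valid" convolution mode are assumptions, and the reward values are made up. A NaN entry stands in for a padded episode.

```python
import numpy as np


def moving_average(values, window_size):
    # Uniform moving average; "valid" mode drops the partially covered edges.
    kernel = np.ones(window_size) / window_size
    return np.convolve(values, kernel, mode="valid")


# Hypothetical rewards with shape (num_episodes, num_runs); NaN marks padding.
rewards = np.array([
    [1.0, 3.0],
    [2.0, np.nan],
    [3.0, 5.0],
    [4.0, 6.0],
])

mean_curve = np.nanmean(rewards, axis=1)  # NaN-safe mean across runs
std_curve = np.nanstd(rewards, axis=1)    # raw (unsmoothed) spread for the error band
smoothed = moving_average(mean_curve, window_size=2)  # only the mean is smoothed
```

With "valid" mode the smoothed curve is shorter than the raw one by `window_size - 1` points, so the error band would need to be trimmed or aligned before plotting.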

run_continuous(num_runs, num_steps)

Run the experiment in a continuous setting.

Executes multiple runs of continuous training for a fixed number of steps, storing rewards and full trajectories.

Parameters

num_runs : int

Number of independent runs to execute.

num_steps : int

Number of steps per run.

Returns

dict

Results dictionary containing:

  • type : str, “continuous”

  • rewards : np.ndarray, shape (num_steps, num_runs)

  • trajectories : list of dicts per run

  • runtime_per_run : list of floats

  • mean_rewards : np.ndarray, mean reward per step across runs
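To make the array layout of the continuous results concrete, the sketch below fills a (num_steps, num_runs) reward array and derives the per-step mean. The random draws are a stand-in for environment rewards, and only a subset of the documented keys is built; none of this is the runner's own code.

```python
import numpy as np

num_runs, num_steps = 3, 5
rng = np.random.default_rng(0)

# One column per run, one row per step.
rewards = np.zeros((num_steps, num_runs))
for run in range(num_runs):
    for step in range(num_steps):
        rewards[step, run] = rng.normal()  # stand-in for an environment reward

results = {
    "type": "continuous",
    "rewards": rewards,
    "mean_rewards": rewards.mean(axis=1),  # average over runs at each step
}
```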

run_episodic(num_runs, num_episodes, max_steps_per_episode=None)

Run the experiment in an episodic setting.

Executes multiple runs of episodic training, storing rewards, steps per episode, and full trajectories.

Parameters

num_runs : int

Number of independent runs to execute.

num_episodes : int

Number of episodes per run.

max_steps_per_episode : int, optional

Maximum steps allowed per episode. If None, episodes run until environment termination.

Returns

dict

Results dictionary containing:

  • type : str, “episodic”

  • rewards : np.ndarray, shape (num_episodes, num_runs)

  • steps : np.ndarray, shape (num_episodes, num_runs)

  • mean_rewards : np.ndarray, mean reward per episode across runs

  • std_rewards : np.ndarray, std dev of reward per episode across runs

  • mean_steps : np.ndarray, mean steps per episode across runs

  • std_steps : np.ndarray, std dev of steps per episode across runs

  • runtime_per_run : np.ndarray, duration of each run in seconds

  • total_runtime : float, total duration of the experiment
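The effect of max_steps_per_episode in run_episodic can be illustrated with a toy loop. This is a sketch under stated assumptions: `run_episode` is a hypothetical helper, and the `episode_length` argument stands in for the environment's own termination signal.

```python
def run_episode(episode_length, max_steps_per_episode=None):
    """Count the steps taken in a toy episode that ends after `episode_length` steps."""
    steps = 0
    done = False
    while not done:
        steps += 1
        done = steps >= episode_length  # stand-in for the environment's termination flag
        # Stop early if a step cap is set and has been reached.
        if max_steps_per_episode is not None and steps >= max_steps_per_episode:
            break
    return steps


uncapped = run_episode(episode_length=8)                         # runs to termination
capped = run_episode(episode_length=8, max_steps_per_episode=5)  # cut off by the cap
```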

run_episodic_batch(num_runs, num_episodes, max_steps_per_episode=None)

Run the experiment in an episodic setting using a vectorized environment.

Executes multiple runs of episodic training with parallel environments, storing rewards, steps per episode, and full trajectories. The agent’s end_batch method is called per environment whenever an episode terminates, and once more at the end of the run for on-policy agents (e.g., PPO).

Parameters

num_runs : int

Number of independent runs to execute.

num_episodes : int

Number of episodes per run.

max_steps_per_episode : int, optional

Maximum steps allowed per episode. If None, episodes run until environment termination.

Returns

dict

Results dictionary containing:

  • type : str, “episodic”

  • rewards : np.ndarray, shape (max_episodes, num_runs)

  • steps : np.ndarray, shape (max_episodes, num_runs)

  • runtime_per_run : list of floats

  • total_runtime : float

  • mean_rewards : np.ndarray, mean reward per episode across runs

  • std_rewards : np.ndarray, std dev of reward per episode across runs

  • mean_steps : np.ndarray, mean steps per episode across runs

  • std_steps : np.ndarray, std dev of steps per episode across runs

Notes

  • Supports vectorized environments with multiple parallel episodes.

  • Handles per-environment termination and resets trackers correctly.
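Per-environment termination handling in a vectorized setting might look roughly like the sketch below. The step data is hard-coded and hypothetical, and the tracker names are illustrative; the runner's actual bookkeeping is not shown on this page.

```python
import numpy as np

num_envs = 2

# Hypothetical per-step data: one reward and one termination flag per environment.
step_rewards = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
step_dones = np.array([[False, True], [True, False], [False, True]])

episode_returns = np.zeros(num_envs)
episode_lengths = np.zeros(num_envs, dtype=int)
finished = []  # (env index, episode return, episode length)

for rewards, dones in zip(step_rewards, step_dones):
    episode_returns += rewards
    episode_lengths += 1
    # Record and reset only the environments whose episode just ended.
    for env_idx in np.flatnonzero(dones):
        finished.append(
            (int(env_idx), float(episode_returns[env_idx]), int(episode_lengths[env_idx]))
        )
        episode_returns[env_idx] = 0.0
        episode_lengths[env_idx] = 0
```

Because environments finish at different times, runs can complete different numbers of episodes. That is one plausible reason for the NaN padding mentioned elsewhere in this reference.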

save_results(filepath)

Saves the experiment results dictionary to a file using pickle.

Parameters

filepath : str

Path to the file where results should be saved (e.g., ‘results.pkl’).
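The save and load methods are described as a pickle round trip, which can be sketched with the standard library alone. The results dictionary here is a made-up placeholder, and the temporary directory is used only to keep the example self-contained.

```python
import os
import pickle
import tempfile

# Placeholder results; a real dictionary would hold the arrays listed above.
results = {"type": "episodic", "total_runtime": 1.5}

path = os.path.join(tempfile.mkdtemp(), "results.pkl")

# Save: serialize the dictionary in binary mode.
with open(path, "wb") as f:
    pickle.dump(results, f)

# Load: read it back; only unpickle files from a trusted source.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```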

summary(last_n=10)

Print a summary of experiment results.

Displays key statistics for episodic or continuous experiments, including mean rewards, steps, and runtime information.

Parameters

last_n : int, optional

Number of episodes or steps to include in the “last N” summary (default=10).

Notes

  • Episodic summary includes first, last, overall, and last-N mean rewards and steps.

  • Continuous summary includes first, last, overall, and last-N mean rewards.

  • Uses NaN-safe operations to handle padded episodic results.

  • Prints results directly to stdout.
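The statistics listed in these notes could be computed along the following lines. This is a sketch with made-up rewards; the dictionary keys are illustrative and do not reflect the printed output format.

```python
import numpy as np

# Hypothetical rewards with shape (num_episodes, num_runs); NaN marks padding.
rewards = np.array([
    [1.0, 2.0],
    [3.0, np.nan],
    [5.0, 6.0],
    [7.0, 8.0],
])
last_n = 2

per_episode = np.nanmean(rewards, axis=1)  # NaN-safe mean across runs

stats = {
    "first": per_episode[0],
    "last": per_episode[-1],
    "overall": np.nanmean(per_episode),
    "last_n": np.nanmean(per_episode[-last_n:]),
}
```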