Experiment Runner
- class rlforge.experiments.experiment_runner.ExperimentRunner(env, agent)
A unified class to run reinforcement learning experiments.
This runner supports both episodic and continuous settings across multiple runs and environments. It manages agent resets, environment interactions, trajectory storage, and provides built-in functionality for summarizing and plotting results.
Parameters
- env : object
The environment instance (standard or vectorized) following the Gym API.
- agent : object
The agent instance implementing the RL interface (start, step, end, reset).
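As an illustration of the agent interface named above, here is a minimal sketch of a class exposing start, step, end, and reset. The class name RandomAgent and all argument lists are assumptions (an RL-Glue-style convention); the source only names the four methods, so check the actual rlforge agent base class for the real signatures.

```python
import random


class RandomAgent:
    """Hypothetical agent exposing the four methods the runner expects.

    The argument lists below are assumed, not taken from rlforge.
    """

    def __init__(self, num_actions, seed=0):
        self.num_actions = num_actions
        self.seed = seed
        self.reset()

    def reset(self):
        # Restore the initial state so each run starts from scratch.
        self.rng = random.Random(self.seed)
        self.steps_seen = 0

    def start(self, observation):
        # First action of an episode; no reward is available yet.
        return self.rng.randrange(self.num_actions)

    def step(self, reward, observation):
        # Called on every non-terminal transition.
        self.steps_seen += 1
        return self.rng.randrange(self.num_actions)

    def end(self, reward):
        # Called once with the final reward when the episode terminates.
        self.steps_seen += 1
```

An object of this shape would be passed as the agent argument of ExperimentRunner(env, agent), provided the real interface matches these assumed signatures.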
- load_results(filepath)
Loads experiment results from a pickle file and assigns them to self.results.
Parameters
- filepath : str
Path to the pickle file.
Returns
- dict
The loaded results dictionary.
- plot_results(metric='reward', window_size=50, max_reward=None)
Plot experiment results with smoothing and error bands.
Generates a learning curve or episode length curve depending on the selected metric. Results are averaged across runs, smoothed using a moving average, and displayed with a shaded error band representing the standard deviation.
Parameters
- metric : str, optional
Metric to plot (default is “reward”). Options:
  - “reward” : plots mean total reward across runs.
  - “step” : plots mean episode length (episodic only).
- window_size : int, optional
Window size for moving average smoothing (default=50).
- max_reward : float, optional
Optional maximum reward reference line to plot (default=None).
Notes
Uses NaN-safe mean and standard deviation calculations to handle padded episodic results.
Smooths only the mean curve; error bands use raw standard deviation.
Supports both episodic and continuous experiment types.
Plots include grid, legend, and tight layout for readability.
Returns
- None
Displays a matplotlib plot of the selected metric.
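The smoothing described in the notes can be sketched with NumPy alone. This is not the library's implementation: the function name smoothed_curve, the use of a "valid" convolution, and the way the raw standard deviation is trimmed to line up with the smoothed mean are all assumptions made for illustration.

```python
import numpy as np


def smoothed_curve(values, window_size=50):
    """Return (smoothed_mean, std_band) for a (num_episodes, num_runs) array.

    NaN entries are treated as padding and ignored.
    """
    values = np.asarray(values, dtype=float)
    mean = np.nanmean(values, axis=1)   # NaN-safe mean across runs
    std = np.nanstd(values, axis=1)     # NaN-safe std across runs (not smoothed)

    # Clamp the window so short experiments still produce a curve.
    window = max(1, min(window_size, len(mean)))
    kernel = np.ones(window) / window

    # Moving average over the mean only; "valid" drops the first window-1 points.
    smoothed = np.convolve(mean, kernel, mode="valid")

    # Assumed alignment: pair each smoothed point with the std at the window's end.
    return smoothed, std[window - 1:]
```

The two returned arrays have equal length, so they could be handed to matplotlib's plot and fill_between to draw the curve and its shaded band.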
- run_continuous(num_runs, num_steps)
Run the experiment in a continuous setting.
Executes multiple runs of continuous training for a fixed number of steps, storing rewards and full trajectories.
Parameters
- num_runs : int
Number of independent runs to execute.
- num_steps : int
Number of steps per run.
Returns
- dict
Results dictionary containing:
  - type : str, “continuous”
  - rewards : np.ndarray, shape (num_steps, num_runs)
  - trajectories : list of dicts per run
  - runtime_per_run : list of floats
  - mean_rewards : np.ndarray, mean reward per step across runs
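To make the shape of a continuous run concrete, the following sketch reproduces the loop structure with stand-in objects. ConstantEnv, AlwaysOneAgent, and run_continuous_sketch are hypothetical names; the environment is assumed to follow the Gymnasium convention (reset returns an observation and an info dict, step returns a five-element tuple), and the agent signatures are assumed. Trajectory storage is left out for brevity.

```python
import time

import numpy as np


class ConstantEnv:
    """Hypothetical stand-in for a Gymnasium-style environment."""

    def reset(self):
        return 0, {}

    def step(self, action):
        # The reward equals the action; the task never terminates.
        return 0, float(action), False, False, {}


class AlwaysOneAgent:
    """Hypothetical agent; the method signatures are assumed."""

    def reset(self):
        pass

    def start(self, observation):
        return 1

    def step(self, reward, observation):
        return 1


def run_continuous_sketch(env, agent, num_runs, num_steps):
    rewards = np.zeros((num_steps, num_runs))
    runtime_per_run = []
    for run in range(num_runs):
        t0 = time.perf_counter()
        agent.reset()                      # fresh agent for every run
        observation, _ = env.reset()
        action = agent.start(observation)
        for t in range(num_steps):
            observation, reward, _, _, _ = env.step(action)
            rewards[t, run] = reward       # one column per run
            action = agent.step(reward, observation)
        runtime_per_run.append(time.perf_counter() - t0)
    return {
        "type": "continuous",
        "rewards": rewards,
        "runtime_per_run": runtime_per_run,
        "mean_rewards": rewards.mean(axis=1),  # average across runs per step
    }
```

Calling run_continuous_sketch(ConstantEnv(), AlwaysOneAgent(), num_runs=3, num_steps=5) yields a rewards array of shape (5, 3), matching the (num_steps, num_runs) layout documented above.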
- run_episodic(num_runs, num_episodes, max_steps_per_episode=None)
Run the experiment in an episodic setting.
Executes multiple runs of episodic training, storing rewards, steps per episode, and full trajectories.
Parameters
- num_runs : int
Number of independent runs to execute.
- num_episodes : int
Number of episodes per run.
- max_steps_per_episode : int, optional
Maximum steps allowed per episode. If None, episodes run until environment termination.
Returns
- dict
Results dictionary containing:
  - type : str, “episodic”
  - rewards : np.ndarray, shape (num_episodes, num_runs)
  - steps : np.ndarray, shape (num_episodes, num_runs)
  - mean_rewards : np.ndarray, mean reward per episode across runs
  - std_rewards : np.ndarray, std dev of reward per episode across runs
  - mean_steps : np.ndarray, mean steps per episode across runs
  - std_steps : np.ndarray, std dev of steps per episode across runs
  - runtime_per_run : np.ndarray, duration of each run in seconds
  - total_runtime : float, total duration of the experiment
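The aggregate fields of the episodic results dictionary can be derived from the raw arrays as in the sketch below. The function name aggregate_episodic is hypothetical, and two details are assumptions rather than documented behavior: that statistics are taken across the run axis with NaN-safe functions, and that total_runtime is the sum of the per-run durations.

```python
import numpy as np


def aggregate_episodic(rewards, steps, runtime_per_run):
    """Build a results dict from (num_episodes, num_runs) arrays."""
    rewards = np.asarray(rewards, dtype=float)
    steps = np.asarray(steps, dtype=float)
    runtime_per_run = np.asarray(runtime_per_run, dtype=float)
    return {
        "type": "episodic",
        "rewards": rewards,
        "steps": steps,
        # axis=1 collapses the run dimension, leaving one value per episode.
        "mean_rewards": np.nanmean(rewards, axis=1),
        "std_rewards": np.nanstd(rewards, axis=1),
        "mean_steps": np.nanmean(steps, axis=1),
        "std_steps": np.nanstd(steps, axis=1),
        "runtime_per_run": runtime_per_run,
        # Assumption: the total is the sum of the individual run durations.
        "total_runtime": float(runtime_per_run.sum()),
    }
```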
- run_episodic_batch(num_runs, num_episodes, max_steps_per_episode=None)
Run the experiment in an episodic setting using a vectorized environment.
Executes multiple runs of episodic training with parallel environments, storing rewards, steps per episode, and full trajectories. This version correctly calls the agent’s end_batch method both per environment upon episode termination and once at the end of the run for on-policy agents (e.g., PPO).
Parameters
- num_runs : int
Number of independent runs to execute.
- num_episodes : int
Number of episodes per run.
- max_steps_per_episode : int, optional
Maximum steps allowed per episode. If None, episodes run until environment termination.
Returns
- dict
Results dictionary containing:
  - type : str, “episodic”
  - rewards : np.ndarray, shape (max_episodes, num_runs)
  - steps : np.ndarray, shape (max_episodes, num_runs)
  - runtime_per_run : list of floats
  - total_runtime : float
  - mean_rewards : np.ndarray, mean reward per episode across runs
  - std_rewards : np.ndarray, std reward per episode across runs
  - mean_steps : np.ndarray, mean steps per episode across runs
  - std_steps : np.ndarray, std steps per episode across runs
Notes
Supports vectorized environments with multiple parallel episodes.
Handles per-environment termination and resets trackers correctly.
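The (max_episodes, num_runs) shape suggests that runs may finish different numbers of episodes and are padded to a common length. The sketch below shows one way such padding could work, assuming NaN is the fill value (consistent with the NaN-safe statistics mentioned elsewhere in this reference). The function name pad_runs is hypothetical, and the library's actual padding logic may differ.

```python
import numpy as np


def pad_runs(per_run_values):
    """Stack ragged per-run episode lists into a (max_episodes, num_runs) array."""
    max_episodes = max(len(values) for values in per_run_values)
    # Start from an all-NaN array so missing episodes stay marked as padding.
    padded = np.full((max_episodes, len(per_run_values)), np.nan)
    for run, values in enumerate(per_run_values):
        padded[:len(values), run] = values
    return padded


# Run 0 completed three episodes, run 1 only two.
episode_rewards = pad_runs([[1.0, 2.0, 3.0], [4.0, 5.0]])
# NaN-safe mean ignores the padded entry in the last row.
mean_rewards = np.nanmean(episode_rewards, axis=1)
```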
- save_results(filepath)
Saves the experiment results dictionary to a file using pickle.
Parameters
- filepath : str
Path to the file where results should be saved (e.g., ‘results.pkl’).
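For reference, a pickle round trip of a results dictionary looks roughly like the following. This uses only the standard library and mirrors what save_results and load_results are described as doing; the helper names with the _sketch suffix and the sample dictionary are illustrative, not part of rlforge.

```python
import os
import pickle
import tempfile


def save_results_sketch(results, filepath):
    # Binary mode is required for pickle files.
    with open(filepath, "wb") as f:
        pickle.dump(results, f)


def load_results_sketch(filepath):
    with open(filepath, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary directory.
sample = {"type": "episodic", "total_runtime": 1.5}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "results.pkl")
    save_results_sketch(sample, path)
    loaded = load_results_sketch(path)
```

Because unpickling can execute arbitrary code, only load results files from sources you trust.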
- summary(last_n=10)
Print a summary of experiment results.
Displays key statistics for episodic or continuous experiments, including mean rewards, steps, and runtime information.
Parameters
- last_n : int, optional
Number of episodes or steps to include in the “last N” summary (default=10).
Notes
Episodic summary includes first, last, overall, and last-N mean rewards and steps.
Continuous summary includes first, last, overall, and last-N mean rewards.
Uses NaN-safe operations to handle padded episodic results.
Prints results directly to stdout.
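The first, last, overall, and last-N statistics listed in the notes can be computed from a per-episode (or per-step) mean curve as sketched here. The function name summarize_curve and the returned dictionary layout are assumptions; the actual method prints its output rather than returning it.

```python
import numpy as np


def summarize_curve(mean_values, last_n=10):
    """Compute summary statistics over a 1-D curve, ignoring NaN padding."""
    mean_values = np.asarray(mean_values, dtype=float)
    return {
        "first": float(mean_values[0]),
        "last": float(mean_values[-1]),
        "overall": float(np.nanmean(mean_values)),
        # Slicing with -last_n keeps the whole curve when it is shorter than last_n.
        "last_n": float(np.nanmean(mean_values[-last_n:])),
    }
```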