Base Agent

class rlforge.agents.base_agent.BaseAgent

Abstract base class for all RLForge agents.

This class defines the standard interface that every agent in RLForge must implement. It provides a consistent structure for interacting with environments, handling episodes, and selecting actions. By inheriting from BaseAgent, new agents can be integrated seamlessly into the RLForge framework.

Notes

All methods are abstract and must be implemented by subclasses.
The interface is designed to be environment-agnostic, so agents can be applied to both discrete and continuous tasks.
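As an illustration of the interface described above, here is a minimal sketch of a concrete agent. The BaseAgent class in this block is a stand-in that only mirrors the five documented abstract methods so the example runs on its own; in real code you would import rlforge.agents.base_agent.BaseAgent instead. RandomAgent, n_actions, seed, and episode_return are hypothetical names, not part of RLForge.

```python
from abc import ABC, abstractmethod
import random


class BaseAgent(ABC):
    """Stand-in mirroring the documented interface (not the RLForge source)."""

    @abstractmethod
    def start(self, state): ...

    @abstractmethod
    def step(self, reward, state): ...

    @abstractmethod
    def end(self, reward): ...

    @abstractmethod
    def reset(self): ...

    @abstractmethod
    def select_action(self, state): ...


class RandomAgent(BaseAgent):
    """Hypothetical agent choosing uniformly among n_actions discrete actions."""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.episode_return = 0.0

    def reset(self):
        # Clear per-episode statistics.
        self.episode_return = 0.0

    def select_action(self, state):
        # The policy ignores the state and samples a random action index.
        return self.rng.randrange(self.n_actions)

    def start(self, state):
        return self.select_action(state)

    def step(self, reward, state):
        self.episode_return += reward
        return self.select_action(state)

    def end(self, reward):
        self.episode_return += reward
```

Because every method is abstract, instantiating the base class directly raises TypeError, and a subclass that omits any of the five methods fails the same way.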

abstract end(reward)

Complete an episode.

This method is called when the environment signals that the episode has terminated. The agent can use the final reward to update its estimates.

Parameters

reward (float): The final reward received at the end of the episode.
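One common way to use the final reward, sketched here for a tabular value-based agent, is an update whose target is the reward alone, because there is no next state to bootstrap from. This is an assumption about how a subclass might implement end(), not RLForge's own code; terminal_update, q, and step_size are hypothetical names.

```python
from collections import defaultdict


def terminal_update(q, state, action, reward, step_size=0.1):
    """Move q[(state, action)] toward the final reward.

    No bootstrap term is added: the episode is over, so the value of
    the (nonexistent) next state is treated as zero.
    """
    q[(state, action)] += step_size * (reward - q[(state, action)])
    return q[(state, action)]


# Action values default to 0.0 for unseen (state, action) pairs.
q = defaultdict(float)
terminal_update(q, state="s0", action=1, reward=1.0, step_size=0.5)
```

An agent built this way would remember its last state and action in start() and step(), then call such an update from end().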

abstract reset()

Reset the agent’s internal state.

This method is called between episodes to clear any temporary variables or statistics. It ensures that each episode starts from a clean state, without residual information from previous runs.

abstract select_action(state)

Select an action given the current state.

This method encapsulates the agent’s policy. Depending on the implementation, it may be deterministic (e.g., greedy) or stochastic (e.g., epsilon-greedy, softmax, Gaussian).

Parameters

state (object): The current state observed from the environment.

Returns

action (int or float or numpy.ndarray): The action chosen by the agent.
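To make the deterministic versus stochastic distinction concrete, the following sketch shows an epsilon-greedy rule that a select_action implementation could delegate to. The helper name epsilon_greedy and its arguments are hypothetical; it assumes a discrete action set whose estimated values are held in a list.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Return an action index for the given list of action-value estimates.

    With probability epsilon a uniformly random action is explored;
    otherwise the greedy (highest-valued) action is returned.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    # Greedy choice; ties are broken in favour of the lowest index.
    return action_values.index(max(action_values))
```

Setting epsilon to 0 gives a purely greedy, deterministic policy, while epsilon equal to 1 gives a uniformly random one.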

abstract start(state)

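Begin a new episode.

This method is called once at the start of each episode with the initial state. The agent should return its first action.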

Parameters

state (object): The initial state observed from the environment.

Returns

action (int or float or numpy.ndarray): The first action selected by the agent given the initial state.

abstract step(reward, state)

Take a step in the environment.

This method is called after the agent receives a reward and the next state from the environment. The agent should update its internal estimates and return the next action.

Parameters

reward (float): The reward received from the previous action.
state (object): The new state observed from the environment.

Returns

action (int or float or numpy.ndarray): The next action chosen by the agent.
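The order in which start, step, and end are called can be sketched with a small driver loop. Everything here is an assumption for illustration: this page does not document the environment API, so CountdownEnv, its reset() and step() return values, LoggingAgent, and run_episode are hypothetical, and calling reset() immediately before start() is just one possible convention.

```python
class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # initial state

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return 1.0, self.t, done  # reward, next state, done flag


class LoggingAgent:
    """Hypothetical agent that records which interface methods are called."""

    def __init__(self):
        self.calls = []

    def reset(self):
        self.calls.append("reset")

    def select_action(self, state):
        return 0  # always the same action

    def start(self, state):
        self.calls.append("start")
        return self.select_action(state)

    def step(self, reward, state):
        self.calls.append("step")
        return self.select_action(state)

    def end(self, reward):
        self.calls.append("end")


def run_episode(env, agent):
    """Drive one episode and return the total reward collected."""
    agent.reset()
    action = agent.start(env.reset())
    total = 0.0
    while True:
        reward, state, done = env.step(action)
        total += reward
        if done:
            agent.end(reward)  # terminal transition: no next action needed
            return total
        action = agent.step(reward, state)
```

Note that the final transition goes to end(reward) rather than step(reward, state), so step is called one time fewer than the number of environment steps.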