Base Agent
- class rlforge.agents.base_agent.BaseAgent
Abstract base class for all RLForge agents.
This class defines the standard interface that every agent in RLForge must implement. It provides a consistent structure for interacting with environments, handling episodes, and selecting actions. By inheriting from BaseAgent, new agents can be integrated seamlessly into the RLForge framework.
Notes
All methods are abstract and must be implemented by subclasses.
The interface is designed to be environment-agnostic, so agents can be applied to both discrete and continuous tasks.
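As an illustration of what implementing the interface involves, here is a minimal sketch of a subclass. It assumes only the five method names and arguments listed on this page. The `BaseAgent` class below is a local stand-in written so the snippet runs without RLForge installed; it is not the library's source. `RandomAgent`, `num_actions`, `seed`, and `total_reward` are hypothetical names chosen for the example.

```python
import random
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Local stand-in mirroring the documented interface (not the RLForge source)."""

    @abstractmethod
    def start(self, state): ...

    @abstractmethod
    def step(self, reward, state): ...

    @abstractmethod
    def end(self, reward): ...

    @abstractmethod
    def reset(self): ...

    @abstractmethod
    def select_action(self, state): ...


class RandomAgent(BaseAgent):
    """Hypothetical agent that ignores the state and acts uniformly at random."""

    def __init__(self, num_actions, seed=None):
        self.num_actions = num_actions
        self.rng = random.Random(seed)
        self.total_reward = 0.0

    def reset(self):
        # Clear per-episode statistics.
        self.total_reward = 0.0

    def select_action(self, state):
        # The policy: a uniform draw over discrete action indices.
        return self.rng.randrange(self.num_actions)

    def start(self, state):
        return self.select_action(state)

    def step(self, reward, state):
        self.total_reward += reward
        return self.select_action(state)

    def end(self, reward):
        self.total_reward += reward
```

Because every method is abstract, instantiating the stand-in `BaseAgent` directly raises `TypeError`, and so does any subclass that leaves one of the five methods undefined.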
- abstract end(reward)
Complete an episode.
This method is called when the environment signals that the episode has terminated. The agent can use the final reward to update its estimates.
Parameters
- reward : float
The final reward received at the end of the episode.
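One common way to use the final reward, shown here only as a sketch, is an incremental update with no bootstrap term, since no further state follows termination. This page does not prescribe any update rule; `terminal_update`, `estimate`, and `step_size` are hypothetical names.

```python
def terminal_update(estimate, reward, step_size):
    """Move an estimate toward the final reward.

    At termination there is no next state, so the target is the reward alone.
    """
    return estimate + step_size * (reward - estimate)
```

An implementation of `end(reward)` could apply a rule like this to the value of the last state or state-action pair it visited.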
- abstract reset()
Reset the agent’s internal state.
This method is called between episodes to clear any temporary variables or statistics. It ensures that each episode starts from a clean state, without residual information from previous runs.
- abstract select_action(state)
Select an action given the current state.
This method encapsulates the agent’s policy. Depending on the implementation, it may be deterministic (e.g., greedy) or stochastic (e.g., epsilon-greedy, softmax, Gaussian).
Parameters
- state : object
The current state observed from the environment.
Returns
- action : int or float or numpy.ndarray
The action chosen by the agent.
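To make the stochastic case concrete, the following sketch shows an epsilon-greedy choice over a list of action-value estimates. It is an illustrative helper, not part of the documented interface: `epsilon_greedy`, `q_values`, `epsilon`, and `rng` are assumed names, and a real `select_action` would first look up the estimates for the given state.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # Explore: uniform draw over action indices.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimate; ties resolve to the lowest index.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting `epsilon` to 0 gives the deterministic greedy policy mentioned above, while larger values trade exploitation for exploration.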
- abstract start(state)
Begin a new episode.
Parameters
- state : object
The initial state observed from the environment.
Returns
- action : int or float or numpy.ndarray
The first action selected by the agent given the initial state.
- abstract step(reward, state)
Take a step in the environment.
This method is called after the agent receives a reward and the next state from the environment. The agent should update its internal estimates and return the next action.
Parameters
- reward : float
The reward received from the previous action.
- state : object
The new state observed from the environment.
Returns
- action : int or float or numpy.ndarray
The next action chosen by the agent.
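The methods above suggest a call order of `reset`, `start`, repeated `step`, then `end`. The sketch below shows one way a driver loop might use them. It rests on assumptions: `CountdownEnv`, `ConstantAgent`, and `run_episode` are hypothetical, the environment's `reset`/`step` signatures are invented for the example, and calling `reset` at the start of each episode is only one reading of "between episodes". RLForge's own experiment loop may differ.

```python
class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # initial state

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return 1.0, self.t, done  # (reward, next state, terminal flag)


class ConstantAgent:
    """Hypothetical agent exposing the documented methods; always picks action 0."""

    def __init__(self):
        self.episode_return = 0.0

    def reset(self):
        self.episode_return = 0.0

    def select_action(self, state):
        return 0

    def start(self, state):
        return self.select_action(state)

    def step(self, reward, state):
        self.episode_return += reward
        return self.select_action(state)

    def end(self, reward):
        self.episode_return += reward


def run_episode(env, agent):
    """Drive one episode: reset, start, repeated step, then end."""
    agent.reset()
    action = agent.start(env.reset())
    while True:
        reward, state, done = env.step(action)
        if done:
            # Terminal transition: only the reward is passed on.
            agent.end(reward)
            return agent.episode_return
        action = agent.step(reward, state)
```

Note that `end` receives only a reward, with no state argument, which matches the signature documented above: a terminal transition has no successor state to act on.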