SARSA Agent

class rlforge.agents.tabular.sarsa.SarsaAgent(step_size, discount, num_states, num_actions, epsilon=0.1)

Tabular agent implementing the SARSA algorithm.

SARSA (State-Action-Reward-State-Action) is an on-policy temporal-difference learning method. Unlike Q-learning, which updates toward the maximum action value in the next state, SARSA updates toward the value of the action actually taken under the current policy. As a result, the learned Q-values depend on the agent's exploration strategy. With epsilon-greedy exploration, SARSA estimates the value of the epsilon-greedy policy itself, exploratory actions included.
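To make the difference concrete, the sketch below computes both one-step targets from the same small Q-table. The function names `sarsa_target` and `q_learning_target` are illustrative and are not part of rlforge. A discount of 0.5 is used only to keep the arithmetic exact.

```python
import numpy as np

def sarsa_target(q, reward, next_state, next_action, discount):
    """On-policy target: bootstraps from the action actually taken in next_state."""
    return reward + discount * q[next_state, next_action]

def q_learning_target(q, reward, next_state, discount):
    """Off-policy target: bootstraps from the greedy (maximum) action value."""
    return reward + discount * np.max(q[next_state])

# Two states, two actions. In state 1, action 1 is greedy,
# but suppose the agent explored and took action 0 instead.
q = np.array([[0.0, 0.0],
              [0.5, 2.0]])

print(sarsa_target(q, reward=1.0, next_state=1, next_action=0, discount=0.5))  # 1.25
print(q_learning_target(q, reward=1.0, next_state=1, discount=0.5))            # 2.0
```

The SARSA target is lower because it accounts for the exploratory action the agent actually took. Q-learning ignores that action and assumes greedy behaviour from the next state onward.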

Notes

  • The agent uses an epsilon-greedy policy for action selection, exploring with probability epsilon (0.1 by default).

  • This implementation does not include planning steps; it directly inherits from BaseAgent.
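The following stand-in class shows how the documented `start`, `step`, and `end` methods fit together. It is a hypothetical re-implementation written for illustration and is not rlforge's source. Several details are assumptions: the `seed` argument, the `q` attribute, random tie-breaking among greedy actions, and the toy chain environment. Notice that `step` selects the next action before updating. This ordering is required because the SARSA target uses Q(s', a') for the action the agent will actually execute.

```python
import numpy as np

class MinimalSarsa:
    """Illustrative tabular SARSA agent mirroring the start/step/end interface.

    Hypothetical sketch: `seed`, `q`, and the tie-breaking rule are assumptions,
    not rlforge's API.
    """

    def __init__(self, step_size, discount, num_states, num_actions, epsilon=0.1, seed=None):
        self.step_size = step_size
        self.discount = discount
        self.num_actions = num_actions
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.q = np.zeros((num_states, num_actions))  # Q-table initialised to zeros
        self.prev_state = None
        self.prev_action = None

    def select_action(self, q_values):
        # Epsilon-greedy: explore uniformly with probability epsilon,
        # otherwise pick a greedy action, breaking ties at random.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.num_actions))
        best = np.flatnonzero(q_values == np.max(q_values))
        return int(self.rng.choice(best))

    def start(self, new_state):
        action = self.select_action(self.q[new_state])
        self.prev_state, self.prev_action = new_state, action
        return action

    def step(self, reward, new_state):
        # Choose a' first: SARSA bootstraps from the action it will actually take.
        action = self.select_action(self.q[new_state])
        target = reward + self.discount * self.q[new_state, action]
        s, a = self.prev_state, self.prev_action
        self.q[s, a] += self.step_size * (target - self.q[s, a])
        self.prev_state, self.prev_action = new_state, action
        return action

    def end(self, reward):
        # Terminal transition: there is no next state-action value to bootstrap from.
        s, a = self.prev_state, self.prev_action
        self.q[s, a] += self.step_size * (reward - self.q[s, a])


# Hypothetical 5-state chain: action 1 moves right, action 0 moves left (clamped at 0).
# Reaching state 4 ends the episode with reward +1; every other step gives 0.
agent = MinimalSarsa(step_size=0.5, discount=0.9, num_states=5, num_actions=2, epsilon=0.1, seed=0)
for _ in range(200):
    state = 0
    action = agent.start(state)
    while True:
        state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
        if state == 4:
            agent.end(1.0)
            break
        action = agent.step(0.0, state)

print(np.argmax(agent.q[:4], axis=1))  # greedy action in each non-terminal state
```

Every episode ends through the transition from state 3 with action 1. That pair is therefore updated toward the terminal reward once per episode, and its value approaches 1.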

end(reward)

Complete an episode.

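Performs a final update to the Q-value of the last state-action pair using the terminal reward. Because the episode has ended, there is no next state-action value to bootstrap from, so the update target is the reward alone.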

Parameters

reward : float

The terminal reward received at the end of the episode.
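The sketch below shows what this terminal update amounts to. It assumes the Q-table is a NumPy array indexed as `q[state, action]`. The function name `sarsa_terminal_update` is hypothetical and is not part of rlforge.

```python
import numpy as np

def sarsa_terminal_update(q, state, action, reward, step_size):
    """Final SARSA update for the last state-action pair of an episode.

    The next state is terminal, so its value is treated as zero and the
    target is just the reward (no discounted bootstrap term).
    """
    td_error = reward - q[state, action]
    q[state, action] += step_size * td_error
    return q

q = np.zeros((3, 2))
q[2, 1] = 0.5
sarsa_terminal_update(q, state=2, action=1, reward=1.0, step_size=0.5)
print(q[2, 1])  # 0.75
```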

reset()

Reset the agent’s internal state.

Initializes the Q-table to zeros at the start of training or between episodes.

select_action(q_values)

Select an action using epsilon-greedy exploration.

Parameters

q_values : numpy.ndarray

Array of Q-values for the current state.

Returns

action : int

The chosen action.
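A minimal epsilon-greedy selector is sketched below. It assumes a NumPy `Generator` for randomness and breaks ties among equally valued greedy actions uniformly at random. Both are assumptions; the documentation above does not specify rlforge's tie-breaking rule. The name `epsilon_greedy` is illustrative.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, otherwise a greedy one.

    Ties among maximal Q-values are broken uniformly at random.
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    best = np.flatnonzero(q_values == np.max(q_values))
    return int(rng.choice(best))

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.7, 0.2])
actions = [epsilon_greedy(q_values, epsilon=0.1, rng=rng) for _ in range(1000)]
print(np.bincount(actions, minlength=4))  # most picks are split between actions 1 and 2
```

Random tie-breaking matters early in training, when the Q-table is all zeros. Plain `np.argmax` would always return action 0 in that case.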

start(new_state)

Begin a new episode.

Selects the first action using the epsilon-greedy policy and stores the initial state-action pair.

Parameters

new_state : int

The initial state observed from the environment.

Returns

action : int

The first action selected by the agent.

step(reward, new_state)

Take a step in the environment.

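Selects the next action with the epsilon-greedy policy, then updates the Q-value of the previous state-action pair using the SARSA update rule. The rule bootstraps from the value of the action actually taken in the next state.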

Parameters

reward : float

Reward received from the previous action.

new_state : int

The new state observed from the environment.

Returns

action : int

The next action chosen by the agent.

Notes

  • The Q-value update follows:

    \[Q(s, a) \leftarrow Q(s, a) + \alpha \Big[ r + \gamma Q(s', a') - Q(s, a) \Big]\]

    where \(\alpha\) is the step size, \(\gamma\) is the discount factor, and \(a'\) is the action actually taken in the next state.
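As a quick numerical check of the rule, take values chosen purely for illustration: \(\alpha = 0.5\), \(\gamma = 0.9\), \(Q(s, a) = 1.0\), \(r = 1\) and \(Q(s', a') = 2.0\). The update then gives:

\[Q(s, a) \leftarrow 1.0 + 0.5 \Big[ 1 + 0.9 \cdot 2.0 - 1.0 \Big] = 1.0 + 0.5 \cdot 1.8 = 1.9\]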