Bandit Agent

class rlforge.agents.bandit.BanditAgent(num_actions, epsilon=0.1, step_size=None)

Agent for the k-armed bandit problem using epsilon-greedy exploration.

This agent maintains estimates of action values (Q-values) and selects actions according to the epsilon-greedy strategy. After receiving a reward, it updates the estimate of the chosen action incrementally, using a sample average by default or a constant step size when step_size is set.

Parameters

num_actions (int): Number of available arms (actions).
epsilon (float, optional, default=0.1): Probability of selecting a random action (exploration).
step_size (float or None, optional, default=None): Constant step size for updates. If None, the step size is 1/n, where n is the number of times the action has been selected (sample-average method).

select_action()

Choose an action using epsilon-greedy exploration.

Returns

action (int): Index of the chosen action.
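
To illustrate the selection rule, here is a minimal standalone sketch of epsilon-greedy action selection. It is not rlforge's implementation: the function name epsilon_greedy, its arguments, and the random tie-breaking among equally valued actions are all assumptions made for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of value estimates.

    Hypothetical helper for illustration; not part of rlforge.
    """
    # Explore: with probability epsilon, pick any action uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: pick an action with the highest current estimate.
    best = max(q_values)
    # Assumption: ties are broken uniformly at random.
    candidates = [i for i, q in enumerate(q_values) if q == best]
    return rng.choice(candidates)


# With epsilon=0 the choice is purely greedy.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

Breaking ties at random rather than always taking the first maximum avoids a bias toward low-index arms while all estimates are still equal.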

update(action, reward)

Update the estimated value of the chosen action.

Parameters

action (int): Index of the action taken.
reward (float): Reward received from the environment.
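
The incremental update can be sketched as Q <- Q + alpha * (reward - Q), where alpha is 1/n for the sample-average method or the constant step_size otherwise. The standalone function below is an illustrative assumption, not rlforge's code: the name update_estimate and the explicit q_values and counts lists are hypothetical.

```python
def update_estimate(q_values, counts, action, reward, step_size=None):
    """Update q_values[action] in place after observing a reward.

    Hypothetical helper for illustration; not part of rlforge.
    """
    # Track how many times this action has been taken.
    counts[action] += 1
    # Sample average uses 1/n; otherwise use the constant step size.
    alpha = 1.0 / counts[action] if step_size is None else step_size
    # Move the estimate toward the observed reward by a fraction alpha.
    q_values[action] += alpha * (reward - q_values[action])


# Sample-average: after rewards 1.0 and 3.0 the estimate is their mean, 2.0.
q, n = [0.0, 0.0], [0, 0]
update_estimate(q, n, action=0, reward=1.0)
update_estimate(q, n, action=0, reward=3.0)

# Constant step size 0.5: recent rewards are weighted more heavily.
q_const, n_const = [0.0], [0]
update_estimate(q_const, n_const, action=0, reward=1.0, step_size=0.5)
update_estimate(q_const, n_const, action=0, reward=1.0, step_size=0.5)
```

A constant step size is typically preferred when reward distributions drift over time, since the 1/n rule gives ever smaller weight to new observations.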