Bandit Agent

class rlforge.agents.bandit.BanditAgent(num_actions, epsilon=0.1, step_size=None)

Agent for the k-armed bandit problem using epsilon-greedy exploration.

This agent maintains an estimate of the value (Q-value) of each action and selects actions with an epsilon-greedy strategy. After receiving a reward, it updates the estimate of the chosen action incrementally: with a sample average by default, or with a constant step size if step_size is given.

Parameters

num_actions : int

Number of available arms (actions).

epsilon : float, optional (default=0.1)

Probability of selecting a random action (exploration).

step_size : float or None, optional (default=None)

Constant step size for updates. If None, the step size is 1/n, where n is the number of times the action has been selected so far, which yields the sample average of its rewards.
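For reference, both settings of step_size fit the standard incremental update below. The symbols are not taken from this page and are introduced only for illustration: Q_n(a) is the estimate for action a before its n-th selection, R_n is the reward received on that selection, and alpha_n is the step size. This is the textbook rule; the class may differ in detail.

```latex
Q_{n+1}(a) = Q_n(a) + \alpha_n \bigl( R_n - Q_n(a) \bigr),
\qquad
\alpha_n =
\begin{cases}
  1/n    & \text{if step\_size is None (sample average)} \\
  \alpha & \text{if step\_size} = \alpha \text{ (constant)}
\end{cases}
```

With alpha_n = 1/n the estimate equals the mean of all rewards observed for that action. A constant step size instead weights recent rewards more heavily, which is usually preferred when reward distributions drift over time.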

select_action()

Choose an action using epsilon-greedy exploration.

Returns

action : int

Index of the chosen action.
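As an illustration of epsilon-greedy selection, here is a minimal standalone sketch. It is not the library's implementation: the function name epsilon_greedy, the q_values list, and the random tie-breaking among equally good arms are all assumptions made for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen epsilon-greedily from q_values.

    Hypothetical helper for illustration; not part of BanditAgent.
    """
    if rng.random() < epsilon:
        # Explore: pick any arm uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick an arm with the highest estimate.
    # Assumption: ties are broken uniformly at random.
    best = max(q_values)
    candidates = [i for i, q in enumerate(q_values) if q == best]
    return rng.choice(candidates)
```

With epsilon=0 the sketch is purely greedy, and with epsilon=1 it always picks a uniformly random arm.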

update(action, reward)

Update the estimated value of the chosen action.

Parameters

action : int

Index of the action taken.

reward : float

Reward received from the environment.
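The update can be sketched as a standalone function. This is a hedged illustration of the incremental rule, not the class's actual code: the name update_estimate and the q_values and counts lists are hypothetical, and the sketch assumes the agent keeps one estimate and one selection count per action.

```python
def update_estimate(q_values, counts, action, reward, step_size=None):
    """Incrementally update the estimate for one action, in place.

    Hypothetical helper for illustration; not part of BanditAgent.
    """
    # Track how many times this action has been selected.
    counts[action] += 1
    # Sample average uses 1/n; otherwise use the constant step size.
    alpha = 1.0 / counts[action] if step_size is None else step_size
    # Move the estimate toward the observed reward by a fraction alpha.
    q_values[action] += alpha * (reward - q_values[action])
    return q_values[action]
```

For example, starting from an estimate of 0.0, rewards of 1.0 and then 3.0 give 1.0 and then 2.0 under the sample-average setting, which is the mean of the two rewards.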