Bandit Agent
- class rlforge.agents.bandit.BanditAgent(num_actions, epsilon=0.1, step_size=None)
Agent for the k-armed bandit problem using epsilon-greedy exploration.
This agent maintains estimates of action values (Q-values) and selects actions with an epsilon-greedy strategy. After receiving a reward, it incrementally updates the estimate of the chosen action, using either the sample-average method (step size 1/n) or a constant step size, depending on step_size.
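The following is a minimal, standalone sketch of epsilon-greedy selection combined with the incremental update Q(a) <- Q(a) + alpha * (R - Q(a)). It is not the rlforge source: the class name EpsilonGreedySketch, its select_action and update methods, and the seed argument are hypothetical. It assumes NumPy, uniform random exploration, and random tie-breaking among greedy arms.

```python
import numpy as np


class EpsilonGreedySketch:
    """Hypothetical illustration of an epsilon-greedy bandit agent (not rlforge's BanditAgent)."""

    def __init__(self, num_actions, epsilon=0.1, step_size=None, seed=None):
        self.epsilon = epsilon
        self.step_size = step_size
        self.q = np.zeros(num_actions)                   # action-value estimates Q(a)
        self.counts = np.zeros(num_actions, dtype=int)   # N(a): times each action was chosen
        self.rng = np.random.default_rng(seed)

    def select_action(self):
        # With probability epsilon, explore: pick an arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.q)))
        # Otherwise exploit: pick a greedy arm, breaking ties at random.
        best = np.flatnonzero(self.q == self.q.max())
        return int(self.rng.choice(best))

    def update(self, action, reward):
        self.counts[action] += 1
        # Sample average uses 1/N(a); otherwise use the constant step size.
        alpha = 1.0 / self.counts[action] if self.step_size is None else self.step_size
        # Incremental rule: Q(a) <- Q(a) + alpha * (R - Q(a))
        self.q[action] += alpha * (reward - self.q[action])


# Usage: a 3-armed testbed with Gaussian rewards (hypothetical true means).
agent = EpsilonGreedySketch(num_actions=3, epsilon=0.1, seed=0)
true_means = np.array([0.2, 0.5, 0.8])
for _ in range(2000):
    a = agent.select_action()
    r = agent.rng.normal(true_means[a], 1.0)
    agent.update(a, r)
print(agent.q.round(2), agent.counts)
```

Because the sample-average update only needs N(a) and the current Q(a), the agent never stores past rewards; memory stays proportional to the number of arms.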
Parameters
- num_actions : int
Number of available arms (actions).
- epsilon : float, optional (default=0.1)
Probability of selecting a random action (exploration) instead of the action with the highest estimated value.
- step_size : float or None, optional (default=None)
Constant step size for updates. If None, the step size for an action is 1/n, where n is the number of times that action has been selected (sample-average method).
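To illustrate what step_size changes, the sketch below applies both update rules to the same reward sequence from a single arm whose payout shifts midway. The helper incremental_estimates is hypothetical, and it assumes an initial estimate of 0.0. With step size 1/n, every reward gets equal weight, so the estimate is the plain mean; with a constant step size alpha, a reward received k steps ago has weight alpha * (1 - alpha)^k, so recent rewards dominate.

```python
def incremental_estimates(rewards, step_size=None, initial=0.0):
    """Hypothetical helper: return the running estimate after each reward."""
    q, history = initial, []
    for n, r in enumerate(rewards, start=1):
        # 1/n reproduces the sample average; a constant alpha gives recency weighting.
        alpha = 1.0 / n if step_size is None else step_size
        q += alpha * (r - q)
        history.append(q)
    return history


rewards = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]   # the arm's payout jumps after three pulls
sample_avg = incremental_estimates(rewards)              # step_size=None
constant = incremental_estimates(rewards, step_size=0.5)  # constant step size

# sample_avg[-1] equals the mean of all six rewards (up to floating-point rounding).
# constant[-1] is 4.484375, much closer to the new payout of 5.
print(sample_avg[-1], constant[-1])
```

The sample average settles at 3.0, the mean of all six rewards, while the constant step size has already moved to about 4.48. This is why a constant step size is usually preferred for nonstationary problems; the trade-off is that it never stops reacting to noise and retains a weight of (1 - alpha)^n on the initial estimate.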