Bandits
- class rlforge.environments.bandits.Bandits(k=10, mean_rewards=None, reward_std=1.0)
Basic k-armed bandit environment.
The k-armed bandit problem is a fundamental reinforcement learning setting where an agent repeatedly chooses among k actions (“arms”), each associated with an unknown reward distribution. The agent’s objective is to maximize cumulative reward by balancing exploration and exploitation.
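The exploration/exploitation trade-off described above can be illustrated with a minimal epsilon-greedy sketch. It does not use the `Bandits` class (its methods are not documented in this section). It simulates Gaussian rewards directly with NumPy instead. The function name `run_epsilon_greedy` and the `epsilon`, `steps`, and `seed` arguments are assumptions made for illustration only.

```python
import numpy as np


def run_epsilon_greedy(mean_rewards, reward_std=1.0, epsilon=0.1, steps=1000, seed=0):
    """Simulate an epsilon-greedy agent on a Gaussian k-armed bandit (illustrative only)."""
    rng = np.random.default_rng(seed)
    mean_rewards = np.asarray(mean_rewards, dtype=float)
    k = mean_rewards.size

    estimates = np.zeros(k)          # running estimate of each arm's mean reward
    counts = np.zeros(k, dtype=int)  # number of times each arm was pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(k))        # explore: pick a uniformly random arm
        else:
            arm = int(np.argmax(estimates))   # exploit: pick the best arm so far

        # Reward = true mean of the chosen arm plus Gaussian noise.
        reward = rng.normal(mean_rewards[arm], reward_std)

        # Incremental sample-average update of the chosen arm's estimate.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total_reward = run_epsilon_greedy([0.0, 0.5, 2.0], steps=2000)
```

A larger `epsilon` spends more pulls on exploration, which improves the estimates for rarely chosen arms at the cost of short-term reward.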
Parameters
- k : int, optional
Number of arms, i.e. available actions (default: 10).
- mean_rewards : array-like, optional
True mean reward for each arm. If None, the means are sampled from the standard normal distribution N(0, 1).
- reward_std : float, optional
Standard deviation of reward noise (default: 1.0).
Attributes
- k : int
Number of arms.
- mean_rewards : numpy.ndarray
True mean reward for each arm.
- reward_std : float
Standard deviation of reward noise.
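As a rough illustration of how the constructor parameters could map onto these attributes, here is a hypothetical stand-in class. It is not the rlforge implementation: the name `SketchBandit`, the `pull` method, the `seed` argument, and the length check are all assumptions. The reward model (true mean plus Gaussian noise with standard deviation `reward_std`) is inferred from the parameter descriptions above.

```python
import numpy as np


class SketchBandit:
    """Hypothetical stand-in for a Gaussian k-armed bandit (not the rlforge class)."""

    def __init__(self, k=10, mean_rewards=None, reward_std=1.0, seed=None):
        # `seed` is an assumption added here for reproducibility.
        self._rng = np.random.default_rng(seed)
        self.k = k
        if mean_rewards is None:
            # Draw the true means from N(0, 1), as the parameter description states.
            self.mean_rewards = self._rng.normal(0.0, 1.0, size=k)
        else:
            self.mean_rewards = np.asarray(mean_rewards, dtype=float)
            # Assumed validation: one mean per arm.
            if self.mean_rewards.shape != (k,):
                raise ValueError("mean_rewards must have exactly k entries")
        self.reward_std = reward_std

    def pull(self, arm):
        """Return a noisy reward for `arm` (hypothetical method name)."""
        return float(self._rng.normal(self.mean_rewards[arm], self.reward_std))


# With reward_std=0.0 the noise vanishes, so each pull returns the arm's true mean.
env = SketchBandit(k=3, mean_rewards=[0.0, 0.5, 2.0], reward_std=0.0, seed=0)
default_env = SketchBandit(seed=1)
```

Storing `mean_rewards` as a `numpy.ndarray` regardless of the input type matches the attribute type listed above.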