Epsilon Greedy
- class rlforge.policies.epsilonGreedy(q_values, epsilon=0.1)
Select an action using the epsilon-greedy exploration strategy.
With probability epsilon, a random action is chosen (exploration). With probability 1 - epsilon, the action with the highest estimated value in q_values is selected (exploitation).
Parameters
- q_values : numpy.ndarray, shape (n_actions,)
1-D array containing the estimated action values for the current state.
- epsilon : float, optional (default=0.1)
Probability of selecting a random action instead of the greedy one. As a probability, it should lie in the range [0, 1].
Returns
- action : int
Index of the chosen action.
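Examples
The selection rule described above can be illustrated with a minimal NumPy sketch. This is not the rlforge implementation: the function name epsilon_greedy_sketch and its rng argument are hypothetical, and the sketch assumes that the random action is drawn uniformly over all actions and that ties in q_values are broken in favor of the lowest index.

```python
import numpy as np


def epsilon_greedy_sketch(q_values, epsilon=0.1, rng=None):
    """Hypothetical stand-in for illustration; not the rlforge implementation."""
    # rng is an assumed extra argument that makes the sketch reproducible.
    rng = np.random.default_rng() if rng is None else rng
    q_values = np.asarray(q_values)

    if rng.random() < epsilon:
        # Exploration: assumed to be uniform over all action indices.
        return int(rng.integers(q_values.shape[0]))

    # Exploitation: np.argmax returns the first index of the maximum on ties.
    return int(np.argmax(q_values))


q = np.array([0.2, 1.5, 0.7])

# epsilon=0.0 never explores, so the greedy action (index 1) is returned.
greedy_action = epsilon_greedy_sketch(q, epsilon=0.0)

# epsilon=1.0 always explores, so any valid index may be returned.
random_action = epsilon_greedy_sketch(q, epsilon=1.0, rng=np.random.default_rng(0))
```

Under the uniform-exploration assumption, the greedy action can also be picked during an exploration step, so it is selected with overall probability 1 - epsilon + epsilon / n_actions.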