Epsilon Greedy

class rlforge.policies.epsilonGreedy(q_values, epsilon=0.1)

Select an action using the epsilon-greedy exploration strategy.

With probability epsilon, a random action is chosen (exploration). With probability 1 - epsilon, the action with the highest estimated value in q_values is selected (exploitation).

Parameters

q_values : numpy.ndarray, shape (n_actions,)

1-D array containing the estimated action values for the current state.

epsilon : float, optional (default=0.1)

Probability of selecting a random action instead of the greedy one. As a probability, it is expected to lie in the range [0, 1].

Returns

action : int

Index of the chosen action.
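
The selection rule described above can be sketched in plain NumPy. This is a minimal illustration, not rlforge's actual implementation: the function name epsilon_greedy_sketch and the rng argument are hypothetical, and two behaviours are assumed rather than documented here, namely that the random action is drawn uniformly over all actions and that ties between equal action values go to the lowest index.

```python
import numpy as np


def epsilon_greedy_sketch(q_values, epsilon=0.1, rng=None):
    """Hypothetical stand-in for the documented behaviour (not rlforge code)."""
    # Use a caller-supplied generator for reproducibility, else a fresh one.
    rng = np.random.default_rng() if rng is None else rng
    q_values = np.asarray(q_values, dtype=float)

    # rng.random() is uniform on [0, 1), so this branch runs with
    # probability epsilon.
    if rng.random() < epsilon:
        # Explore. Assumption: uniform draw over all action indices,
        # which may also return the greedy action.
        return int(rng.integers(q_values.size))

    # Exploit. np.argmax returns the first index of the maximum,
    # so ties resolve to the lowest index.
    return int(np.argmax(q_values))


q = np.array([0.1, 0.9, 0.3])
greedy_action = epsilon_greedy_sketch(q, epsilon=0.0)  # never explores
random_action = epsilon_greedy_sketch(q, epsilon=1.0)  # always explores
```

With epsilon=0.0 the sketch always returns the greedy index, and with epsilon=1.0 it always draws at random, which makes the two extremes convenient sanity checks.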