Policies

Policies define how an agent selects actions given its current knowledge or preferences. They are the core mechanism that balances exploration (trying new actions to discover rewards) and exploitation (choosing the best-known action to maximize reward). RLForge provides several standard policies that illustrate different approaches to this trade-off:

Epsilon-Greedy — selects the best-known action most of the time, but with a small probability epsilon chooses a random action to encourage exploration.
Softmax — assigns probabilities to actions based on their estimated values, controlled by a temperature parameter that adjusts the balance between greediness and randomness.
Gaussian — samples continuous actions from a normal distribution defined by a mean (mu) and standard deviation (sigma), useful for environments with continuous action spaces.

These policies serve as building blocks for agents in RLForge, allowing you to experiment with different exploration strategies and adapt them to discrete or continuous environments.