Softmax Policy
- class rlforge.policies.softmax(h, temperature=1)
Compute a softmax policy distribution over actions.
The softmax policy assigns probabilities to actions based on their preferences or values. The temperature parameter controls the exploration-exploitation trade-off:
Low temperature (< 1): more greedy, favors high-value actions.
High temperature (> 1): more exploratory, probabilities spread more evenly.
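For reference, the temperature-scaled softmax is conventionally written as below. This is the standard textbook form, given here as an assumption: the page does not state exactly how rlforge applies `temperature`, and the symbols \pi (policy) and \tau (temperature) are introduced only for illustration.

```latex
\pi(a \mid s) = \frac{\exp\big(h(s, a) / \tau\big)}{\sum_{b} \exp\big(h(s, b) / \tau\big)}
```

Under this form, \tau \to 0 concentrates the probability mass on the highest-preference action, while \tau \to \infty approaches a uniform distribution over actions.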
Parameters
- h : numpy.ndarray, shape (n_states, n_actions)
2-D array of action preferences or values. Each row corresponds to a state, and each column corresponds to an action.
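- temperature : float, optional (default=1)
Scaling factor that adjusts the entropy of the distribution: larger values spread probability more evenly across actions, smaller values concentrate it on high-value actions.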
Returns
- softmax_probs : numpy.ndarray, shape (n_states, n_actions)
Probability distribution over actions for each state. Rows sum to 1.
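The documented behavior can be illustrated with a minimal NumPy sketch. The function `softmax_sketch` below is a hypothetical stand-in written for this example, not rlforge's actual implementation. It assumes that preferences are divided by the temperature, and it subtracts the row maximum for numerical stability, neither of which is confirmed by this page.

```python
import numpy as np


def softmax_sketch(h, temperature=1.0):
    """Hypothetical stand-in for the documented function (not rlforge code).

    Assumes the preferences are divided by `temperature` before
    exponentiation, which is the usual convention.
    """
    h = np.asarray(h, dtype=float)
    scaled = h / temperature
    # Subtract each row's maximum so np.exp never overflows; this does not
    # change the resulting probabilities.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp_scaled = np.exp(scaled)
    # Normalize per state (row) so that every row sums to 1.
    return exp_scaled / exp_scaled.sum(axis=1, keepdims=True)


# Two states, three actions. The second state has equal preferences.
h = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])

greedy = softmax_sketch(h, temperature=0.5)       # low temperature
neutral = softmax_sketch(h, temperature=1.0)      # default
exploratory = softmax_sketch(h, temperature=5.0)  # high temperature

print(greedy)
print(neutral)
print(exploratory)
```

In this sketch the probability of the highest-preference action in the first state shrinks as the temperature rises, and the second state stays uniform at every temperature because its preferences are all equal.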