Softmax Policy

class rlforge.policies.softmax(h, temperature=1)

Compute a softmax policy distribution over actions.

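The softmax policy assigns each action a probability proportional to the exponential of its temperature-scaled preference (or value), so higher-preference actions are always more likely but never certain. The temperature parameter controls the exploration-exploitation trade-off: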

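Low temperature (< 1): sharper distribution that concentrates probability on the highest-preference actions; as the temperature approaches 0, the policy approaches greedy selection.

High temperature (> 1): flatter distribution that spreads probability more evenly; as the temperature grows, the policy approaches uniform random selection.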
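This page does not spell out the exact formula. Assuming rlforge follows the common Boltzmann (Gibbs) convention of dividing the preferences by the temperature τ, the probability of action a in state s, together with the two limits described above, would be:

```latex
\pi(a \mid s) = \frac{\exp\left(h_{s,a} / \tau\right)}{\sum_{b=1}^{n_{\text{actions}}} \exp\left(h_{s,b} / \tau\right)},
\qquad
\lim_{\tau \to 0^{+}} \pi(a \mid s) =
\begin{cases}
  1 / \lvert A^{*}_{s} \rvert & a \in A^{*}_{s} = \arg\max_{b} h_{s,b} \\
  0 & \text{otherwise}
\end{cases},
\qquad
\lim_{\tau \to \infty} \pi(a \mid s) = \frac{1}{n_{\text{actions}}}
```

Dividing by τ < 1 stretches the gaps between preferences so the largest entry dominates; dividing by τ > 1 shrinks those gaps toward zero, pushing every probability toward 1/n_actions.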

Parameters

h : numpy.ndarray, shape (n_states, n_actions)
    2-D array of action preferences or values. Each row corresponds to a state, and each column corresponds to an action.
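temperature : float, optional (default=1)
    Positive scaling factor applied to the preferences before exponentiation. Larger values raise the entropy of the distribution (more exploration); smaller values lower it (more exploitation).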
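To make the link between temperature and entropy concrete, the sketch below computes the Shannon entropy of a single state's action distribution at three temperatures. `softmax_rows` and `row_entropy` are hypothetical helpers written for illustration, not part of rlforge, and `softmax_rows` assumes the preferences are divided by the temperature.

```python
import numpy as np

def softmax_rows(h, temperature=1.0):
    # Hypothetical stand-in for rlforge.policies.softmax; assumes the
    # preferences are divided by the temperature before exponentiation.
    z = np.asarray(h, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)  # shift each row for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def row_entropy(p):
    # Shannon entropy (in nats) of each row, treating 0 * log(0) as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p), 0.0)
    return -terms.sum(axis=1)

h = np.array([[1.0, 2.0, 3.0]])  # one state, three actions
for t in (0.1, 1.0, 10.0):
    print(f"temperature={t:>4}: entropy={row_entropy(softmax_rows(h, t))[0]:.4f} nats")
```

The entropy rises with the temperature: near 0 at temperature 0.1 (almost all mass on the third action) and just below ln 3 ≈ 1.0986 nats, the uniform-distribution maximum for three actions, at temperature 10.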

Returns

softmax_probs : numpy.ndarray, shape (n_states, n_actions)
    Probability distribution over actions for each state. Rows sum to 1.
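The following is a minimal sketch of a function honouring the documented contract (a 2-D input of shape (n_states, n_actions), output of the same shape, rows summing to 1). It is not rlforge's actual source: the name `softmax_policy`, the division by the temperature, and the input checks are assumptions made for illustration.

```python
import numpy as np

def softmax_policy(h, temperature=1.0):
    """Hypothetical row-wise softmax policy; not rlforge's implementation."""
    h = np.asarray(h, dtype=float)
    if h.ndim != 2:
        raise ValueError("h must have shape (n_states, n_actions)")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Scale the preferences, then subtract each row's maximum so that np.exp
    # never overflows; the shift cancels out in the normalisation below.
    z = h / temperature
    z -= z.max(axis=1, keepdims=True)
    e = np.exp(z)
    # Normalise each row so it is a probability distribution over actions.
    return e / e.sum(axis=1, keepdims=True)

h = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1000.0, 0.0]])  # two states, three actions
probs = softmax_policy(h)
print(probs.shape)  # (2, 3)
print(probs)
```

Subtracting the row maximum is safe because softmax is unchanged when the same constant is added to every entry of a row. Here the second row evaluates to 0.5, 0.5 and 0; without the shift, `np.exp(1000.0)` would overflow to `inf` and the first two entries of that row would become NaN.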