Softmax Policy

class rlforge.policies.softmax(h, temperature=1)

Compute a softmax policy distribution over actions.

The softmax policy assigns probabilities to actions based on their preferences or values. The temperature parameter controls the exploration-exploitation trade-off:

  • Low temperature (< 1): more greedy, favors high-value actions.

  • High temperature (> 1): more exploratory, probabilities spread more evenly.
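The page does not state the formula itself. Assuming the standard temperature-scaled (Boltzmann) form, with \tau standing for temperature and h(s, a) for the entry of h at state s and action a, the probability of action a in state s would be:

```latex
\pi(a \mid s) = \frac{\exp\big(h(s, a) / \tau\big)}{\sum_{b} \exp\big(h(s, b) / \tau\big)}
```

Under that assumption, dividing by a small \tau magnifies the gaps between preferences, so probability mass concentrates on the highest-valued action. Dividing by a large \tau shrinks the gaps, so the distribution approaches uniform.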

Parameters

h : numpy.ndarray, shape (n_states, n_actions)

2-D array of action preferences or values. Each row corresponds to a state, and each column corresponds to an action.

temperature : float, optional (default=1)

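Scaling factor that adjusts the entropy of the distribution. Larger values give a flatter, higher-entropy distribution; smaller values concentrate probability on the highest-valued actions.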

Returns

softmax_probs : numpy.ndarray, shape (n_states, n_actions)

Probability distribution over actions for each state. Rows sum to 1.
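The documented input and output contract can be illustrated with a minimal NumPy sketch. The function softmax_sketch below is a hypothetical stand-in, not the rlforge implementation; it assumes preferences are divided by the temperature before exponentiation and normalizes along the action axis.

```python
import numpy as np


def softmax_sketch(h, temperature=1.0):
    """Hypothetical stand-in for illustration; not the rlforge implementation."""
    h = np.asarray(h, dtype=float)
    # Assumption: preferences are divided by the temperature (Boltzmann form).
    z = h / temperature
    # Subtract each row's maximum for numerical stability; the result is unchanged.
    z = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    # Normalize across actions so every row (state) sums to 1.
    return exp_z / exp_z.sum(axis=1, keepdims=True)


# Two states, three actions: one state with distinct preferences, one with ties.
h = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])

greedy = softmax_sketch(h, temperature=0.1)        # low temperature
default = softmax_sketch(h, temperature=1.0)       # default temperature
exploratory = softmax_sketch(h, temperature=10.0)  # high temperature

print(greedy.round(3))
print(default.round(3))
print(exploratory.round(3))
```

In this sketch the output has the same shape as h, each row sums to 1, and the probability of the best action in the first state shrinks as the temperature rises. The second state has equal preferences, so its row stays uniform at every temperature.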