# Action and Policy
A **policy** ![\pi](https://latex.codecogs.com/svg.latex?\pi) is the agent's strategy for picking actions, and an **action** is the agent's degree of freedom to act on the environment in pursuit of maximal reward.

A policy maps states to actions, ![\pi : S \longrightarrow A](https://latex.codecogs.com/svg.latex?\pi\text{:}S\longrightarrow%20A). There are the following approaches to training this function in order to find the optimal policy.

<ol>
<li> <strong><em>Policy-based</em></strong>: Directly train the policy.

```{math}
a = \pi ( s ) \quad \text{or} \quad a = \underset{a}{\operatorname{argmax}} \; \pi ( a \mid s )
```
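As a minimal sketch of acting under a learned policy, consider a hypothetical tabular stochastic policy `pi` (the array and the `act` helper below are illustrative, not from the text): a deterministic policy returns one action per state, while a stochastic one either samples from ![\pi(a|s)](https://latex.codecogs.com/svg.latex?\pi(a|s)) or takes its argmax.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tabular stochastic policy: pi[s] is a probability
# distribution over 3 actions for each of 4 states.
pi = rng.random((4, 3))
pi /= pi.sum(axis=1, keepdims=True)  # normalize each row to sum to 1

def act(s, greedy=False):
    """Pick an action in state s: sample a ~ pi(.|s), or take a = argmax_a pi(a|s)."""
    if greedy:
        return int(np.argmax(pi[s]))
    return int(rng.choice(pi.shape[1], p=pi[s]))

a = act(2, greedy=True)
```

Policy-based methods (e.g. policy-gradient algorithms) adjust the entries of `pi` directly to increase expected return.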

<li> <strong><em>Value-based</em></strong>: Train a value function, so that the policy moves toward the states (or state-action pairs) with the highest values.

```{math}
\textbf{State-Value Function:} \quad V_{\pi} (s) = \mathbb{E}_{\pi} [ G_t \mid S_t = s ]
```

```{math}
\textbf{Action-Value Function:} \quad Q_{\pi} (s, a) = \mathbb{E}_{\pi} [ G_t \mid S_t = s, A_t = a ]
```
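A minimal sketch of value-based control, assuming a hypothetical action-value table `Q` (the numbers are illustrative): once ![Q_{\pi}(s,a)](https://latex.codecogs.com/svg.latex?Q_{\pi}(s,a)) has been learned, the greedy policy simply picks the action with the highest value in the current state.

```python
import numpy as np

# Hypothetical learned action-value table Q[s, a]: 4 states, 3 actions.
Q = np.array([
    [1.0, 0.5, 0.2],
    [0.1, 2.0, 0.3],
    [0.0, 0.4, 1.5],
    [0.7, 0.6, 0.9],
])

def greedy_policy(s):
    """Act greedily with respect to Q: a = argmax_a Q(s, a)."""
    return int(np.argmax(Q[s]))

a = greedy_policy(1)  # action 1 has the highest value in state 1
```

Algorithms such as Q-learning train the table (or a neural approximation of it) and derive the policy implicitly from it in exactly this way.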

<li> <strong><em>Model-based</em></strong>: Learn a model of the environment (its transition dynamics and reward function), then plan actions by simulating outcomes with that model.
</ol>