# Environment and State

* The **environment** is the entity the agent interacts with. It presents its information to the agent in the form of a **state**.

* In RL, the **history** (or **trajectory**) is the sequence of all observations, actions, and rewards exchanged between the agent and the environment up to the current time.

```{math}
\mathbb{H}_t \, \text{or} \, \tau_t = \left( O_0, A_0, R_0, O_1, A_1, R_1, \dots, O_t, A_t, R_t \right) \implies S_t = f(\mathbb{H}_t)
```
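To make the notation concrete, here is a minimal sketch (with a made-up toy environment and a trivial policy, both hypothetical) of a trajectory as an ordered record of $(O, A, R)$ triples, with the agent state $S_t = f(\mathbb{H}_t)$ computed as a function of that history:

```python
import random

def collect_trajectory(steps):
    """Record H_t = (O_0, A_0, R_0, ..., O_t, A_t, R_t) as a list of triples."""
    trajectory = []
    for t in range(steps):
        observation = random.random()           # O_t: what the agent perceives
        action = 0 if observation < 0.5 else 1  # A_t: a trivial threshold policy
        reward = float(action == 1)             # R_t: reward received this step
        trajectory.append((observation, action, reward))
    return trajectory

def state_from_history(trajectory):
    """S_t = f(H_t): here f simply keeps the most recent observation."""
    return trajectory[-1][0]

random.seed(0)
tau = collect_trajectory(5)
print(len(tau))                  # 5 (O, A, R) triples
print(state_from_history(tau))
```

The choice of $f$ is up to the agent designer: keeping only the last observation is the simplest option, but $f$ could equally stack recent observations or summarize the whole history.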

* A <i>state signal</i> that retains all relevant <i>history</i> information is said to have the **Markov property**, and RL tasks with such states are called **Markov decision processes**. That is, the <i>environment</i> needs only the <i>current state</i> and <i>action</i>, not the full history, to determine the distribution over the <i>next state</i>.

```{math}
\therefore p (s' | s, a) = p ( s' | h, a)
```
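A small sketch of this equality, using a hypothetical two-state chain: the transition function accepts a history argument but never reads it, so conditioning on any history ending in the same state yields the same prediction.

```python
# (state, action) -> {next_state: probability}; a made-up toy chain.
TRANSITIONS = {
    ("s0", "a"): {"s0": 0.3, "s1": 0.7},
    ("s1", "a"): {"s0": 0.6, "s1": 0.4},
}

def transition_prob(next_state, state, action, history=None):
    """p(s' | s, a): 'history' is accepted but ignored, so
    p(s' | s, a) == p(s' | h, a) for any history h ending in state s."""
    return TRANSITIONS[(state, action)].get(next_state, 0.0)

# Two different histories ending in the same state give identical predictions.
h1 = ["s0", "s1", "s0"]
h2 = ["s1", "s0"]
assert transition_prob("s1", "s0", "a", history=h1) == \
       transition_prob("s1", "s0", "a", history=h2)  # both 0.7
```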

````{note}
***The future is independent of the past given the present***
```{math}
\mathbb{H}_{t} \longrightarrow S_t \longrightarrow \mathbb{H}_{t+1}
```
````

However, a <i>state</i> can be defined for both the **environment** and the **agent**.

* The **environment state** is the environment's internal state; it may or may not be fully visible to the agent.

* The **agent state** is the agent's own representation, built as a function of the history.
* **Full observability**: the agent directly observes the environment state [*Markov Decision Process* (MDP)].
* **Partial observability**: the agent observes the environment state only indirectly [*Partially Observable Markov Decision Process* (POMDP)].
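The MDP/POMDP distinction can be sketched through the observation function. In this hypothetical example the environment state has a hidden variable (`wind`, an invented name) that a partial observation strips out:

```python
# Made-up environment state: position and goal are visible, wind is hidden.
env_state = {"position": (2, 3), "goal": (5, 5), "wind": 0.1}

def observe_mdp(state):
    """Full observability: the observation O_t equals the environment state."""
    return dict(state)

def observe_pomdp(state):
    """Partial observability: the agent sees position and goal, but not wind."""
    return {"position": state["position"], "goal": state["goal"]}

full = observe_mdp(env_state)
partial = observe_pomdp(env_state)
print("wind" in full)     # True: the MDP observation is the full state
print("wind" in partial)  # False: the variable stays hidden under a POMDP
```

Under partial observability the agent cannot rely on a single observation being Markov, which is why the agent state is built from the history rather than from the latest observation alone.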
