# Environment and State

* The **environment** is the entity the agent interacts with. It presents its information to the agent in the form of a **state**.

* In RL, the **history** (or **trajectory**) is the sequence of all observations, actions, and rewards exchanged between the agent and the environment up to the current time.

```{math}
\mathbb{H}_t \, \text{or} \, \tau_t = \left( O_0, A_0, R_0, O_1, A_1, R_1, \dots, O_t, A_t, R_t \right) \implies S_t = f(\mathbb{H}_t)
```
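To make the notation concrete, here is a minimal sketch (with a made-up toy environment and a trivial policy, both hypothetical) of a trajectory as an ordered record of $(O, A, R)$ triples, with the agent state $S_t = f(\mathbb{H}_t)$ computed as a function of that history:

```python
import random

def collect_trajectory(steps):
    """Record H_t = (O_0, A_0, R_0, ..., O_t, A_t, R_t) as a list of triples."""
    trajectory = []
    for t in range(steps):
        observation = random.random()           # O_t: what the agent perceives
        action = 0 if observation < 0.5 else 1  # A_t: a trivial threshold policy
        reward = float(action == 1)             # R_t: reward received this step
        trajectory.append((observation, action, reward))
    return trajectory

def state_from_history(trajectory):
    """S_t = f(H_t): here f simply keeps the most recent observation."""
    return trajectory[-1][0]

random.seed(0)
tau = collect_trajectory(5)
print(len(tau))                  # 5 (O, A, R) triples
print(state_from_history(tau))
```

The choice of $f$ is up to the agent designer: keeping only the last observation is the simplest option, but $f$ could equally stack recent observations or summarize the whole history.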

* A <i>state signal</i> that retains all relevant <i>history</i> information is said to have the **Markov property**, and RL tasks with such states are called **Markov decision processes**. That is, the <i>environment</i> needs only the <i>current state</i> and <i>action</i>, not the full history, to determine the distribution over the <i>next state</i>.

```{math}
\therefore p (s' | s, a) = p ( s' | h, a)
```
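A small sketch of this equality, using a hypothetical two-state chain: the transition function accepts a history argument but never reads it, so conditioning on any history ending in the same state yields the same prediction.

```python
# (state, action) -> {next_state: probability}; a made-up toy chain.
TRANSITIONS = {
    ("s0", "a"): {"s0": 0.3, "s1": 0.7},
    ("s1", "a"): {"s0": 0.6, "s1": 0.4},
}

def transition_prob(next_state, state, action, history=None):
    """p(s' | s, a): 'history' is accepted but ignored, so
    p(s' | s, a) == p(s' | h, a) for any history h ending in state s."""
    return TRANSITIONS[(state, action)].get(next_state, 0.0)

# Two different histories ending in the same state give identical predictions.
h1 = ["s0", "s1", "s0"]
h2 = ["s1", "s0"]
assert transition_prob("s1", "s0", "a", history=h1) == \
       transition_prob("s1", "s0", "a", history=h2)  # both 0.7
```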

````{note}
***The future is independent of the past given the present***
```{math}
\mathbb{H}_{t} \longrightarrow S_t \longrightarrow \mathbb{H}_{t+1}
```
````

However, a <i>state</i> can be defined for both the **environment** and the **agent**.

* The **environment state** is the environment's internal state; it may or may not be fully visible to the agent.

* The **agent state** is the agent's own representation, built as a function of the history.
* **Full observability**: the agent directly observes the environment state [*Markov Decision Process* (MDP)].
* **Partial observability**: the agent observes the environment state only indirectly [*Partially Observable Markov Decision Process* (POMDP)].
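The MDP/POMDP distinction can be sketched through the observation function. In this hypothetical example the environment state has a hidden variable (`wind`, an invented name) that a partial observation strips out:

```python
# Made-up environment state: position and goal are visible, wind is hidden.
env_state = {"position": (2, 3), "goal": (5, 5), "wind": 0.1}

def observe_mdp(state):
    """Full observability: the observation O_t equals the environment state."""
    return dict(state)

def observe_pomdp(state):
    """Partial observability: the agent sees position and goal, but not wind."""
    return {"position": state["position"], "goal": state["goal"]}

full = observe_mdp(env_state)
partial = observe_pomdp(env_state)
print("wind" in full)     # True: the MDP observation is the full state
print("wind" in partial)  # False: the variable stays hidden under a POMDP
```

Under partial observability the agent cannot rely on a single observation being Markov, which is why the agent state is built from the history rather than from the latest observation alone.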
