<aside> 💡 Overview (lecture.pdf) → Covers the MDP, the most fundamental framework in reinforcement learning

* Markov Processes, MRPs, MDPs … → Value function
* State-value function, Action-value function
* Bellman Expectation Equation
* Bellman Optimality Equation

</aside>

Introduction to MDPs

Markov Property

<aside> 💡 Definition

→ The future is independent of the past given the present

$$ \mathbb{P}[S_{t+1}|S_t]=\mathbb{P}[S_{t+1}|S_1,...,S_t] $$

</aside>
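To make the property concrete, here is a rough numerical check (a sketch assuming NumPy; the 3-state matrix `P`, the trajectory length, and the conditioning states are arbitrary choices for illustration): along a sampled trajectory, conditioning on the previous state in addition to the current one does not change the estimated next-state distribution.

```python
import numpy as np

# Hypothetical 3-state chain; the matrix P below is an arbitrary choice for illustration.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])

rng = np.random.default_rng(0)

# Sample one long trajectory: the next state is drawn using only the current state.
T = 200_000
states = np.empty(T, dtype=int)
states[0] = 0
for t in range(T - 1):
    states[t + 1] = rng.choice(3, p=P[states[t]])

# Estimate P[S_{t+1} | S_t = 1], with and without also conditioning on S_{t-1} = 0.
prev, cur, nxt = states[:-2], states[1:-1], states[2:]
p_given_cur = np.bincount(nxt[cur == 1], minlength=3) / np.sum(cur == 1)
mask = (cur == 1) & (prev == 0)
p_given_both = np.bincount(nxt[mask], minlength=3) / np.sum(mask)

print(p_given_cur)   # ~ [0.1, 0.6, 0.3]
print(p_given_both)  # ~ the same distribution: the extra history adds nothing
```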

State Transition Matrix

State transition probability

$$ \mathcal{P}_{ss^\prime}=\mathbb{P}[S_{t+1}=s^\prime|S_t=s] $$

State transition matrix

$$ \mathcal{P}= \begin{bmatrix} \mathcal{P}_{11} & \cdots & \mathcal{P}_{1n}\\ \vdots & \ddots & \vdots\\ \mathcal{P}_{n1} & \cdots & \mathcal{P}_{nn} \end{bmatrix} $$

(rows are indexed by the current state $s$ ("from"), columns by the successor state $s^\prime$ ("to"), so each row sums to 1)
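As a small numerical sketch of how $\mathcal{P}$ is used (assuming NumPy and an arbitrary 3-state matrix): each row is a distribution over successor states, and multiplying a state distribution on the right by $\mathcal{P}$ propagates it one step forward.

```python
import numpy as np

# Hypothetical 3-state transition matrix (values assumed for illustration);
# rows are "from" states, columns are "to" states.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.3, 0.0, 0.7],
])

# Every row is a probability distribution over successor states, so each row sums to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# If mu_t is the state distribution at time t (a row vector),
# the distribution one step later is mu_{t+1} = mu_t @ P.
mu0 = np.array([1.0, 0.0, 0.0])           # start in state 0 with probability 1
mu1 = mu0 @ P                             # distribution after one step
mu5 = mu0 @ np.linalg.matrix_power(P, 5)  # distribution after five steps
print(mu1, mu5)
```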

Markov Process

<aside> 💡 Definition → A Markov Process (or Markov Chain) is a tuple $\langle\mathcal{S},\mathcal{P}\rangle$

* $\mathcal{S}$ is a (finite) set of states
* $\mathcal{P}$ is a state transition probability matrix

$$ \mathcal{P}_{ss^\prime}=\mathbb{P}[S_{t+1}=s^\prime|S_t=s] $$

</aside>
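Given $\langle\mathcal{S},\mathcal{P}\rangle$, a Markov process can be simulated directly. The sketch below (assuming NumPy, with made-up state names and probabilities) includes one absorbing state so that sampled episodes terminate.

```python
import numpy as np

# A minimal sketch of a Markov process <S, P>; state names and probabilities
# are made up for illustration and are not taken from the lecture.
S = ["s1", "s2", "s3", "terminal"]
P = np.array([
    [0.0, 0.7, 0.2, 0.1],   # s1 -> s2, s3, or terminal
    [0.3, 0.0, 0.5, 0.2],   # s2 -> s1, s3, or terminal
    [0.0, 0.0, 0.4, 0.6],   # s3 -> s3 or terminal
    [0.0, 0.0, 0.0, 1.0],   # terminal is absorbing
])

def sample_episode(start=0, rng=np.random.default_rng()):
    """Follow P from `start` until the absorbing state is reached."""
    s, episode = start, [start]
    while S[s] != "terminal":
        s = rng.choice(len(S), p=P[s])
        episode.append(s)
    return [S[i] for i in episode]

print(sample_episode())   # e.g. ['s1', 's2', 's3', 'terminal']
```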

Markov Reward Process (MRP)