<aside> 💡 Overview (lecture.pdf) → Explains the basic background and knowledge needed to study reinforcement learning → Definitions of key terms (state, action, reward, observation, model … etc.)

</aside>

Branches of Machine Learning

Characteristics of Reinforcement Learning

There is no supervisor, only a reward signal
Feedback is delayed, not instantaneous
Time really matters (sequential, non-i.i.d. data)
Agent’s actions affect the subsequent data it receives

Rewards

<aside> 💡 Definition (Reward Hypothesis) → All goals can be described by the maximisation of expected cumulative reward

</aside>

A reward $R_t$ is a scalar feedback signal
It indicates how well the agent is doing at step $t$
The agent's job is to maximise cumulative reward
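
As a rough illustration of "cumulative reward", the sketch below sums the scalar rewards collected over a single episode. The function name `cumulative_reward` and the reward values are made up for this example; it also assumes a finite, undiscounted episode, which the notes above do not specify.

```python
def cumulative_reward(rewards):
    """Sum a sequence of scalar rewards R_1, ..., R_T (undiscounted; an assumption)."""
    total = 0.0
    for r in rewards:
        total += r  # each R_t is a single scalar feedback signal
    return total


# Hypothetical episode: the reward values are invented for illustration.
episode_rewards = [0.0, -1.0, 0.0, 10.0]
print(cumulative_reward(episode_rewards))  # 9.0
```

The reward hypothesis speaks of *expected* cumulative reward; one way to approximate that would be to average this sum over many episodes.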

Sequential Decision Making

Goal: select actions to maximise total future reward
Actions may have long-term consequences
Reward may be delayed
It may be better to sacrifice immediate reward to gain more long-term reward
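
The trade-off between immediate and long-term reward can be sketched with a toy comparison. Everything here is hypothetical: the two plans, their reward sequences, and the helper names `total_reward`, `best_plan`, and `best_first_step` are invented for illustration and do not come from the lecture.

```python
# Hypothetical reward sequences for two fixed plans (values are made up).
PLANS = {
    "greedy": [5.0, 0.0, 0.0, 0.0],        # grabs a reward right away
    "patient": [-1.0, -1.0, -1.0, 20.0],   # pays a cost first, rewarded late
}


def total_reward(rewards):
    """Total future reward of a plan (undiscounted; an assumption)."""
    return sum(rewards)


def best_plan(plans):
    """Pick the plan with the largest total reward."""
    return max(plans, key=lambda name: total_reward(plans[name]))


def best_first_step(plans):
    """Pick the plan whose first reward alone is largest (myopic choice)."""
    return max(plans, key=lambda name: plans[name][0])


print(best_first_step(PLANS))  # greedy
print(best_plan(PLANS))        # patient
```

Judged only by the first reward, "greedy" looks better (5 versus -1), but over the whole sequence "patient" collects 17 against 5, so giving up the immediate reward pays off in this toy case.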

Agent and Environment