<aside>
💡 Overview (lecture.pdf)
→ Covers the basic background and knowledge needed to study reinforcement learning
→ Definitions of key terms (state, action, reward, observation, model … etc.)
</aside>
Branches of Machine Learning

Characteristics of Reinforcement Learning
- There is no supervisor, only a reward signal
- Feedback is delayed, not instantaneous
- Time really matters (sequential, non-i.i.d. data)
- Agent’s actions affect the subsequent data it receives
Rewards
<aside>
💡 Definition (Reward Hypothesis)
→ All goals can be described by the maximisation of expected cumulative reward
</aside>
- A reward $R_t$ is a scalar feedback signal
- It indicates how well the agent is doing at step $t$
- The agent’s job is to maximise cumulative reward
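As a rough illustration of the reward hypothesis above, the sketch below sums the scalar rewards of one episode and then averages that sum over several episodes to estimate the expected cumulative reward. The function names and the reward values are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
from typing import Sequence


def cumulative_reward(rewards: Sequence[float]) -> float:
    """Sum the scalar rewards R_1, ..., R_T received over one episode."""
    return sum(rewards)


def expected_cumulative_reward(episodes: Sequence[Sequence[float]]) -> float:
    """Estimate the expected cumulative reward as the mean over sampled episodes."""
    returns = [cumulative_reward(ep) for ep in episodes]
    return sum(returns) / len(returns)


# Hypothetical data: each inner list holds one reward per time step.
sampled_episodes = [
    [0.0, -1.0, 0.0, 5.0],  # cumulative reward 4.0
    [0.0, 0.0, 2.0],        # cumulative reward 2.0
]

per_episode = [cumulative_reward(ep) for ep in sampled_episodes]
estimate = expected_cumulative_reward(sampled_episodes)
print(per_episode, estimate)
```

Each $R_t$ is a single number, so the agent’s objective reduces to comparing one scalar per behaviour: the (estimated) expected sum.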
Sequential Decision Making
- Goal: select actions to maximise total future reward
- Actions may have long-term consequences
- Reward may be delayed
- It may be better to sacrifice immediate reward to gain more long-term reward
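The trade-off in the last bullet can be made concrete with a toy example. The sketch below assumes a made-up deterministic task (the states, actions, and reward values are hypothetical) in which a greedy agent that always takes the largest immediate reward ends up with less total reward than a plan that accepts a small loss first.

```python
from itertools import product

# Hypothetical toy task: TRANSITIONS[state][action] = (next_state, reward).
TRANSITIONS = {
    "start": {"grab": ("done", 1.0), "invest": ("invested", -1.0)},
    "invested": {"grab": ("done", 10.0), "invest": ("invested", -1.0)},
    "done": {"grab": ("done", 0.0), "invest": ("done", 0.0)},
}
ACTIONS = ("grab", "invest")


def rollout(actions, state="start"):
    """Return the total reward collected by following `actions` from `state`."""
    total = 0.0
    for action in actions:
        state, reward = TRANSITIONS[state][action]
        total += reward
    return total


def greedy_actions(horizon, state="start"):
    """Pick the action with the highest immediate reward at every step."""
    chosen = []
    for _ in range(horizon):
        action = max(ACTIONS, key=lambda a: TRANSITIONS[state][a][1])
        chosen.append(action)
        state = TRANSITIONS[state][action][0]
    return chosen


def best_actions(horizon):
    """Brute-force the action sequence with the highest total reward."""
    return max(product(ACTIONS, repeat=horizon), key=rollout)


greedy_total = rollout(greedy_actions(2))
best_plan = best_actions(2)
best_total = rollout(best_plan)
print(greedy_total, best_plan, best_total)
```

In this toy task the greedy agent grabs the immediate reward and finishes early, while the best two-step plan invests first and collects the larger delayed reward. Brute-force search is used only because the example is tiny; it is not meant as a practical planning method.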
Agent and Environment