
→ The table of contents below has been reorganized for ease of summarization and does not match the actual table of contents of the paper.
1. Introduction
- Multi-agent PPO
- Low hardware resource requirements
    - Uses only a single desktop machine with 1 GPU and 1 multicore CPU for training
- SOTA performance on 3 benchmarks:
    - Multi-agent particle-world environment (MPE) tasks
    - StarCraft Multi-Agent Challenge (SMAC)
    - Full-scale Hanabi game
2. Multi-Agent PPO (MAPPO)
2.1. Summary
2.1.1. Notation
- Decentralized Partially Observable Markov Decision Process (DEC-POMDP)
$$
\langle S, A, O, R, n, \gamma \rangle
$$
- $S$: the state space.
- $A$: the shared action space for each agent.
- $o_i = O(s; i)$: the local observation for agent $i$ at global state $s$.
- $R$: the shared reward function over the global state and joint action.
- $n$: the number of agents.
- $\gamma$: the discount factor.
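
Since the tuple above is only listed symbolically, here is a minimal sketch of how these DEC-POMDP components might be represented in code. All class and function names below are illustrative assumptions, not from the paper or any official codebase.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class DecPOMDP:
    # Container for the tuple <S, A, O, R, n, gamma>.
    # (Hypothetical structure; the paper defines these only mathematically.)
    n: int                                     # number of agents
    gamma: float                               # discount factor
    observe: Callable[[Any, int], Any]         # O(s; i) -> local observation o_i
    reward: Callable[[Any, List[Any]], float]  # shared reward R(s, joint action)

def local_observations(env: DecPOMDP, state: Any) -> List[Any]:
    # Each agent i receives only its local observation o_i = O(s; i),
    # never the full global state s -- this is the "partially observable" part.
    return [env.observe(state, i) for i in range(env.n)]

def discounted_return(env: DecPOMDP, rewards: List[float]) -> float:
    # The discounted accumulated reward sum_t gamma^t * r_t that all
    # agents jointly maximize, since the reward function is shared.
    return sum(env.gamma ** t * r for t, r in enumerate(rewards))
```

Note the design implied by the tuple: because $R$ is shared, this is a fully cooperative setting where every agent optimizes the same discounted return, while each agent must act from its own $o_i$ rather than the global state.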