
→ The table of contents below has been reorganized for ease of summarization and does not match the actual table of contents of the paper.
1. Introduction
- Multi-agent PPO
- Low hardware resource requirements
    - Uses only a single desktop machine with 1 GPU and 1 multicore CPU for training
- SOTA performance on 3 benchmarks:
    - Multi-agent particle-world environment (MPE) tasks
    - StarCraft Multi-Agent Challenge (SMAC)
    - Full-scale Hanabi game
2. Multi-Agent PPO (MAPPO)
2.1. Summary
2.1.1. Notation
- Decentralized Partially Observable Markov Decision Process (DEC-POMDP)
$$
\langle S, A, O, R, n, \gamma \rangle
$$
- $S$: the state space.
- $A$: the shared action space for each agent.
- $o_i = O(s; i)$: the local observation for agent $i$ at global state $s$.
- $R$: the shared reward function over the global state and joint action.
- $n$: the number of agents.
- $\gamma$: the discount factor.
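
Since the tuple above is only listed symbolically, here is a minimal sketch of how these DEC-POMDP components might be represented in code. All class and function names below are illustrative assumptions, not from the paper or any official codebase.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class DecPOMDP:
    # Container for the tuple <S, A, O, R, n, gamma>.
    # (Hypothetical structure; the paper defines these only mathematically.)
    n: int                                     # number of agents
    gamma: float                               # discount factor
    observe: Callable[[Any, int], Any]         # O(s; i) -> local observation o_i
    reward: Callable[[Any, List[Any]], float]  # shared reward R(s, joint action)

def local_observations(env: DecPOMDP, state: Any) -> List[Any]:
    # Each agent i receives only its local observation o_i = O(s; i),
    # never the full global state s -- this is the "partially observable" part.
    return [env.observe(state, i) for i in range(env.n)]

def discounted_return(env: DecPOMDP, rewards: List[float]) -> float:
    # The discounted accumulated reward sum_t gamma^t * r_t that all
    # agents jointly maximize, since the reward function is shared.
    return sum(env.gamma ** t * r for t, r in enumerate(rewards))
```

Note the design implied by the tuple: because $R$ is shared, this is a fully cooperative setting where every agent optimizes the same discounted return, while each agent must act from its own $o_i$ rather than the global state.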