Preface
Review of RL, based on Mathematical Foundations of Reinforcement Learning
1. Basic Concepts
1.1 State
- state: status of agent with respect to environment
- state space: the set of all states $\mathcal{S} = \{ s_i \}^{N}_{i = 1}$
1.2 Action
- action space of a state: the set of all possible actions at a state, $\mathcal{A}(s_i) = \{a_i\}$
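To make 1.1 && 1.2 concrete, here is a minimal Python sketch of a state space and per-state action spaces. The 2x2 grid world, the state names `s1..s4`, and the five actions are assumed purely for illustration, not taken from the book's code.

```python
# A toy sketch (assumed example) of a state space S and action spaces A(s).

# States: the four cells of a hypothetical 2x2 grid world.
state_space = ["s1", "s2", "s3", "s4"]          # S = {s_1, ..., s_N}, N = 4

def action_space(state):
    """A(s): the set of admissible actions at state `state`.
    In this toy example every state shares the same five actions."""
    return ["up", "down", "left", "right", "stay"]

print(len(state_space))      # number of states N
print(action_space("s1"))    # A(s1)
```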
1.3 State transition && state transition probability $p(s'|s,a)$
1.4 Reward && Reward probability $p(r|s,a)$
- Reward is one of the most distinctive concepts of RL
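A minimal sketch of how the two distributions above could be tabulated; the probabilities and reward values are hypothetical, and `transition_prob` / `reward_prob` are illustrative names, not notation from the book.

```python
# p(s'|s, a) and p(r|s, a) written as nested dictionaries keyed by (s, a).
# All numbers below are assumed purely for illustration.

# State transition probability: from s1, taking "right" reaches s2 with
# probability 0.9 and stays in s1 with probability 0.1.
transition_prob = {
    ("s1", "right"): {"s2": 0.9, "s1": 0.1},
}

# Reward probability: the same (s, a) pair yields reward 0 with probability
# 0.9 and reward -1 with probability 0.1.
reward_prob = {
    ("s1", "right"): {0: 0.9, -1: 0.1},
}

# Each entry is a proper probability distribution over its outcomes.
assert abs(sum(transition_prob[("s1", "right")].values()) - 1.0) < 1e-9
assert abs(sum(reward_prob[("s1", "right")].values()) - 1.0) < 1e-9
```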
1.5 Trajectory, episode, return, discounted return
- trajectory: a state-action-reward chain
- return: the sum of all the rewards collected along the trajectory
  - different policies give different trajectories, and hence different returns
- discounted return: $G = \sum_{t=0}^{\infty} \gamma^{t} r_{t+1}$, with discount rate $\gamma \in [0, 1)$ (see the sketch at the end of this section)
  Roles of the discount rate:
  - makes the sum finite (the infinite series converges)
  - balances the near && far future rewards:
    - $\gamma \rightarrow 0$: the discounted return is dominated by rewards obtained in the near future
    - $\gamma \rightarrow 1$: the discounted return is dominated by rewards obtained in the far future
- episode: a trial; usually assumed to be a finite trajectory that ends at a terminal state
  - How to treat episodic tasks in a unified way? A common convention is to treat the terminal state as an absorbing state: once reached, the agent stays there and collects zero reward, so the episodic task can be handled as a continuing (infinite-horizon) task.
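The sketch referenced above: computing the discounted return of a finite reward sequence for two values of $\gamma$. The reward sequence and the $\gamma$ values are assumed for illustration.

```python
# Discounted return G = sum_t gamma^t * r_{t+1} for a finite reward list.

def discounted_return(rewards, gamma):
    """Sum gamma^t * r over the rewards r_1, r_2, ... of one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [0, 0, 0, 1]    # hypothetical rewards collected along one episode

# gamma close to 0: the return is dominated by near-future rewards,
# so the single far-away reward is almost invisible.
print(discounted_return(rewards, gamma=0.1))    # 0.001
# gamma close to 1: far-future rewards keep most of their weight.
print(discounted_return(rewards, gamma=0.99))   # ~0.9703
```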
1.6 Markov decision process
- Sets
  - State: $\mathcal{S}$
  - Action: $\mathcal{A}(s)$ is the set of actions associated with state $s \in \mathcal{S}$
  - Reward: $\mathcal{R}(s, a)$ is the set of possible rewards for the state-action pair $(s, a)$
- Probability distributions:
  - State transition probability: at state $s$, taking action $a$, the probability of transitioning to state $s'$ is $p(s'|s,a)$
  - Reward probability: at state $s$, taking action $a$, the probability of getting reward $r$ is $p(r|s,a)$
- Policy: at state $s$, the probability of choosing action $a$ is $\pi(a|s)$
- Markov property: memoryless property, i.e. the next state and reward depend only on the current state and action, not on the earlier history: $p(s_{t+1} \mid s_t, a_t, \ldots, s_0, a_0) = p(s_{t+1} \mid s_t, a_t)$, and similarly for the reward distribution
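To tie the MDP elements together, here is a minimal sketch that samples one short trajectory from a toy MDP under a stochastic policy. All state/action names, probabilities, and rewards are hypothetical, and the dictionary layout follows the sketch in 1.3/1.4; note that each step uses only the current $(s, a)$ pair, which is exactly the Markov property.

```python
import random

# pi(a|s): a stochastic policy (assumed numbers).
policy = {
    "s1": {"right": 0.8, "stay": 0.2},
    "s2": {"right": 1.0},
}
# p(s'|s, a): state transition probabilities (assumed, here deterministic).
transition_prob = {
    ("s1", "right"): {"s2": 1.0},
    ("s1", "stay"):  {"s1": 1.0},
    ("s2", "right"): {"s1": 1.0},
}
# p(r|s, a): reward probabilities (assumed, here deterministic).
reward_prob = {
    ("s1", "right"): {0: 1.0},
    ("s1", "stay"):  {-1: 1.0},
    ("s2", "right"): {1: 1.0},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

# Sample a trajectory of fixed length 5: each step depends only on (s_t, a_t).
state = "s1"
for t in range(5):
    action = sample(policy[state])                         # a_t ~ pi(.|s_t)
    reward = sample(reward_prob[(state, action)])          # r_{t+1} ~ p(.|s_t, a_t)
    next_state = sample(transition_prob[(state, action)])  # s_{t+1} ~ p(.|s_t, a_t)
    print(f"{state} --{action}, r={reward}--> {next_state}")
    state = next_state
```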