Preface

Review of RL, based on Mathematical Foundations of Reinforcement Learning

1. Basic Concepts

1.1 State

  • state: the status of the agent with respect to the environment
  • state space: the set of all states $\mathcal{S} = \{ s_i \}^{N}_{i = 1}$

1.2 Action

  • action space of a state: the set of all possible actions at a state $\mathcal{A}(s_i) = \{a_i\}$
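To make this concrete, here is a minimal sketch (my own illustration, not from the book) of the state space and per-state action spaces for a hypothetical 2x2 grid world; the names `STATES` and `ACTIONS` are assumptions made for this example:

```python
# Hypothetical 2x2 grid world, labels s1..s4 (illustration only).
STATES = ["s1", "s2", "s3", "s4"]  # state space S = {s_1, ..., s_4}

# Action space A(s): the set of actions available at each state.
# Here every state happens to share the same five actions, but in
# general A(s) may differ from state to state.
ACTIONS = {s: ["up", "down", "left", "right", "stay"] for s in STATES}

print(ACTIONS["s1"])  # actions available at state s1
```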

1.3 State transition && state transition probability $p(s'|s,a)$
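As a sketch of what $p(s'|s,a)$ looks like in practice (continuing the hypothetical grid world above; the dict `P` and the helper `sample_next_state` are my own illustrative names):

```python
import random

# p(s'|s, a) as a nested dict: (state, action) -> {next_state: probability}.
# Most moves are deterministic; (s1, "right") is made stochastic just to
# show that a transition can be a distribution over next states.
P = {
    ("s1", "right"): {"s2": 0.9, "s1": 0.1},  # slips back with prob 0.1
    ("s1", "down"):  {"s3": 1.0},
    ("s1", "stay"):  {"s1": 1.0},
    # ... entries for the remaining (state, action) pairs
}

def sample_next_state(s, a):
    """Draw s' ~ p(.|s, a)."""
    dist = P[(s, a)]
    return random.choices(list(dist), weights=list(dist.values()))[0]

print(sample_next_state("s1", "right"))
```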

1.4 Reward && Reward probability $p(r|s,a)$

  • Reward is one of the most distinctive concepts in RL: it is a scalar signal received after taking an action, through which we guide the agent to behave as we expect
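The reward model is the same kind of object as the transition model: a conditional distribution, here over scalar rewards. A small sketch (the reward values are made up for illustration):

```python
import random

# p(r|s, a) as a dict: (state, action) -> {reward: probability}.
# Reward design here is arbitrary: -1 for bumping into the boundary,
# 0 otherwise, +1 (sometimes) for a move toward the goal.
R = {
    ("s1", "up"):    {-1: 1.0},
    ("s1", "right"): {0: 1.0},
    ("s2", "down"):  {1: 0.8, 0: 0.2},  # rewards may themselves be stochastic
}

def sample_reward(s, a):
    """Draw r ~ p(.|s, a)."""
    dist = R[(s, a)]
    return random.choices(list(dist), weights=list(dist.values()))[0]

print(sample_reward("s2", "down"))
```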

1.5 Trajectory, episode, return, discounted return

  • trajectory: state-action-reward chain
  • return: sum of all the rewards collected along the trajectory

Different policies give different trajectories.

  • discounted return: the sum of rewards weighted by powers of the discount rate $\gamma \in [0, 1)$, i.e. $G = r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots$ (see the sketch after this list)

Roles:

  1. Makes the sum finite even for an infinite trajectory (for bounded rewards, $\sum_{t=0}^{\infty} \gamma^{t} r_{\max} = \frac{r_{\max}}{1-\gamma}$)
  2. Balances the near && far future rewards:
    1. $\gamma \rightarrow 0$: discounted return dominated by near-future rewards (more short-sighted)
    2. $\gamma \rightarrow 1$: discounted return dominated by far-future rewards (more far-sighted)
  • episode: a trial, usually assumed to be a finite trajectory.
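To see both roles of $\gamma$ at once, here is a small worked computation (the reward sequence is arbitrary, chosen so that all the nonzero rewards arrive late):

```python
# Discounted return G = r_1 + gamma*r_2 + gamma^2*r_3 + ...
def discounted_return(rewards, gamma):
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [0, 0, 0, 1, 1, 1]  # nonzero rewards only in the far future

# gamma close to 0: near-future rewards dominate, so the return is tiny here.
print(discounted_return(rewards, 0.1))   # ~0.00111
# gamma close to 1: far-future rewards are barely discounted.
print(discounted_return(rewards, 0.99))  # ~2.88
```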

How to treat episodes? A common way is to convert episodic tasks into continuing tasks by treating the terminal state as a special absorbing state that transitions only to itself with zero reward; then both cases can be described within the same framework.

1.6 Markov decision process

  • Sets
    • State: $\mathcal{S}$
    • Action: $\mathcal{A}(s)$ is associated with state $s \in \mathcal{S}$
    • Reward: $\mathcal{R}(s, a)$
  • Probability distribution:
    • State transition probability: at state $s$, taking action $a$, the probability of transitioning to state $s'$ is $p(s'|s,a)$
    • Reward probability: at state $s$, taking action $a$, the probability of receiving reward $r$ is $p(r|s,a)$
  • Policy: at state $s$, the probability of choosing action $a$ is $\pi(a|s)$
  • Markov property: the memoryless property, i.e. the next state and reward depend only on the current state and action, not on the earlier history: $p(s_{t+1} \mid s_t, a_t, \dots, s_0, a_0) = p(s_{t+1} \mid s_t, a_t)$ (and similarly for the reward)
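Putting the pieces together, here is a minimal rollout sketch for a tiny made-up two-state MDP (not an example from the book; all names and numbers are assumptions). Each step samples $a \sim \pi(\cdot|s)$, $r \sim p(\cdot|s,a)$, $s' \sim p(\cdot|s,a)$ using only the current state, which is exactly the memoryless property:

```python
import random

STATES = ["s1", "s2"]                          # state space
ACTIONS = {s: ["stay", "go"] for s in STATES}  # action spaces A(s)

# State transition probabilities p(s'|s, a).
P = {
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s2": 0.9, "s1": 0.1},
    ("s2", "stay"): {"s2": 1.0},
    ("s2", "go"):   {"s1": 1.0},
}

# Reward probabilities p(r|s, a).
R = {
    ("s1", "stay"): {0: 1.0},
    ("s1", "go"):   {1: 0.5, 0: 0.5},
    ("s2", "stay"): {0: 1.0},
    ("s2", "go"):   {-1: 1.0},
}

# Policy pi(a|s): a distribution over actions at each state.
PI = {
    "s1": {"stay": 0.2, "go": 0.8},
    "s2": {"stay": 0.5, "go": 0.5},
}

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

def rollout(s, steps=5):
    """Sample a state-action-reward trajectory; each step uses only the
    current state (Markov property)."""
    trajectory = []
    for _ in range(steps):
        a = sample(PI[s])
        r = sample(R[(s, a)])
        s_next = sample(P[(s, a)])
        trajectory.append((s, a, r))
        s = s_next
    return trajectory

print(rollout("s1"))
```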