Welcome to the first of five short tutorials that guide you through the process of creating your own PettingZoo environment, from conception to deployment. (Note: there are actually only four tutorials.)
We will be creating a parallel environment, meaning that all agents act simultaneously.
Before thinking about the environment logic, we should understand the structure of environment repositories.
5.2 Tree Structure
Environment repositories are usually laid out using the following structure:
/custom-environment/env
is where your environment will be stored, along with any helper functions (in the case of a complicated environment).
/custom-environment/custom_environment_v0.py
is a file that imports the environment; we use the file name for environment version control. In other words, the custom environment is imported inside this file.
/requirements.txt
is a file used to keep track of your environment dependencies. At the very least, pettingzoo should be in there. Please pin the version of every dependency via "==".
Now that we have a basic understanding of the structure of environment repositories, we can start thinking about the fun part: environment logic!
For this tutorial, we will be creating a two-player game consisting of a prisoner, trying to escape, and a guard, trying to catch the prisoner. This game will be played on a 7x7 grid, where:
# Check termination conditions
terminations = {a: False for a in self.agents}
rewards = {a: 0 for a in self.agents}
if self.prisoner_x == self.guard_x and self.prisoner_y == self.guard_y:
    rewards = {"prisoner": -1, "guard": 1}
    terminations = {a: True for a in self.agents}
elif self.prisoner_x == self.escape_x and self.prisoner_y == self.escape_y:
    rewards = {"prisoner": 1, "guard": -1}
    terminations = {a: True for a in self.agents}

# Generate action masks
prisoner_action_mask = np.ones(4)
if self.prisoner_x == 0:
    prisoner_action_mask[0] = 0  # Block left movement
elif self.prisoner_x == 6:
    prisoner_action_mask[1] = 0  # Block right movement
if self.prisoner_y == 0:
    prisoner_action_mask[2] = 0  # Block down movement
elif self.prisoner_y == 6:
    prisoner_action_mask[3] = 0  # Block up movement

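The termination branch of the fragment above (the check that runs before the action masks are generated) can be tried out on its own, outside the environment class. What follows is a minimal sketch under assumptions: positions are passed as (x, y) tuples, and the helper name check_outcome and the constant AGENTS are hypothetical, introduced only for illustration. They are not part of PettingZoo or of the tutorial's code.

```python
# Agent names taken from the reward dictionaries in the fragment above.
AGENTS = ["prisoner", "guard"]


def check_outcome(prisoner, guard, escape):
    """Hypothetical helper: return (rewards, terminations) keyed by agent name.

    Each argument is an (x, y) tuple. The logic mirrors the fragment:
    the guard wins by landing on the prisoner, and the prisoner wins by
    reaching the escape cell.
    """
    terminations = {a: False for a in AGENTS}
    rewards = {a: 0 for a in AGENTS}
    if prisoner == guard:
        # Prisoner caught: guard is rewarded, episode ends for both agents.
        rewards = {"prisoner": -1, "guard": 1}
        terminations = {a: True for a in AGENTS}
    elif prisoner == escape:
        # Prisoner escaped: prisoner is rewarded, episode ends for both agents.
        rewards = {"prisoner": 1, "guard": -1}
        terminations = {a: True for a in AGENTS}
    return rewards, terminations
```

Because the capture check comes first, a prisoner who is caught on the escape cell counts as caught, which matches the if/elif order of the original fragment.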
In many environments, it is natural for some actions to be invalid at certain times. For example, in a game of chess, it is impossible to move a pawn forward if it is already at the front of the board. In PettingZoo, we can use action masking to prevent invalid actions from being taken.
Action masking is a more natural way of handling invalid actions than having an action have no effect, which was how we handled bumping into walls in the previous tutorial.
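To make the idea concrete, here is a small sketch of how a mask might be built for one agent on the 7x7 grid and then used to pick only valid actions. It is an illustration under assumptions, not the tutorial's implementation: it uses plain Python lists rather than the np.ones(4) array from the environment code so that it has no dependencies, the action indices (0 left, 1 right, 2 down, 3 up) are inferred from the comments in that code, and the function names action_mask and sample_valid_action are hypothetical.

```python
import random

GRID_SIZE = 7  # 7x7 grid, so valid coordinates run from 0 to 6
LEFT, RIGHT, DOWN, UP = 0, 1, 2, 3  # indices inferred from the mask comments


def action_mask(x, y):
    """Return a list of four 0/1 flags; 0 marks a move that would leave the grid."""
    mask = [1, 1, 1, 1]
    if x == 0:
        mask[LEFT] = 0
    elif x == GRID_SIZE - 1:
        mask[RIGHT] = 0
    if y == 0:
        mask[DOWN] = 0
    elif y == GRID_SIZE - 1:
        mask[UP] = 0
    return mask


def sample_valid_action(mask, rng=random):
    """Pick uniformly among the actions whose mask entry is non-zero."""
    valid = [action for action, allowed in enumerate(mask) if allowed]
    return rng.choice(valid)


if __name__ == "__main__":
    corner_mask = action_mask(0, 0)
    print("mask at (0, 0):", corner_mask)
    print("sampled action:", sample_valid_action(corner_mask))
```

With a mask like this, a policy never has to learn that walking into a wall does nothing, because the invalid move is simply excluded from the choice.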
8 (WIP) Creating Environments: Testing Your Environment
8.1 Introduction
Now that our environment is complete, we can test it to make sure it works as intended. PettingZoo has a built-in testing suite that can be used to test your environment.
8.2 Code
(Add this code below the rest of the code in the file.)
/custom-environment/env/custom_environment.py

from pettingzoo.test import parallel_api_test  # noqa: E402

if __name__ == "__main__":
    parallel_api_test(CustomEnvironment(), num_cycles=1_000_000)
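
For intuition about what a random-play check exercises, the sketch below runs a hand-rolled rollout against a tiny stand-in environment. It is not how parallel_api_test works internally, and it does not use PettingZoo at all. TinyParallelEnv and random_rollout are hypothetical names, and the sketch assumes the convention used by recent PettingZoo releases in which reset returns (observations, infos) and step returns five dictionaries keyed by agent name.

```python
import random


class TinyParallelEnv:
    """Hypothetical stand-in; it does NOT subclass pettingzoo.ParallelEnv."""

    def __init__(self, max_steps=10):
        self.possible_agents = ["prisoner", "guard"]
        self.agents = []
        self.max_steps = max_steps
        self.timestep = 0

    def reset(self):
        self.agents = list(self.possible_agents)
        self.timestep = 0
        observations = {a: 0 for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, infos

    def step(self, actions):
        self.timestep += 1
        truncated = self.timestep >= self.max_steps
        observations = {a: self.timestep for a in self.agents}
        rewards = {a: 0 for a in self.agents}
        terminations = {a: False for a in self.agents}
        truncations = {a: truncated for a in self.agents}
        infos = {a: {} for a in self.agents}
        if truncated:
            # Episode over: no agents remain active after this step.
            self.agents = []
        return observations, rewards, terminations, truncations, infos


def random_rollout(env, num_actions=4, seed=0):
    """Play random actions until no agents remain; return the number of steps."""
    rng = random.Random(seed)
    env.reset()
    steps = 0
    while env.agents:
        acting = list(env.agents)
        actions = {a: rng.randrange(num_actions) for a in acting}
        results = env.step(actions)
        # Every returned dictionary should have exactly one entry per acting agent.
        for returned in results:
            assert set(returned) == set(acting)
        steps += 1
    return steps


if __name__ == "__main__":
    print("steps played:", random_rollout(TinyParallelEnv(max_steps=5)))
```

A loop of this kind only catches structural mistakes such as a missing agent key; the built-in test suite remains the check to rely on for the real environment.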