Preface
1 AEC API
1.1 Example Usage
    from pettingzoo.classic import chess_v5
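A minimal interaction loop might look like the following (a sketch of the documented agent_iter/last/step pattern; the random masked action is illustrative, using the action_mask that chess_v5 includes in its observation dict):

    env = chess_v5.env()
    env.reset(seed=42)
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        if termination or truncation:
            action = None  # finished agents must receive None
        else:
            # sample a random legal move using the observation's action mask
            action = env.action_space(agent).sample(observation["action_mask"])
        env.step(action)
    env.close()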
1.2 AEC Env
    class pettingzoo.utils.env.AECEnv
The AECEnv steps agents one at a time.
If you are unsure if you have implemented an AECEnv correctly, try running the api_test documented in the Developer documentation on the website.
1.3 Attributes
    AECEnv.agents:List[str]
    AECEnv.num_agents
    AECEnv.possible_agents:List[str]
    AECEnv.max_num_agents
    AECEnv.agent_selection:str
    AECEnv.terminations:Dict[str,bool]
    AECEnv.rewards:Dict[str,float]  # reward structure
    AECEnv.infos:Dict[str,Dict[str,Any]]
    AECEnv.observation_spaces:Dict[str,Space]
    AECEnv.action_spaces:Dict[str,Space]
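For instance, after a reset these attributes can be inspected directly (a sketch; env is the chess_v5 environment from 1.1 and the printed values are illustrative):

    env.reset(seed=42)
    print(env.agents)           # e.g. ['player_0', 'player_1']
    print(env.num_agents)       # 2
    print(env.agent_selection)  # name of the agent whose turn it is
    print(env.terminations)     # e.g. {'player_0': False, 'player_1': False}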
1.4 Methods
    AECEnv.step(action:ActionType)→None
Accepts and executes the action of the current agent_selection in the environment.
Automatically switches control to the next agent.
→ None: the recommended return value is None.
- On "automatically switches control to the next agent": when actually using a game environment, switching to the next agent relies on the for loop for agent in env.agent_iter(1000): (see the example loop in 1.1); step() by itself does not implement "automatically switches control to the next agent".
- Note: this differs from the parallel environment. Here step() returns None, whereas step() in the parallel environment returns (observation dictionary, reward dictionary, terminated dictionary, truncated dictionary and info dictionary).
    AECEnv.last(observe=True)
    AECEnv.reset(seed:int|None=None,options:dict|None=None)→None
- Note: reset() in the AEC (sequential) environment here is recommended to return None, whereas reset() in the parallel environment described below returns the observations of all agents.
    AECEnv.observe(agent:str)→ObsType|None
Returns the observation an agent currently can make.
    AECEnv.render()→None|np.ndarray|str|list
    AECEnv.seed(seed:int|None=None)→None
    AECEnv.close()
All of the attributes and methods above are required when writing a custom sequential (AEC) environment; none can be omitted. Also, do not give these attributes and methods different names: renaming them breaks the spec and leads to unpredictable errors, because many built-in functions in the interface only recognize these specific names.
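A minimal skeleton of such a custom environment might look like the following (a sketch, not a complete game; the two-agent setup, the placeholder spaces, and the zero observation are illustrative assumptions):

    import numpy as np
    from gymnasium import spaces
    from pettingzoo import AECEnv
    from pettingzoo.utils import agent_selector

    class CustomEnv(AECEnv):
        metadata = {"name": "custom_v0"}

        def __init__(self):
            super().__init__()
            self.possible_agents = ["player_0", "player_1"]
            self._obs_space = spaces.Box(0.0, 1.0, shape=(4,), dtype=np.float32)
            self._act_space = spaces.Discrete(2)

        def observation_space(self, agent):  # must keep these exact names
            return self._obs_space

        def action_space(self, agent):
            return self._act_space

        def reset(self, seed=None, options=None):
            self.agents = self.possible_agents[:]
            self.rewards = {a: 0.0 for a in self.agents}
            self._cumulative_rewards = {a: 0.0 for a in self.agents}
            self.terminations = {a: False for a in self.agents}
            self.truncations = {a: False for a in self.agents}
            self.infos = {a: {} for a in self.agents}
            self._agent_selector = agent_selector(self.agents)
            self.agent_selection = self._agent_selector.next()

        def observe(self, agent):
            return np.zeros(4, dtype=np.float32)  # placeholder observation

        def step(self, action):
            # ... game logic updating rewards/terminations goes here ...
            self.agent_selection = self._agent_selector.next()  # hand control to the next agent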
2 Parallel API
In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via .parallel_env(). This API is based around the paradigm of Partially Observable Stochastic Games (POSGs) and the details are similar to RLlib's MultiAgent environment specification (https://docs.ray.io/en/latest/rllib/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.
- The RLlib multi-agent environment specification mentioned here belongs to an open-source package called Ray; official documentation: https://docs.ray.io/en/latest/index.html
- Ray is an open-source unified framework for scaling AI and Python applications such as machine learning. It provides a compute layer for parallel processing so that you do not need to be a distributed-systems expert, and its internal components minimize the complexity of running distributed individual and end-to-end machine learning workflows.
2.1 Example Usage
    from pettingzoo.butterfly import pistonball_v6
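A typical parallel loop might look like this (a sketch following the parallel step()/reset() signatures documented below; the random actions are illustrative):

    parallel_env = pistonball_v6.parallel_env()
    observations = parallel_env.reset(seed=42)
    while parallel_env.agents:
        # every live agent acts simultaneously
        actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}
        observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
    parallel_env.close()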
2.2 Parallel Env
    class pettingzoo.utils.env.ParallelEnv
Parallel environment class.
It steps every live agent at once. If you are unsure if you have implemented a ParallelEnv correctly, try running the parallel_api_test in the Developer documentation on the website.
    agents
    num_agents
    possible_agents
    max_num_agents
    observation_spaces
A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting.
TYPE: Dict[AgentID, gym.spaces.Space]
    action_spaces
A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting.
TYPE: Dict[AgentID, gym.spaces.Space]
    step(actions:Dict[str,ActionType])→Tuple[Dict[str,ObsType],Dict[str,float],Dict[str,bool],Dict[str,bool],Dict[str,dict]]
Receives a dictionary of actions keyed by the agent name.
Returns the observation dictionary, reward dictionary, terminated dictionary, truncated dictionary and info dictionary, where each dictionary is keyed by the agent.
    reset(seed:int|None=None,options:dict|None=None)→Dict[str,ObsType]
    seed(seed=None)
    render()→None|np.ndarray|str|List
    close()
    state()→np.ndarray
    observation_space(agent:str)→Space
Takes in agent and returns the observation space for that agent.
MUST return the same value for the same agent name.
Default implementation is to return the observation_spaces dict.
    action_space(agent:str)→Space
Takes in agent and returns the action space for that agent.
MUST return the same value for the same agent name.
Default implementation is to return the action_spaces dict.
Summary
Most attributes and methods of the parallel and sequential (AEC) environments are the same; the biggest differences are the step() method and the last() method.
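A side-by-side sketch of that difference (env and parallel_env as in the earlier examples):

    # AEC (sequential): step() returns None; query the current agent via last()
    env.step(action)
    observation, reward, termination, truncation, info = env.last()

    # Parallel: step() itself returns the per-agent dictionaries; there is no last()
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)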
3 PettingZoo Wrappers
3.1 Conversion Wrappers
3.1.1 AEC to Parallel
An environment can be converted from an AEC environment to a parallel environment with the to_parallel wrapper shown below. Note that this wrapper makes the following assumptions about the underlying environment:
1. The environment steps in a cycle, i.e. it steps through every live agent in order.
2. The environment does not update the observations of the agents except at the end of a cycle.
- That is: in a custom environment written with the sequential (AEC) API, the agents' observations must be updated only after each full cycle of steps completes.
Most parallel environments in PettingZoo only allocate rewards at the end of a cycle. In these environments, the reward scheme of the AEC API and the parallel API is equivalent. If an AEC environment does allocate rewards within a cycle, then the rewards will be allocated at different timesteps in the AEC environment and the Parallel environment. In particular, the AEC environment will allocate all rewards from one time the agent steps to the next time, while the Parallel environment will allocate all rewards from when the first agent stepped to when the last agent stepped.
    from pettingzoo.utils.conversions import to_parallel
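Usage is a one-line wrap (a sketch; pistonball_v6 is reused from the earlier example, assuming its AEC version satisfies the two assumptions above):

    from pettingzoo.butterfly import pistonball_v6

    env = pistonball_v6.env()        # AEC environment
    parallel_env = to_parallel(env)  # now exposes the parallel step()/reset() API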
3.1.2 Parallel to AEC
Any parallel environment can be efficiently converted to an AEC environment with the from_parallel wrapper.
    from pettingzoo.utils import from_parallel
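And in the other direction (a sketch under the same assumptions):

    from pettingzoo.butterfly import pistonball_v6

    parallel_env = pistonball_v6.parallel_env()
    env = from_parallel(parallel_env)  # now exposes the AEC agent_iter()/last() API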
3.2 Utility Wrappers
We wanted our pettingzoo environments to be both easy to use and easy to implement. To combine these, we have a set of simple wrappers which provide input validation and other convenient reusable logic.
    class pettingzoo.utils.wrappers.BaseWrapper(env)
Creates a wrapper around env parameter.
All AECEnv wrappers should inherit from this base class.
    class pettingzoo.utils.wrappers.TerminateIllegalWrapper(env,illegal_reward)
This wrapper terminates the game with the current player losing in case of illegal values.
PARAMETERS:
illegal_reward – number that is the value of the player making an illegal move.
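For example (a sketch; chess_v5's raw_env and the -1 penalty are illustrative choices):

    from pettingzoo.utils.wrappers import TerminateIllegalWrapper
    from pettingzoo.classic import chess_v5

    env = chess_v5.raw_env()  # unwrapped environment, for illustration
    env = TerminateIllegalWrapper(env, illegal_reward=-1)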
    class pettingzoo.utils.wrappers.CaptureStdoutWrapper(env)
Takes an environment which prints to terminal, and gives it an ansi render mode where it captures the terminal output and returns it as a string instead.
    class pettingzoo.utils.wrappers.AssertOutOfBoundsWrapper(env)
Asserts if the action given to step is outside of the action space. Applied in PettingZoo environments with discrete action spaces.
    class pettingzoo.utils.wrappers.ClipOutOfBoundsWrapper(env)
Clips the input action to fit in the continuous action space (emitting a warning if it does so).
Applied to continuous environments in pettingzoo.
- That is: if an input action exceeds the legal range, it is clipped back into the legal range.
    class pettingzoo.utils.wrappers.OrderEnforcingWrapper(env)
Checks if function calls or attribute access are in a disallowed order.
1. error on getting rewards, terminations, truncations, infos, agent_selection before reset.
2. error on calling step, observe before reset.
3. error on iterating without stepping or resetting environment.
4. warn on calling close before render or reset.
5. warn on calling step after environment is terminated or truncated.
You can apply these wrappers to your environment in a similar manner to the below example:
    from pettingzoo.utils import OrderEnforcingWrapper
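For example (a sketch; pistonball_v6 is reused from the earlier examples):

    from pettingzoo.butterfly import pistonball_v6

    env = pistonball_v6.env()
    env = OrderEnforcingWrapper(env)
    env.reset()  # must come before step()/observe(), or the wrapper raises an error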
4 Supersuit Wrappers
PettingZoo includes wrappers via the SuperSuit companion package (pip install supersuit). These can be applied to both AECEnv and ParallelEnv environments. Using it to convert Space Invaders to have a grey scale observation space and stack the last 4 frames looks like:
    import gymnasium as gym
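A sketch of how this might continue, using SuperSuit's color_reduction_v0 and frame_stack_v1 wrappers (the environment id is illustrative):

    from supersuit import color_reduction_v0, frame_stack_v1

    env = gym.make("SpaceInvaders-v0")     # illustrative Atari environment id
    env = color_reduction_v0(env, "full")  # full grey-scale observation space
    env = frame_stack_v1(env, 4)           # stack the last 4 frames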
Similarly, using SuperSuit with PettingZoo environments looks like:
    from pettingzoo.butterfly import pistonball_v0
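A sketch of how this might continue, with the same two wrappers applied to the AEC environment:

    from supersuit import color_reduction_v0, frame_stack_v1

    env = pistonball_v0.env()
    env = color_reduction_v0(env, "full")
    env = frame_stack_v1(env, 4)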
5 Shimmy Compatibility Wrappers
The Shimmy package (pip install shimmy) allows commonly used external reinforcement learning environments to be used with PettingZoo and Gymnasium.
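For instance, a sketch wrapping an OpenSpiel game as a PettingZoo-style environment, assuming Shimmy's OpenSpielCompatibilityV0 wrapper and the "chess" game name:

    from shimmy import OpenSpielCompatibilityV0

    # assumption: OpenSpielCompatibilityV0 wraps an OpenSpiel game as an AEC environment
    env = OpenSpielCompatibilityV0(game_name="chess")
    env.reset()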