Gymnasium Environments

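The 3we project provides Gymnasium-compatible environments for training and evaluating reinforcement learning agents. These environments wrap the simulation backend and expose the standard reset() / step() interface.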

import gymnasium as gym
import threewe.gym  # registers environments

env = gym.make("3we/Navigation-v1")
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
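
The loop above treats `terminated` and `truncated` the same way. When logging results it can help to keep them apart: in the Gymnasium API, `terminated` marks an end state of the task itself, while `truncated` marks an external cutoff such as a step limit. The following is a minimal sketch of an episode runner that records both. `run_episode` and `StubEnv` are hypothetical names introduced for illustration, and the stub only mimics the `reset()` / `step()` signatures so that the snippet runs without the simulator.

```python
import random


class StubEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset()/step() signatures."""

    def __init__(self, goal_step=5):
        self.goal_step = goal_step
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return {"t": self.t}, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.goal_step  # the task itself has ended
        reward = 1.0 if terminated else -0.01
        return {"t": self.t}, reward, terminated, False, {}


def run_episode(env, policy, max_steps=1000, seed=None):
    """Run one episode and report return, length, and how it ended."""
    obs, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while steps < max_steps and not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        steps += 1
    return {
        "return": total_reward,
        "length": steps,
        "terminated": terminated,
        "truncated": truncated,
    }


result = run_episode(StubEnv(goal_step=5), policy=lambda obs: random.random())
print(result)
```

With the real environment, the same function could be called with the `env` returned by `gym.make` and a policy such as `lambda obs: env.action_space.sample()`.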
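| Environment ID | Description | Action space | Observation space |
| --- | --- | --- | --- |
| 3we/Navigation-v1 | Navigate to a goal in a cluttered room. | Box(3,) - [vx, vy, omega] | Dict(image, lidar, pose, velocity, goal) |
| 3we/Exploration-v1 | Maximize coverage of an unknown map. | Box(3,) - [vx, vy, omega] | Dict(image, lidar, pose, velocity, map) |
| 3we/ObjectNav-v1 | Navigate to a target object given its category name. | Box(3,) - [vx, vy, omega] | Dict(image, lidar, pose, velocity, object_goal) |
| 3we/VLN-v1 | Follow natural-language navigation instructions. | Box(3,) - [vx, vy, omega] | Dict(image, lidar, pose, velocity, instruction) |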

For 3we/Navigation-v1:

obs = {
    "image": np.ndarray,     # shape (64, 64, 3), uint8, RGB
    "lidar": np.ndarray,     # shape (360,), float32, range in meters
    "pose": np.ndarray,      # shape (3,), float32 [x, y, theta]
    "velocity": np.ndarray,  # shape (3,), float32 [vx, vy, omega]
    "goal": np.ndarray,      # shape (2,), float32 [goal_x, goal_y]
}
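
A policy often needs the goal expressed relative to the robot rather than as raw coordinates. The sketch below derives distance and bearing from the `pose` and `goal` entries. It assumes that both are given in the same world frame and that `theta` is in radians, neither of which is stated above. `goal_relative` is a hypothetical helper, not part of the package.

```python
import math


def goal_relative(pose, goal):
    """Return (distance, bearing) of the goal as seen from the robot.

    Assumes pose = [x, y, theta] and goal = [goal_x, goal_y] share one
    world frame, with theta in radians (an assumption for this sketch).
    """
    x, y, theta = (float(v) for v in pose)
    goal_x, goal_y = (float(v) for v in goal)
    dx, dy = goal_x - x, goal_y - y
    distance = math.hypot(dx, dy)
    # Angle to the goal relative to the robot heading, wrapped to [-pi, pi].
    bearing = math.atan2(dy, dx) - theta
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return distance, bearing


# Works on plain sequences as well as on obs["pose"] / obs["goal"] arrays.
distance, bearing = goal_relative([0.0, 0.0, math.pi / 2], [0.0, 2.0])
print(distance, bearing)
```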

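The navigation environment uses a shaped reward:

  • Distance reward: -0.1 * delta_distance_to_goal per step
  • Goal reward: +10.0 when within 0.1 m of the goal
  • Collision penalty: -5.0 on contact with an obstacle
  • Time penalty: -0.01 per step, to encourage efficiency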
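
To make the arithmetic concrete, here is a minimal sketch of how the four reward terms listed above might combine in a single step. It is not the environment's actual implementation. It assumes that the terms are summed, that `delta_distance_to_goal` is the current distance minus the previous one, and that the goal reward applies at a distance less than or equal to the tolerance. `shaped_reward` is a hypothetical name.

```python
def shaped_reward(prev_distance, distance, collided, goal_tolerance=0.1):
    """Sum the four reward terms for a single step.

    Assumes delta_distance_to_goal = distance - prev_distance (meters),
    so moving toward the goal yields a positive distance term.
    """
    reward = -0.1 * (distance - prev_distance)  # distance reward
    if distance <= goal_tolerance:
        reward += 10.0  # goal reward
    if collided:
        reward -= 5.0  # collision penalty
    reward -= 0.01  # time penalty
    return reward


# Moving 0.1 m closer roughly cancels the time penalty for that step.
print(shaped_reward(prev_distance=1.0, distance=0.9, collided=False))
```

Under these assumptions the distance term is small compared with the goal reward and collision penalty, so the per-step shaping mainly nudges the agent toward the goal between those sparse events.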

Pass configuration as keyword arguments to gym.make:

env = gym.make(
    "3we/Navigation-v1",
    render_mode="human",
    max_episode_steps=500,
    room_size=5.0,
    num_obstacles=8,
    goal_tolerance=0.1,
)
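
Because the configuration is plain keyword arguments, it can be kept in dictionaries and unpacked with `**`. The sketch below builds one dictionary per stage of a simple curriculum that increases `num_obstacles`. The parameter names come from the example above, while the stage values and the `curriculum_configs` helper are hypothetical.

```python
# Parameter names come from the example above; the values are illustrative.
BASE_CONFIG = {
    "max_episode_steps": 500,
    "room_size": 5.0,
    "goal_tolerance": 0.1,
}


def curriculum_configs(obstacle_counts=(2, 4, 8)):
    """Return one kwargs dict per stage, increasing num_obstacles."""
    return [dict(BASE_CONFIG, num_obstacles=n) for n in obstacle_counts]


for stage, kwargs in enumerate(curriculum_configs()):
    # With the package installed: env = gym.make("3we/Navigation-v1", **kwargs)
    print(stage, kwargs)
```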

The environments support the Gymnasium render modes:

env = gym.make("3we/Navigation-v1", render_mode="human") # opens a window
env = gym.make("3we/Navigation-v1", render_mode="rgb_array") # returns frames

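For parallel training with Stable-Baselines3 or similar frameworks, run several copies of the environment side by side. The snippet uses Gymnasium's AsyncVectorEnv; Stable-Baselines3 itself expects its own VecEnv wrappers, such as SubprocVecEnv: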

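import gymnasium as gym
import threewe.gym  # registers environments
from gymnasium.vector import AsyncVectorEnv

# Eight copies of the environment, each stepped in its own subprocess
envs = AsyncVectorEnv([
    lambda: gym.make("3we/Navigation-v1") for _ in range(8)
])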

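These environments use the same physics parameters and sensor models as the full simulation, which is intended to help policies trained here transfer well to hardware. For domain randomization settings, see the sim-to-real transfer guide.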