Gymnasium Environments
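The 3we project provides environments compatible with Gymnasium (the maintained successor to OpenAI Gym) for training and evaluating reinforcement learning agents. The environments wrap the simulation backend and expose the standard `reset()` / `step()` interface.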
```python
import gymnasium as gym
import threewe.gym  # registers the 3we environments

env = gym.make("3we/Navigation-v1")
obs, info = env.reset()

for _ in range(1000):
    action = env.action_space.sample()  # random action, for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```

| Environment ID | Description | Action space | Observation space |
|---|---|---|---|
| `3we/Navigation-v1` | Navigate to a goal in a cluttered room. | `Box(3,)`: `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, goal)` |
| `3we/Exploration-v1` | Maximize area coverage in an unknown map. | `Box(3,)`: `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, map)` |
| `3we/ObjectNav-v1` | Navigate to a target object specified by category name. | `Box(3,)`: `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, object_goal)` |
| `3we/VLN-v1` | Follow natural-language navigation instructions. | `Box(3,)`: `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, instruction)` |
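The quick start loop resets whenever `terminated` or `truncated` is set. In Gymnasium, `terminated` marks an episode that ended inside the task (a terminal state), and `truncated` marks an external cut-off such as the `max_episode_steps` time limit. The sketch below extends that loop to record per-episode return and length. `run_episodes`, `EpisodeStats`, and the `CountdownEnv` stub are hypothetical names introduced so the example runs without the 3we backend. The sketch assumes only the Gymnasium `reset()` / `step()` return signatures.

```python
from dataclasses import dataclass


@dataclass
class EpisodeStats:
    episode_return: float
    length: int
    terminated: bool  # True if the task ended the episode, False if it was truncated


def run_episodes(env, num_episodes: int, policy) -> list:
    """Roll out `policy` until `num_episodes` episodes finish (hypothetical helper)."""
    stats = []
    obs, info = env.reset()
    ep_return, ep_len = 0.0, 0
    while len(stats) < num_episodes:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        ep_return += reward
        ep_len += 1
        if terminated or truncated:
            stats.append(EpisodeStats(ep_return, ep_len, terminated))
            obs, info = env.reset()  # same reset logic as the quick start loop
            ep_return, ep_len = 0.0, 0
    return stats


class CountdownEnv:
    """Hypothetical stub: reward 1.0 per step; terminates after `horizon` steps,
    or is truncated after `max_steps` steps if that limit is reached first."""

    def __init__(self, horizon: int, max_steps: int):
        self.horizon, self.max_steps, self.t = horizon, max_steps, 0

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        truncated = not terminated and self.t >= self.max_steps
        return self.t, 1.0, terminated, truncated, {}


# Two episodes, each with return 3.0 and length 3, both terminated.
stats = run_episodes(CountdownEnv(horizon=3, max_steps=10), num_episodes=2,
                     policy=lambda obs: None)
```

With a real environment, `policy` could be `lambda obs: env.action_space.sample()`. Any `gym.make(...)` environment can replace the stub.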
For `3we/Navigation-v1`, observations have the following structure:
```python
obs = {
    "image": np.ndarray,     # shape (64, 64, 3), uint8, RGB
    "lidar": np.ndarray,     # shape (360,), float32, ranges in meters
    "pose": np.ndarray,      # shape (3,), float32, [x, y, theta]
    "velocity": np.ndarray,  # shape (3,), float32, [vx, vy, omega]
    "goal": np.ndarray,      # shape (2,), float32, [goal_x, goal_y]
}
```

The navigation environment uses a shaped reward:
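- Distance reward: `-0.1 * delta_distance_to_goal` per step
- Goal reward: `+10.0` when within 0.1 m of the goal
- Collision penalty: `-5.0` on contact with an obstacle
- Time penalty: `-0.01` per step, to encourage efficient paths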
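The reward terms above can be combined into a single per-step function, as sketched below. This is a hypothetical reconstruction, not the 3we implementation. It makes three assumptions:

- The terms are simply summed.
- `delta_distance_to_goal` is the current distance minus the previous one, so moving toward the goal earns a positive distance term.
- The goal bonus applies once the distance is at most `goal_tolerance` (0.1 m).

```python
# Coefficients copied from the reward description; the combination rule is assumed.
DISTANCE_SCALE = -0.1
GOAL_BONUS = 10.0
COLLISION_PENALTY = -5.0
TIME_PENALTY = -0.01


def shaped_reward(prev_dist: float, curr_dist: float, collided: bool,
                  goal_tolerance: float = 0.1) -> float:
    """Hypothetical per-step reward for 3we/Navigation-v1 (sketch only)."""
    delta_distance_to_goal = curr_dist - prev_dist  # assumed sign convention
    reward = DISTANCE_SCALE * delta_distance_to_goal  # positive when approaching
    reward += TIME_PENALTY  # charged on every step
    if collided:
        reward += COLLISION_PENALTY
    if curr_dist <= goal_tolerance:  # assumed inclusive "within 0.1 m"
        reward += GOAL_BONUS
    return reward


# Moving 0.5 m closer without a collision: 0.05 - 0.01, i.e. about 0.04.
r = shaped_reward(prev_dist=2.0, curr_dist=1.5, collided=False)
```

Under these assumptions, a step that reaches the goal also collects the distance and time terms of that step. The actual environment may order or gate the terms differently.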
Pass configuration as keyword arguments to `gym.make`:
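```python
env = gym.make(
    "3we/Navigation-v1",
    render_mode="human",
    max_episode_steps=500,  # Gymnasium's TimeLimit sets truncated=True after 500 steps
    # Remaining keyword arguments are forwarded to the environment constructor.
    room_size=5.0,
    num_obstacles=8,
    goal_tolerance=0.1,
)
```

The environments support the Gymnasium render modes: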
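```python
env = gym.make("3we/Navigation-v1", render_mode="human")      # opens a window
env = gym.make("3we/Navigation-v1", render_mode="rgb_array")  # env.render() returns RGB frames
```

For parallel training, Gymnasium's vector API can run several copies of an environment in worker processes. Stable-Baselines3 uses its own `VecEnv` wrappers (for example `SubprocVecEnv`) rather than `gymnasium.vector`. With SB3, pass the single-environment factory to those wrappers instead.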
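```python
import gymnasium as gym
import threewe.gym  # registers the 3we environments
from gymnasium.vector import AsyncVectorEnv

if __name__ == "__main__":  # required where worker processes are spawned (e.g. Windows, macOS)
    envs = AsyncVectorEnv(
        [lambda: gym.make("3we/Navigation-v1") for _ in range(8)]  # 8 parallel copies
    )
```

Sim-to-Real Compatibility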
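These environments use the same physics parameters and sensor models as the full simulation, so policies trained here should transfer well to hardware. For domain randomization settings, see the Sim-to-Real Transfer guide.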