
Gymnasium Environments

The 3we project provides Gymnasium-compatible environments (Gymnasium is the maintained successor to OpenAI Gym) for training and evaluating reinforcement learning agents. These environments wrap the simulation backend and expose the standard `reset()` / `step()` interface.

```python
import gymnasium as gym

import threewe.gym  # registers the environments on import

env = gym.make("3we/Navigation-v1")
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
| Environment ID | Description | Action Space | Observation Space |
|---|---|---|---|
| `3we/Navigation-v1` | Navigate to a goal in a cluttered room. | `Box(3,)` - `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, goal)` |
| `3we/Exploration-v1` | Maximize area covered in an unknown map. | `Box(3,)` - `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, map)` |
| `3we/ObjectNav-v1` | Navigate to an object by category name. | `Box(3,)` - `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, object_goal)` |
| `3we/VLN-v1` | Follow natural-language navigation instructions. | `Box(3,)` - `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, instruction)` |

For `3we/Navigation-v1`, the observation is a dictionary with the following structure:

```python
obs = {
    "image": np.ndarray,     # shape (64, 64, 3), uint8, RGB
    "lidar": np.ndarray,     # shape (360,), float32, range in meters
    "pose": np.ndarray,      # shape (3,), float32, [x, y, theta]
    "velocity": np.ndarray,  # shape (3,), float32, [vx, vy, omega]
    "goal": np.ndarray,      # shape (2,), float32, [goal_x, goal_y]
}
```
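As a minimal sketch of working with this `Dict` observation, the distance from the robot to the goal can be derived from `pose` and `goal`. The `goal_distance` helper below is hypothetical, not part of the 3we API:

```python
import math

def goal_distance(obs):
    # Hypothetical helper: straight-line distance from the robot's (x, y)
    # position in "pose" to the goal coordinates in "goal".
    dx = obs["goal"][0] - obs["pose"][0]
    dy = obs["goal"][1] - obs["pose"][1]
    return math.hypot(dx, dy)
```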

The navigation environment uses a shaped reward:

  • Distance reward: -0.1 * delta_distance_to_goal per step
  • Arrival bonus: +10.0 when within 0.1 m of the goal
  • Collision penalty: -5.0 on contact with an obstacle
  • Time penalty: -0.01 per step to encourage efficiency
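Putting the four terms together, the following is a minimal sketch of how such a shaped reward could be computed. The sign convention for `delta_distance` (positive when moving away from the goal) and the additive combination are assumptions, not the project's verified implementation:

```python
def shaped_reward(prev_dist, dist, collided, goal_tolerance=0.1):
    # Assumed reconstruction of the reward terms listed above.
    delta_distance = dist - prev_dist   # > 0 when the agent moved away from the goal
    reward = -0.1 * delta_distance      # distance term
    reward += -0.01                     # per-step time penalty
    if collided:
        reward += -5.0                  # collision penalty
    if dist <= goal_tolerance:
        reward += 10.0                  # arrival bonus
    return reward
```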

Pass configuration via gym.make kwargs:

```python
env = gym.make(
    "3we/Navigation-v1",
    render_mode="human",
    max_episode_steps=500,
    room_size=5.0,
    num_obstacles=8,
    goal_tolerance=0.1,
)
```

The environments support the standard Gymnasium render modes:

```python
env = gym.make("3we/Navigation-v1", render_mode="human")      # opens a window
env = gym.make("3we/Navigation-v1", render_mode="rgb_array")  # render() returns frames
```

For parallel training with Stable-Baselines3 or similar:

```python
import gymnasium as gym
from gymnasium.vector import AsyncVectorEnv

import threewe.gym  # registers the environments on import

envs = AsyncVectorEnv(
    [lambda: gym.make("3we/Navigation-v1") for _ in range(8)]
)
```
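`AsyncVectorEnv` expects a list of zero-argument factories, one per worker. When each worker needs its own parameters (for example a distinct seed), binding the value through a closure keeps every factory independent of the loop variable. The sketch below uses a stand-in dict in place of `gym.make` so the pattern itself is testable in isolation:

```python
def make_env_fn(env_id, seed):
    # Returns a zero-argument factory with env_id and seed bound per call,
    # avoiding the late-binding pitfall of capturing a loop variable directly.
    def _init():
        # Stand-in for: env = gym.make(env_id); env.reset(seed=seed); return env
        return {"env_id": env_id, "seed": seed}
    return _init

env_fns = [make_env_fn("3we/Navigation-v1", seed) for seed in range(8)]
# envs = AsyncVectorEnv(env_fns)
```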

These environments use the same physics parameters and sensor models as the full simulation, ensuring policies trained here transfer well to hardware. See the Sim-to-Real Transfer guide for domain randomization settings.