
Gymnasium Environments

The 3we project provides Gymnasium-compatible environments (Gymnasium is the maintained successor to OpenAI Gym) for training and evaluating reinforcement learning agents. These environments wrap the simulation backend and expose the standard `reset()` / `step()` interface.

```python
import gymnasium as gym

import threewe.gym  # registers the environments on import

env = gym.make("3we/Navigation-v1")
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
| Environment ID | Description | Action Space | Observation Space |
|---|---|---|---|
| `3we/Navigation-v1` | Navigate to a goal in a cluttered room. | `Box(3,)` - `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, goal)` |
| `3we/Exploration-v1` | Maximize area covered in an unknown map. | `Box(3,)` - `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, map)` |
| `3we/ObjectNav-v1` | Navigate to an object by category name. | `Box(3,)` - `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, object_goal)` |
| `3we/VLN-v1` | Follow natural-language navigation instructions. | `Box(3,)` - `[vx, vy, omega]` | `Dict(image, lidar, pose, velocity, instruction)` |

For `3we/Navigation-v1`, the observation is a dictionary with the following structure:

```python
obs = {
    "image": np.ndarray,     # shape (64, 64, 3), uint8, RGB
    "lidar": np.ndarray,     # shape (360,), float32, range in meters
    "pose": np.ndarray,      # shape (3,), float32, [x, y, theta]
    "velocity": np.ndarray,  # shape (3,), float32, [vx, vy, omega]
    "goal": np.ndarray,      # shape (2,), float32, [goal_x, goal_y]
}
```
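As a minimal sketch of working with this `Dict` observation, the distance from the robot to the goal can be derived from `pose` and `goal`. The `goal_distance` helper below is hypothetical, not part of the 3we API:

```python
import math

def goal_distance(obs):
    # Hypothetical helper: straight-line distance from the robot's (x, y)
    # position in "pose" to the goal coordinates in "goal".
    dx = obs["goal"][0] - obs["pose"][0]
    dy = obs["goal"][1] - obs["pose"][1]
    return math.hypot(dx, dy)
```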

The navigation environment uses a shaped reward:

  • Distance reward: -0.1 * delta_distance_to_goal per step
  • Arrival bonus: +10.0 when within 0.1 m of the goal
  • Collision penalty: -5.0 on contact with an obstacle
  • Time penalty: -0.01 per step to encourage efficiency
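Putting the four terms together, the following is a minimal sketch of how such a shaped reward could be computed. The sign convention for `delta_distance` (positive when moving away from the goal) and the additive combination are assumptions, not the project's verified implementation:

```python
def shaped_reward(prev_dist, dist, collided, goal_tolerance=0.1):
    # Assumed reconstruction of the reward terms listed above.
    delta_distance = dist - prev_dist   # > 0 when the agent moved away from the goal
    reward = -0.1 * delta_distance      # distance term
    reward += -0.01                     # per-step time penalty
    if collided:
        reward += -5.0                  # collision penalty
    if dist <= goal_tolerance:
        reward += 10.0                  # arrival bonus
    return reward
```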

Pass configuration via gym.make kwargs:

```python
env = gym.make(
    "3we/Navigation-v1",
    render_mode="human",
    max_episode_steps=500,
    room_size=5.0,
    num_obstacles=8,
    goal_tolerance=0.1,
)
```

The environments support the standard Gymnasium render modes:

```python
env = gym.make("3we/Navigation-v1", render_mode="human")      # opens a window
env = gym.make("3we/Navigation-v1", render_mode="rgb_array")  # render() returns frames
```

For parallel training with Stable-Baselines3 or similar:

```python
import gymnasium as gym
from gymnasium.vector import AsyncVectorEnv

import threewe.gym  # registers the environments on import

envs = AsyncVectorEnv(
    [lambda: gym.make("3we/Navigation-v1") for _ in range(8)]
)
```
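`AsyncVectorEnv` expects a list of zero-argument factories, one per worker. When each worker needs its own parameters (for example a distinct seed), binding the value through a closure keeps every factory independent of the loop variable. The sketch below uses a stand-in dict in place of `gym.make` so the pattern itself is testable in isolation:

```python
def make_env_fn(env_id, seed):
    # Returns a zero-argument factory with env_id and seed bound per call,
    # avoiding the late-binding pitfall of capturing a loop variable directly.
    def _init():
        # Stand-in for: env = gym.make(env_id); env.reset(seed=seed); return env
        return {"env_id": env_id, "seed": seed}
    return _init

env_fns = [make_env_fn("3we/Navigation-v1", seed) for seed in range(8)]
# envs = AsyncVectorEnv(env_fns)
```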

These environments use the same physics parameters and sensor models as the full simulation, ensuring policies trained here transfer well to hardware. See the Sim-to-Real Transfer guide for domain randomization settings.