Physical Intelligence: Science and Systems — University of Pennsylvania

Drone Racing using Reinforcement Learning

PPO, Python, NVIDIA Isaac Sim, Reward Shaping, Vision-based Control

This project explores end-to-end autonomous drone racing using Proximal Policy Optimization (PPO) in NVIDIA Isaac Sim. The goal: train a quadrotor agent to navigate through complex racing gates at speed while avoiding obstacles, without any hand-crafted trajectory planning.
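For readers unfamiliar with PPO, the core of the algorithm is its clipped surrogate objective. The sketch below is a minimal, framework-free illustration of that objective, not the project's training code; the function name `ppo_clip_objective` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside the [1 - eps, 1 + eps] trust region.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. Once the ratio leaves the clipping range in the direction that would increase the objective, the gradient through that sample vanishes, which is what keeps each policy update conservative.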

Key design decisions:

Vision-based input: the agent perceives the environment through simulated onboard cameras, so the policy depends only on sensing that is available on real hardware.
Reward shaping: the reward function encourages aggressive gate traversal while penalizing crashes and excessive oscillation.
Curriculum training: gate complexity and spacing are gradually increased as the agent improves, enabling stable learning on a hard task.
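The reward shaping and curriculum ideas above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the project's implementation: `StepInfo`, `shaped_reward`, `GateCurriculum`, all weights, and the promotion threshold are hypothetical names and placeholder values, and the per-step signals are assumed to come from a simulator wrapper.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Per-step signals a simulator wrapper might expose (all hypothetical)."""
    progress: float      # reduction in distance to the next gate this step, in meters
    gate_passed: bool    # True if the drone crossed a gate this step
    crashed: bool        # True on collision with a gate frame or obstacle
    action_delta: float  # norm of the change in action since the previous step


def shaped_reward(info, w_progress=1.0, gate_bonus=10.0,
                  crash_penalty=10.0, w_smooth=0.1):
    """Illustrative shaped reward; the weights are placeholders, not tuned values."""
    reward = w_progress * info.progress      # dense term: move toward the gate
    if info.gate_passed:
        reward += gate_bonus                 # sparse bonus for clearing a gate
    if info.crashed:
        reward -= crash_penalty              # discourage collisions
    reward -= w_smooth * info.action_delta   # discourage oscillating commands
    return reward


class GateCurriculum:
    """Raises the difficulty level once the recent success rate is high enough."""

    def __init__(self, num_levels=5, promote_at=0.8, window=100):
        self.level = 0
        self.max_level = num_levels - 1
        self.promote_at = promote_at
        self.results = deque(maxlen=window)

    def record(self, success):
        """Log one episode outcome and return the (possibly updated) level."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.level < self.max_level:
            if sum(self.results) / len(self.results) >= self.promote_at:
                self.level += 1
                self.results.clear()  # re-evaluate from scratch at the new level
        return self.level
```

In a training loop, the level returned by `record` would select the gate layout for the next episode, for example by indexing into a list of progressively tighter spacings. Clearing the window after each promotion means the agent has to prove itself again at the new difficulty before advancing further.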

The result demonstrates that deep reinforcement learning can produce high-speed, obstacle-aware flight policies without relying on privileged state information.

Repository