AWS DeepRacer
AI & Software
Two-time first place champion in AWS DeepRacer ACP Youth competitions (2024 and 2025), building and tuning custom reinforcement learning models for autonomous racing.
Why I Competed
I had been interested in reinforcement learning for a while before I discovered AWS DeepRacer. Most of my AI experience up to that point had been in supervised learning — training models on labeled data, building classifiers, working with neural networks that learn from examples. Reinforcement learning is a fundamentally different paradigm: the agent learns by trial and error, receiving rewards for good behavior and penalties for bad behavior, with no labeled dataset to lean on.
AWS DeepRacer gave me a practical arena to explore RL in a way that was both technically rigorous and genuinely fun. The premise is straightforward — build a reinforcement learning model that drives a virtual (and sometimes physical) car around a track as fast as possible without going off course. But the simplicity of the premise hides a deep technical challenge. Getting a car to drive fast is easy. Getting it to drive fast and stay on the track requires carefully designed reward functions, hyperparameter tuning, and a solid understanding of how RL agents actually learn.
I was hooked from my first training run.
What It Is
AWS DeepRacer is Amazon’s autonomous racing platform built on reinforcement learning. Participants design reward functions in Python that define what “good driving” means, then train RL models in the AWS cloud using those reward functions. The trained models drive a 1/18th scale autonomous car around a track, and competitors are ranked by lap time.
The competition uses the Proximal Policy Optimization (PPO) algorithm, a policy gradient method that balances exploration and exploitation. As a competitor, you do not code the driving behavior directly: you define the reward signal, and the RL agent discovers a driving strategy over millions of simulated episodes.
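PPO's central trick is a clipped objective that keeps each policy update small. A minimal sketch of that objective for a single action, written in plain Python purely for illustration (DeepRacer runs this internally; you never implement it yourself):

```python
# Sketch of PPO's clipped surrogate objective for one action.
# Illustrative only -- the DeepRacer platform handles training internally.

def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """ratio = pi_new(a|s) / pi_old(a|s); advantage estimates how much
    better this action was than the policy's average behavior."""
    clipped = max(min(ratio, 1 + epsilon), 1 - epsilon)
    # Take the pessimistic (minimum) of the clipped and unclipped terms,
    # which discourages destructively large policy updates.
    return min(ratio * advantage, clipped * advantage)

# A good action (positive advantage) whose probability grew well beyond
# the clip range gets its gradient signal capped:
print(ppo_clipped_objective(1.5, 2.0))  # 1.2 * 2.0 = 2.4, not 3.0
```

The entropy coefficient mentioned later in this write-up is an extra term added to this objective to keep the policy exploring rather than collapsing too early.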
The ACP Youth division brings together young competitors who are building and tuning their own models. It is a legitimate test of RL understanding, not just a plug-and-play exercise. The difference between a mediocre model and a winning one comes down to how well you understand the interplay between your reward function, your training hyperparameters, and the specific characteristics of the track.
My Approach
Winning DeepRacer twice required a systematic approach to model development. My process started with track analysis — studying the geometry of each competition track to understand where the critical sections were. Tight turns require different strategies than long straightaways, and a reward function that works well on one track layout might fail on another.
My reward functions evolved significantly between my first and second championship. In the first year, I focused on the basics: rewarding center-line following, penalizing going off-track, and adding speed incentives on straight sections. This got me a competitive model, but I knew I could do better.
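A first-year-style reward function along those lines can be sketched as follows. The structure (a `reward_function(params)` that reads fields like `all_wheels_on_track`, `track_width`, `distance_from_center`, and `speed`) follows DeepRacer's documented interface, but the specific thresholds and weights here are illustrative, not the exact values I competed with:

```python
# A basic DeepRacer-style reward: reward center-line following, penalize
# leaving the track, and add a speed incentive. Thresholds are
# illustrative placeholders, not my competition values.

def reward_function(params):
    if not params["all_wheels_on_track"]:
        return 1e-3  # near-zero reward, effectively a crash penalty

    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    speed = params["speed"]  # meters per second in the simulator

    # Tiered center-line reward: tighter bands earn more.
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        return 1e-3  # about to leave the track

    # Speed incentive, scaled so centering still dominates.
    reward += 0.5 * speed / 4.0  # assumes roughly 4 m/s top speed

    return float(reward)
```

Keeping the speed term smaller than the centering term matters: if speed dominates, the agent learns to floor it and accept the occasional crash.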
For the second year, I developed more sophisticated reward functions that accounted for optimal racing lines — the shortest and fastest path through a turn is not always the center of the track. I incorporated waypoint-based strategies that encouraged the car to take wider entries into turns and clip the inside apex, similar to what real racing drivers do. I also experimented with progressive speed targets that allowed the car to brake into turns and accelerate out of them.
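One common building block for waypoint-based strategies is rewarding heading alignment with the upcoming track segment, computed from the `waypoints` and `closest_waypoints` fields DeepRacer provides. The sketch below shows only that geometric piece; a full racing-line reward would compare against a precomputed optimal line rather than the raw waypoints:

```python
import math

# Sketch of a waypoint-based heading reward: point the car along the
# upcoming track segment rather than blindly at the center line.
# The taper angle (30 degrees) is an illustrative choice.

def heading_reward(params):
    waypoints = params["waypoints"]              # list of (x, y) track points
    prev_idx, next_idx = params["closest_waypoints"]
    heading = params["heading"]                  # car heading, degrees

    # Direction of the track segment the car is currently on.
    prev_wp = waypoints[prev_idx]
    next_wp = waypoints[next_idx]
    track_direction = math.degrees(
        math.atan2(next_wp[1] - prev_wp[1], next_wp[0] - prev_wp[0])
    )

    # Smallest angle between car heading and track direction, in [0, 180].
    direction_diff = abs((track_direction - heading + 180) % 360 - 180)

    # Full reward when aligned, tapering to zero at 30 degrees off.
    return max(0.0, 1.0 - direction_diff / 30.0)
```

Looking a few waypoints ahead instead of at the immediately next one is what produces the wide-entry, clipped-apex behavior: the car starts turning toward the apex before the center line does.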
Hyperparameter tuning was equally important. I ran extensive training experiments adjusting the learning rate, discount factor, entropy coefficient, and batch size. I kept detailed logs of each training run and tracked how different configurations affected both lap time and consistency. A model that posts one fast lap but crashes on the next three is less useful than one that completes every lap reliably at a competitive pace.
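The logging and selection process can be sketched as a simple scoring rule that folds consistency into lap time. The run names and numbers below are hypothetical; the point is that dividing by completion rate makes a crash-prone model score worse than a slightly slower but reliable one:

```python
# Sketch of a training-run log and a consistency-weighted selection rule.
# All run names and numbers are hypothetical examples.

runs = [
    {"name": "run-07", "lr": 3e-4, "gamma": 0.99, "entropy": 0.01,
     "best_lap_s": 9.8, "completion_rate": 0.55},
    {"name": "run-12", "lr": 1e-4, "gamma": 0.995, "entropy": 0.02,
     "best_lap_s": 10.4, "completion_rate": 0.95},
]

def score(run):
    # Effective lap cost: a crash wastes the whole attempt, so a low
    # completion rate inflates the expected time per finished lap.
    return run["best_lap_s"] / run["completion_rate"]

best = min(runs, key=score)
print(best["name"])  # the consistent model wins despite a slower best lap
```

Under this rule the 10.4 s model that finishes 95% of laps beats the 9.8 s model that finishes only 55%, which is exactly the trade-off described above.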
I also learned the importance of training duration and diminishing returns. There is a point where additional training episodes stop improving performance and can even degrade it through overfitting to the simulated environment. Finding that sweet spot was part of the competitive edge.
Impact
Winning first place in the ACP Youth division in both 2024 and 2025 was a highlight of my competitive programming journey. The back-to-back championships were not just about the results — they represented a deepening understanding of reinforcement learning that I built up over two years of iterating on models, analyzing failures, and refining my approach.
The experience gave me practical expertise in reinforcement learning that goes well beyond what I could have learned from textbooks or online courses. Working within the constraints of DeepRacer’s platform forced me to truly understand how reward shaping influences agent behavior, how hyperparameters affect convergence, and how to debug models that are not performing as expected.
More broadly, DeepRacer taught me the value of systematic experimentation. In RL, you cannot just tweak one thing and hope for the best — you need to change variables methodically, measure results carefully, and build intuition over time. That experimental mindset has carried over into every other technical project I have worked on since.
The autonomous racing domain also sparked my broader interest in robotics and autonomous systems. Seeing an RL agent learn to navigate a track reinforced how powerful the paradigm is, and it made me curious about applying similar techniques to real-world autonomous navigation — an interest that connects directly to my work in FTC robotics and other projects.