How Tesla Continuously and Automatically Improves Autopilot and Full Self-Driving Capability On 5M+ Cars
Learn how to build a data engine that continuously improves your ML model
Get a list of personally curated and freely accessible ML, NLP, and computer vision resources for FREE on newsletter sign-up.
To read more on this topic see the references section at the bottom. Consider sharing this post with someone who wants to know more about machine learning.
“… most of the programming is in the form of these datasets and making sure that they are large, diverse and clean”
“… you are trying to always increase the quality of your dataset, so you are trying to catch scenarios that are basically rare. It is in these scenarios your neural network will typically struggle in because they weren’t told what to do in those rare cases in the dataset but now you can close the loop if you can collect them at scale and feed them back in …”
- Andrej Karpathy with Lex Friedman [4]
We dive into the details of a patent by Tesla’s former Senior Director of AI [3] Andrej Karpathy: “System and method for obtaining training data” [2].
0 . The Model Is As Good As The Data
The key to unlocking a model's true potential lies in the quality of its training data. Imagine a model tasked with detecting objects in images. Its capabilities are inherently limited by the data it's trained on1.
If the dataset only contains three classes – cars, trucks, and buses – we shouldn't be surprised if the model struggles to identify traffic lights.
1. Scoping The Problem: Targeted Data Acquisition
A model’s true test lies in its ability to work correctly on messy, ever-changing real-world data. Unseen data throws even the most sophisticated models off balance. This is where the pursuit of better data becomes crucial.
Tesla has a fleet of hundreds of thousands of cars equipped with sensors. This fleet ingests thousands of hours of data when driving around every day.
But within this ocean of data lies the challenge: finding the proverbial needle in the haystack – those rare, critical moments that can significantly improve the model's performance and avoid fatal accidents such as the one where the model could not distinguish between a white truck and the sky [6, 7].
When a model is deployed in production it will likely see data it has not seen during training. Once Tesla’s ML team discovers a challenging failure mode2, targeted data collection begins. Hard or rare samples mined this way can then be added to the training set. The model can then be re-trained to improve performance.
2. The Solution: Finding The Needle In The Haystack Using Trigger Classifiers
The solution? Trigger classifiers.
Imagine a team of lightweight machine learning models (think support vector machines, tiny neural networks, or random forests) specifically designed to be vigilant. These classifiers aren't general-purpose workhorses; they're trained for a singular mission: to trigger when they encounter a specific criterion within the terabytes of data streaming through the car's machine-learning pipeline.
Preparing the Trigger Classifier Arsenal
Each team spawns its own arsenal of trigger classifiers. These are “linked” to the general-purpose feature extractor backbone. The backbone is a large feature extractor that is trained to extract features important in the context of autonomous driving.
The trigger classifiers are built on top of the feature extractor backbone forming a hydra-like structure [5] with many tiny heads designed to perform specific “jobs” on top of the backbone. At Tesla, each trigger classifier is designed to focus on a specific aspect – shopping carts, unusual road conditions, unique driving patterns, hazards, or even weather anomalies.
During training, a technique called hyperparameter optimization helps determine which layer of the backbone provides the most relevant information for each trigger classifier. Earlier layers, rich in semantic details about the raw data, might be ideal for some classifiers, while others might benefit from the compactness of later layers.
Targeted Data Acquisition on Trigger
When a trigger classifier's score surpasses a predefined threshold, it raises a flag, deeming the data potentially valuable. This flagged data3 is then selectively uploaded to a central server for further scrutiny. Here, a human team reviews the data, labels it appropriately, and incorporates it into the next training run of the model.
Tesla's vast fleet size operating across the globe guarantees a representative dataset for even the rarest scenarios. Imagine the "transparent truck" case. By effectively identifying and capturing these rare instances, the model can be re-trained to not regress in commonly occurring situations but also improve performance on challenging edge cases, like detecting those transparent-walled trucks.
Intelligently Scaling To 100s Of Trigger Classifiers On A Single Vehicle
The beauty of this system lies in its scalability. Multiple teams can deploy their own trigger classifiers on a single vehicle, each searching for their specific needles in the haystack.
One team might be hunting for skateboard-wielding pedestrians, another for highway-crossing raccoons, while another focuses on bizarre traffic signs. This collaborative approach allows Tesla to continuously improve its models by capturing a vast array of real-world situations.
In essence, trigger classifiers act as intelligent filters, sifting through the massive data stream and identifying long-tailed data that hold the key to unlocking the true potential of Tesla's self-driving cars.
3. Trigger Classifier Design Choices
Tesla's choice to use lightweight trigger classifiers is a strategic one, driven by three key factors:
Resource Efficiency: Deploying numerous trigger classifiers simultaneously would quickly drain resources on the car's computer. By keeping them lightweight, Tesla ensures they can run smoothly without slowing the system.
Leveraging Pre-extracted Features: The trigger classifiers don't have to start from scratch. They benefit from a powerful "backbone" model that has already processed the raw sensor data and extracted key features by training to perform object detection, semantic segmentation, lane detection, activity recognition, etc.
The light classifier then operates on a very feature-rich input so it does not need to re-do the work of extracting low-level concepts such as person, road, sky, trees, etc. Think of it like a chef receiving pre-chopped vegetables, allowing them to focus on crafting the final dish – the classification task – without wasting time on prep work.Training With Small Datasets for Trigger Events: After discovering the model struggles with transparent trucks, Tesla can gather a small set of images specifically showcasing these trucks. This focused dataset allows for efficient training of the lightweight classifier, enabling it to flag similar situations in the future. The classifier essentially learns to recognize a specific pattern, like a transparent truck, within the broader feature-rich data provided by the backbone model.
4. Putting the Search Teams to Work: Smart Deployment of Trigger Classifiers
With our search teams (trigger classifiers) trained and ready, how are they deployed on Tesla's self-driving cars? Efficiency is key here. Tesla's clever on-board machine learning system can swap trigger classifiers in and out as needed. Imagine having a large team of specialists, but only needing half of them working at any given time.
To optimize this process, Tesla takes advantage of several factors:
Multi-modal Information As Input: The system considers various data points, like the car's location, outputs from other machine learning models, and sensor data from radar and other sources. This combined knowledge helps it decide which trigger classifiers are most relevant at any given moment.
Context-Aware Activation: For instance, if a classifier is trained to identify tunnel exits, the system knows where tunnels are located on digital maps. As the car approaches a tunnel, the system activates the "tunnel exit" classifier, ensuring it runs during the critical time window. This targeted approach significantly increases the chances of capturing relevant data efficiently.
Shadow Mode Operations: In many cases, even when a human is driving, the car's machine learning system operates in shadow mode4. This means it continues to analyze data and potentially trigger classifiers in the background, without actually controlling the vehicle. This allows the system to constantly learn and gather valuable data, even during manual driving phases.
By intelligently deploying trigger classifiers and leveraging multi-modal data, Tesla ensures its search teams are focused on the right tasks at the right times. This maximizes data collection efficiency and helps unearth those crucial hidden gems that propel the development of ever-better self-driving car models.
5. Continuously Improving Trigger Classifiers: Smaller Loops Inside the Big Loop
The beauty of Tesla's system lies not only in identifying valuable data but also in its ability to continuously improve the search itself. Just like any team of specialists, these trigger classifiers can be refined over time.
As new data is collected and labeled, Tesla can feed it back into the trigger classifiers for retraining. This iterative process hones their ability to find needles in the haystack with ever-greater precision. A key advantage here is that trigger classifiers operate independently of the core autonomous driving system. This allows for more frequent updates without impacting the car's critical functionalities. In essence, Tesla can deploy a new trigger classifier and begin targeted data collection within a matter of minutes.
Outro and The Broader Picture
Tesla's trigger classifiers pave the way for the development of ever-more-capable self-driving cars. This iterative cycle of data collection, refinement, and redeployment allows models to continuously learn and adapt to the ever-changing world, bringing us closer to safe and reliable autonomous vehicles.
The impact of this approach extends far beyond self-driving cars. The fundamental challenge of identifying valuable data from an infinite bucket of information applies to many similar scenarios, particularly in robotics. Just like self-driving cars, robots often rely on vast amounts of streaming data to perform their tasks. Struggling to find and curate the right data hinders their development. Trigger classifiers, or similar techniques, hold immense potential for robotic applications as well. By efficiently identifying those critical moments within the data stream, robots can learn from their experiences in the real world and continuously improve their performance – a significant leap forward in the field of robotics. Tesla's trigger classifiers offer a glimpse into a future where machine learning empowers machines to learn and adapt in real time, shaping the landscape of not just self-driving cars, but the entire robotics industry.
Consider subscribing to get it straight into your mailbox:
Continue reading more:
References
[1] Job Shadowing: https://en.wikipedia.org/wiki/Job_shadow
[2] System and method for obtaining training data: https://patents.google.com/patent/EP3850549A1/en
[3] Andrej Karpathy: https://karpathy.ai/
[4] Lex Clips: youtube.com/watch?v=zPH5O8hRfMA
[5] HydraNets: youtu.be/j0z4FweCy4M?t=3241
[6] https://www.theguardian.com/technology/2016/jun/30/tesla-autopilot-death-self-driving-car-elon-musk
Consider sharing this newsletter with somebody who wants to learn about machine learning:
Consider a dataset with numerous unlabeled cars, meaning the cars are present in the images but haven't been marked for detection. The model might accurately predict the presence of these cars, but the training algorithm will penalize these predictions due to the missing labels. This confusion can lead to a model that lacks confidence in its predictions. Similarly, mislabeled data, like a bus incorrectly labeled as a truck, introduces noise that hinders the model's performance. In essence, “dirty” data means underperforming models.
These can be images that contain “edge cases” or images from the long-tail distribution of the data e.g. an image of a truck with transparent walls, or a transport vehicle where the color of the vehicle matches exactly the color of the sky making the truck look “invisible” to the model.
Depending on the application, in some deployments, additional metadata such as the type or road, left or right-hand drive, time of the day, duration since the classifier was triggered last, and vehicle parameters (speed, acceleration, steering angle) are also retained.
It involves a potential candidate to follow and see how a person performs a certain job. In machine learning for medicine, models are deployed in shadow mode with the doctors. This is then used to evaluate how effective and correct the models would be if they were deployed instead of the doctor. In Tesla’s case, this means the ML system is on but its outputs are not used to autonomously control the vehicle.