
How Do Self Driving Cars Work?

Ben Parker

How the Voyage G2 car sees the world (Credit: Voyage)


It’s been almost a decade since the Tesla Model S was first delivered to eager buyers, and it still feels like yesterday. That was around the first time I remember anyone seriously thinking about a possible future of cars that weren’t merely gas-powered, human-driven vehicles. Now, a decade later, it feels like we’re minutes away from an autonomous reality.

We can all remember the new fears that came with self-driving cars. Some of those fears were realized when people abused Tesla’s self-driving abilities and, figuratively and literally, lost their heads with it. Maybe you’ve been blessed with the opportunity to experience Tesla Autopilot, or at least some fancy lane-assist feature. It’s an amazing, frightening, and wholly awesome experience. The moment you see the steering wheel move on its own, you almost expect to be led to your death by this self-actualized robot car. A lot of questions have been asked about self-driving vehicles; some have been answered, and some I’m not sure ever will be. One question we do have an answer to is: how do self-driving cars actually work?

It boils down to three main aspects we’ll call sensing, planning, and acting. Your car recognizes surroundings using computer vision and sensor fusion, then it goes through a localization and path planning process before finally controlling the car to go where it needs to. 


Sensing

To recognize the world around it, the vehicle is equipped with cameras that give it a collective 360-degree view. Constantly looking around the whole vehicle, the onboard computer uses neural networks to identify the car next to you, the approaching stop sign, the old man crossing the street, or the kid chasing his soccer ball into the road. The cameras are really designed only to get the minimum information required for recognition, meaning they can’t see fine detail. This is where Lidar (light detection and ranging) comes into play: Lidar uses lasers to supplement the camera vision and give a very precise view of the vehicle’s surroundings. Now the whole image has been made.

Planning & Acting

Once this image is gathered, the computer communicates with an onboard GPS that can find the location of the car, but only to within about one or two meters. This is paired with high-definition maps that improve the accuracy to single-digit centimeters. The maps contain landmarks (buildings, light posts, etc.) that, when recognized by the cameras and lidar, can be used to triangulate a very specific location.

Now that the car can see what obstacles are immediately around it, and knows precisely where on earth it is, it can decide how to navigate to its particular destination. This is path planning: just as your normal GPS maps a route for you to drive, the car does this for itself, creating waypoints and plotting a projected path that it should follow. Then the car operates its actual physical controls to match the projected path as best it can.


Tesla’s AutoPilot

Now for a more specific explanation of how Tesla Autopilot works, since that’s probably what most people are familiar with. There are 8 cameras, 12 ultrasonic sensors, and a forward-facing radar. On the front there are three cameras, with wide, medium, and narrow fields of view respectively. This diagram shows the range that the car gathers from its sensors and cameras.

Levels of Automation

These components are the base for what companies like Tesla, Waymo, Apple, Polestar, Audi, and basically everyone else are using to create their self-driving cars. Although they all use these as a base they have varying levels of autonomy. The levels of self-driving cars are explained like this:

Level 0: No Automation

  • My ‘89 Nissan Pao falls into this category. Basically it’s just you and your brain, hands, and feet in control. 

Level 1: Driver Assistance

  • If your car can control itself to any degree at all (like lane assist or radar-based cruise control), it’s probably a Level 1. The main thing here is that it can control steering or braking, just not both at the same time.

Level 2: Partial Automation

  • Partial Automation means the car can control two aspects at the same time, e.g. braking and steering. Technically speaking, the driver should maintain full attention, but I see you eating your bowl of cereal on the freeway.

Level 3: Conditional Automation

  • What we’re all dreaming of -- even as someone who likes driving. You’re required to be in the car, but you don’t have to be aware of everything all the time. There aren’t really any consumer vehicles at this level yet.

Level 4: High Automation

  • Hello Waymo, this is basically full automation with the caveat that the weather conditions have to be ideal. 

Level 5: Full Automation

  • No safety driver: this car is independent. He’s moved out of your garage and he doesn’t need you to hold his hand when he crosses the street anymore. This is what companies like Nuro are aiming for. Imagine a world where you say “Hey Siri, get me to the Staples Center” and you’re seamlessly whisked away from your porch right to the sidewalk outside the game. Maybe you don’t even need to say “Hey Siri” by that point; maybe you just think it and the car comes. Who knows.



More Detailed Explanation

When you say “Hey Siri” to your phone (or “Hey Google” if you have an Android) and the magic little AI assistant pops up on your screen, you can ask it “What is the capital of Nebraska?” and it will likely give you the correct answer. For that to work, a few technical components have to play together. The first is voice recognition: your phone is always listening for the phrase “Hey Siri.” When it recognizes that phrase, Siri wakes up and listens to what you then ask or say. Your phone interprets the request and then accesses the relevant information. Whether you’re asking trivia or asking Siri to set a reminder, the function is essentially the same. It’s pretty simple. Here’s what it looks like (very much simplified):
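As a toy sketch of that wake-then-answer loop (the fact table and function name are invented for illustration; this is nothing like Apple’s actual pipeline):

```python
WAKE_PHRASE = "hey siri"

# A stand-in "knowledge base" for whatever sources the assistant actually queries.
FACTS = {"what is the capital of nebraska": "Lincoln"}

def handle_audio(transcript: str):
    """Answer only if the wake phrase was heard; otherwise stay asleep."""
    text = transcript.lower().strip()
    if not text.startswith(WAKE_PHRASE):
        return None                                   # no wake phrase: never wakes up
    query = text[len(WAKE_PHRASE):].strip(" ,.?")     # the actual request
    return FACTS.get(query, "Sorry, I don't know that one.")
```

Everything before the wake phrase is ignored entirely; only after “hey siri” does the rest of the sentence get interpreted as a request.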

A virtual assistant really serves only one purpose: information recall, whether from your phone or the internet. The artificial intelligence required to drive a car is far more complicated. Think of all the things you do when you drive a car. Let’s take just the speed of the car, for example; just the speed, nothing more. When you drive, how do you manage the speed?

First, you probably look for a sign that tells you the speed limit. If and when you find one, you read the sign -- let’s say it says 25 mph -- and you now know 25 is the fastest you can go, but not necessarily how fast you should go. Then your brain moves your foot to the gas pedal and controls the pressure applied to it. Your eyes look at the surroundings, and your brain takes that information and determines how fast the car should move. I could keep going, but I think you get the point. Just managing the speed requires a lot of different parts.
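Even that one slice of driving hides several decisions. A minimal sketch of the idea (the 50-meter threshold and the gain are made-up tuning values, not anything a real car uses):

```python
def choose_speed(speed_limit: float, obstacle_distance_m: float) -> float:
    """Pick a target speed: the posted limit is a ceiling, not a goal."""
    target = speed_limit
    if obstacle_distance_m < 50:
        # something is close ahead: slow down in proportion to the gap
        target = min(target, obstacle_distance_m * 0.5)
    return max(target, 0.0)

def pedal_command(current_speed: float, target_speed: float) -> float:
    """Proportional 'foot on the pedal': positive = gas, negative = brake."""
    GAIN = 0.1  # hypothetical tuning constant
    return GAIN * (target_speed - current_speed)
```

With a clear road, the car cruises at the limit; with a car 20 m ahead, the target drops and the pedal command goes negative, i.e. brake.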

So in order for a self-driving car to work, many different systems have to work connected to each other simultaneously. Think of it as one brain instead of many; this is called an end-to-end solution. End-to-end solutions use deep learning to address multiple needs at once, as opposed to multiple smaller brains solving separate problems.

There aren’t any successful end-to-end solutions currently in use for self-driving cars; this is one of the main reasons we haven’t reached full autonomy yet. Current solutions still use multiple systems that address the different needs of the car. The main categories (as discussed above) are sensing, planning, and acting.

Sensing the Surroundings

As discussed above, these cars rely on a myriad of sensory inputs. The car has to manage the typical systems any other car would (oil levels, engine, transmission, etc.) as well as everything needed for it to drive safely and successfully. The two categories of sensors are proprioceptive (position and movement) and exteroceptive (surroundings).


Cameras

Cameras are cheap and can be placed all around the car’s exterior to provide information about immediate surroundings and wayfinding objects (stop signs, stop lights, traffic signs). Stereo cameras act like the human eyeball: they can see the skateboarder crossing the street. These cameras use a technique called image segmentation, in which the computer uses deep learning to break the image into parts identified as stop signs, lights, or the lane you’re driving in. Mono cameras’ main purpose is to detect signage and lights; they determine whether the light is green or red. One issue with cameras is that they are easily blocked or obstructed, and because they’re vision-based, a really dark night can lower their effectiveness.
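At its core, the last step of segmentation just assigns every pixel its highest-scoring class. A toy version (the class names and score values are invented; in a real system a deep network produces these scores from the raw image):

```python
CLASSES = ["road", "stop sign", "lane marking"]

def segment(scores):
    """Per-pixel argmax: each pixel takes the class with the highest score."""
    return [[max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
            for row in scores]

# A pretend 2x3 "image" of per-pixel class scores, as a network might output.
scores = [
    [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]],
    [[0.8, 0.1, 0.1],   [0.6, 0.2, 0.2], [0.3, 0.3, 0.4]],
]
labels = segment(scores)   # [[0, 1, 2], [0, 0, 2]]
```

The result is a label map the planner can actually use: this pixel is road, that one is a stop sign, that one is a lane marking.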

Radar & Ultrasound

These sensors help the car get a better understanding of distance using radio or sound waves. Radar has been used for years to gather distance information, much the way sonar is used in submarines. These sensors are placed on the front, back, and sides of the car. They send out a radio (or sound) wave and infer distance from how quickly that wave returns to the sensor. The benefit of radar is that it works in many more weather conditions, has a wide-angle view, and has a relatively long sensing range. Ultrasonic sensors have similar benefits but a much smaller range.
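The underlying arithmetic is simple time-of-flight: the wave travels out and back, so the distance is half of the wave’s speed times the round-trip time. A sketch:

```python
SPEED_OF_LIGHT = 299_792_458.0   # m/s, for radar
SPEED_OF_SOUND = 343.0           # m/s in air, for ultrasonic sensors

def echo_distance(round_trip_s: float, wave_speed: float) -> float:
    """Distance to the reflecting object, given the echo's round-trip time."""
    return wave_speed * round_trip_s / 2.0
```

An ultrasonic echo that returns after 10 ms puts the obstacle roughly 1.7 m away; a radar echo after 1 µs puts it about 150 m away, which is why radar handles the long-range work and ultrasound handles parking distances.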


Lidar

Lidar stands for Light Detection and Ranging. It works similarly to radar and ultrasound in that it sends out a signal and receives one back: lidar sensors send a pulsing laser signal toward a surface and measure the time it takes to bounce back. Lidar imaging has been used in planes for years to create fairly clear images of the plane’s surroundings. On self-driving cars, the housing that contains the lidar sensors swivels around to create a 360-degree image of the car’s surroundings. The sensors are fragile, though, and can easily be confused. At Tesla’s Autonomy Day last year, Elon Musk dismissed the technology, calling lidar “a fool’s errand”; Tesla’s leaders described it as a crutch for those who use it, and a false sign of progress.

Planning the Path

After receiving the information from the sensors, the car has to interpret it and then make a plan of action.

Understanding the Surroundings

For our brains to properly perceive something, we have to combine information from multiple inputs: smell, taste, sight, and touch. Self-driving cars are the same; in order to navigate the road, the car has to combine all of the data from its sensors into a single full image.

Typically the data is interpreted through an algorithm called a Kalman filter. The Kalman filter takes sequences of data from all of the sensors and produces one sequence that estimates the car’s (or an object’s) current position. When new data comes in from the sensors, the algorithm goes back and updates its estimates to paint a more accurate picture of the car’s current position and future path.
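In one dimension, the core idea fits in a few lines: blend the prediction with each new reading, weighted by how much you trust each. A bare-bones sketch (the noise variances here are made-up defaults, and a real filter tracks velocity and more dimensions):

```python
def kalman_1d(measurements, meas_var=1.0, process_var=0.1):
    """Fuse a noisy sequence of position readings into smoothed estimates."""
    estimate, uncertainty = measurements[0], 1.0
    history = [estimate]
    for z in measurements[1:]:
        uncertainty += process_var                     # predict: motion adds uncertainty
        gain = uncertainty / (uncertainty + meas_var)  # how much to trust the new reading
        estimate += gain * (z - estimate)              # update toward the measurement
        uncertainty *= 1 - gain                        # we are now more certain
        history.append(estimate)
    return history
```

Feed it [0.0, 2.0, 2.0, 2.0] and the estimate climbs smoothly toward 2 instead of jumping, which is exactly the behavior you want when a single sensor reading glitches.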

Making the Plan

Four different considerations are taken by the car to make an accurate plan:

  • Path: where are you actually going? This is the GPS part: you input your final destination and the car decides the fastest way to get there.

  • Predicted Scenarios: This includes standard aspects of driving like changing lanes, staying in the lane, stopping at a stop sign, etc. 

  • Unpredicted Scenarios: Here the car decides how it will react in an unplanned situation, say if someone slams on their brakes in front of you, or if the car needs to maneuver around something. It also has to consider what other cars will do while it’s maneuvering. This is where deep learning comes into play heavily. Engineers training these artificial intelligences input thousands of potential scenarios so the cars get better at reacting to unplanned situations. Tesla uses the data from all of its users’ cars to make each of them smarter; in fact, it announced last year that it has passed 3 billion miles driven using Autopilot. That’s a lot of data.

  • Trajectory: If you’ve ever driven a car, you’ve noticed that it’s hard to stay exactly in the middle of the lane when making a sharp turn; this is because of inertia. You as the driver can anticipate this and adjust accordingly, and so does the car, not just in turns but in all scenarios.
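A toy illustration of the steering side of this: compute the heading error toward the next waypoint and steer in proportion to it. The angle wrap-around keeps the car from turning 350 degrees left when 10 degrees right would do. All the names and the gain here are invented for the sketch:

```python
import math

def heading_error(x, y, heading, wx, wy):
    """Signed angle (radians) the car must turn to face waypoint (wx, wy)."""
    desired = math.atan2(wy - y, wx - x)
    error = desired - heading
    # wrap into [-pi, pi] so the car always turns the short way around
    return (error + math.pi) % (2 * math.pi) - math.pi

def steering_command(x, y, heading, wx, wy, gain=0.8):
    """Proportional steering: turn harder the further off-course we are."""
    return gain * heading_error(x, y, heading, wx, wy)
```

A car at the origin facing east with a waypoint due north gets a positive (left-turn) command; one with the waypoint dead ahead gets zero.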


Acting on the Plan

The last step is perhaps the simplest. After gathering all the information and mapping a path, it’s time to move. Here the car takes control of the acceleration, braking, and steering and follows the path it set out for itself in the planning phase.
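Put together, acting is just a loop: advance toward each planned waypoint in small control steps. A point-mass sketch with no physics at all, purely to show the loop structure:

```python
def follow_path(position, waypoints, step=1.0):
    """March a point-mass 'car' along planned waypoints, one small step at a time."""
    x, y = position
    trace = [(x, y)]
    for wx, wy in waypoints:
        # keep stepping toward this waypoint until we are within one step of it
        while (wx - x) ** 2 + (wy - y) ** 2 > step ** 2:
            dx, dy = wx - x, wy - y
            dist = (dx * dx + dy * dy) ** 0.5
            x += step * dx / dist
            y += step * dy / dist
            trace.append((x, y))
    return trace
```

A real controller would route each step through the steering, throttle, and brake actuators, and inertia would keep the car from following the plan exactly, which is why the sensing-planning-acting loop runs continuously rather than once.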