Analysis of Automatic Annotation: How Tesla Trains Autopilot with Human Driving Behavior

Summary

  • Human driving behavior provides Tesla with automatic labels for its computer vision system.
  • Automatic labeling lets Tesla leverage its massive fleet mileage, a competitive advantage over rivals such as Waymo and Cruise.
  • Tesla can also use automatic labels to predict road-user behavior and to learn driving operations.
  • Partial autonomy should not be overlooked: it already brings Tesla higher revenue and gross margins.
  • Autonomous driving software made the futuristic styling of the Cybertruck feel more natural at its unveiling.

According to foreign media reports, labels are what make supervised machine learning work: a single label tells an artificial neural network the correct output value for a given input.

For example, pixels showing open lanes in a video are labeled as “free space/available space”. Pixels corresponding to vehicles, pedestrians, sidewalks, traffic cones, and other obstacles are not. Given enough labeled examples, a neural network becomes very good at learning which pixel patterns correspond to road and which to obstacles.
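As a sketch of what such per-pixel labels look like (the class IDs here are invented for illustration, not Tesla's actual taxonomy), a multi-class label map can be collapsed into a binary free-space mask:

```python
import numpy as np

# Hypothetical class IDs for a semantic segmentation label map.
ROAD, VEHICLE, PEDESTRIAN, SIDEWALK, CONE = 0, 1, 2, 3, 4
FREE_CLASSES = [ROAD]  # only open-road pixels count as free space

def free_space_mask(label_map: np.ndarray) -> np.ndarray:
    """Collapse a multi-class per-pixel label map into a binary
    free-space mask: True = drivable, False = obstacle/other."""
    return np.isin(label_map, FREE_CLASSES)

# Toy 2x3 label map: top row is road; bottom row contains a vehicle and a cone.
labels = np.array([[ROAD, ROAD, ROAD],
                   [VEHICLE, ROAD, CONE]])
mask = free_space_mask(labels)
```

The binary mask is exactly the kind of target a segmentation network would be trained against.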

In a video Tesla has shown, the trained neural network produces the correct outputs, letting the vehicle understand where it can drive safely and where it cannot. The video shows Tesla’s estimate of “free space/available space”, rendered in green:

The default way to annotate video is to hire people to label the pixels by hand. But this is prohibitively expensive: manual labeling is slow, and the amount of labeled data required is enormous. Is there another way to get the data labeled?

GM executive outlines the concept of automatic labeling

It turns out there is an alternative. Kyle Vogt, president and chief technology officer of Cruise, General Motors’ self-driving subsidiary, recently outlined the basic concept:

What we are doing now is more about automatic labeling. What I mean is basically eliminating human labeling from the loop. What really interests me is that we can infer a lot from the way a vehicle is driven. If a vehicle is driven without any mistakes, we can infer the correct operations implied by that driving. When an automated vehicle keeps making the right driving decisions, passengers will say, “you are doing a great job!” For me, this means vehicles have access to a very rich source of information.

Vogt continued: “If you are a company whose business model still depends on manual data annotation, you will be crushed by competitors who are rethinking how annotation is done, either spending far less effort on labels or removing the manual annotation step from the loop entirely.”

So, can human driving behavior be used to automatically label “free space/available space”? In a paper published in 2018, computer vision researchers (including two from Indiana University) made an exploratory attempt.

Human drivers rarely hit obstacles and almost always drive in open lanes, so human driving itself can serve as an automatic labeler for “free space/available space”. The researchers combined these driving-derived labels with estimates of the free space visible in view. The resulting automatic labels reached 98% of the accuracy of manual labels.
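A minimal sketch of the idea, under simplifying assumptions (a bird's-eye occupancy grid and ego positions already expressed in grid-relative metres; this is not the paper's actual pipeline): every cell the vehicle actually drove through gets marked as free space.

```python
import numpy as np

def auto_label_free_space(trajectory, grid_shape=(100, 100), cell_m=0.5):
    """Trajectory-based auto-labeling on a bird's-eye grid.
    `trajectory` is a time-ordered list of (x_m, y_m) ego positions in
    metres relative to the grid origin (a simplifying assumption).
    Cells the ego vehicle drove through are labeled free; everything
    else stays unlabeled rather than being marked as an obstacle."""
    labels = np.zeros(grid_shape, dtype=bool)  # False = unknown / not labeled
    for x, y in trajectory:
        i, j = int(y / cell_m), int(x / cell_m)
        if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
            labels[i, j] = True  # driven-through => free space
    return labels

# The ego car drives 2 m straight ahead; the cells under its path become labels.
free = auto_label_free_space([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
```

Note the asymmetry: driving through a cell proves it was free, but never driving through a cell proves nothing, which is why untraversed cells stay unlabeled.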

Tesla’s advantage in automatic labeling

Since the beginning of this year, Tesla has been emphasizing the importance of automatic labeling. In an interview in February, Elon Musk said: “We have started using automatic labeling, and it is really better than manual labeling. When a driver takes the vehicle through an intersection, that behavior trains Autopilot on what to do at intersections.”

At “Autonomy Day” in April, Andrej Karpathy, senior director of Tesla’s artificial intelligence department, repeatedly discussed how Tesla uses automatic labeling. The following video is an example:

Recall the job posting Tesla’s Autopilot team released in February: it sought candidates who could design new methods to exploit massive, lightly labeled datasets. Automatic labeling of “free space/available space” fits this pattern. Compared with the researchers above, Tesla also has far richer data sources: it can detect emergency braking, collisions, and other signals. In theory, these signals could filter out cases where a Tesla did not actually enter “free space/available space”.

Tesla’s data is not only rich in variety but also vast in quantity.

By the end of this year, Tesla will have nearly 700,000 vehicles with surround-view cameras on the road, each equipped with its second- or third-generation onboard computer.

These vehicles average more than 1,000 miles (about 1,609 kilometers) per vehicle per month, for a fleet total exceeding 700 million miles (about 1.127 billion kilometers) per month.

In contrast, Waymo operates only about 600 autonomous vehicles, which is nonetheless the world’s largest self-driving test fleet. All US self-driving test vehicles combined number fewer than 1,400. Even if those 1,400 test vehicles ran continuously at an average of 70 miles per hour (about 112.65 kilometers per hour), their total monthly mileage would be only about 70 million miles (about 112.65 million kilometers).
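The comparison can be checked with the article's own figures (the 24 h × 30 day month is an assumption used for the continuous-operation upper bound):

```python
# Tesla fleet: ~700,000 camera-equipped cars averaging ~1,000 miles/month each.
tesla_vehicles = 700_000
tesla_miles_per_vehicle_month = 1_000
tesla_monthly_miles = tesla_vehicles * tesla_miles_per_vehicle_month  # 700 million

# All US self-driving test vehicles combined, driving non-stop at 70 mph.
test_vehicles = 1_400
avg_speed_mph = 70
hours_per_month = 24 * 30  # assumed 30-day month
test_monthly_miles = test_vehicles * avg_speed_mph * hours_per_month  # ~70.6 million

# Tesla's fleet logs roughly an order of magnitude more miles per month,
# even against this deliberately generous bound for the test fleets.
ratio = tesla_monthly_miles / test_monthly_miles
```

Even under the most generous assumption for the test fleets, Tesla's fleet collects roughly ten times the monthly mileage.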

Skeptics point out that manually labeling the video data Tesla’s fleet collects, down to the per-mile level, would not be economically feasible. Automatic labeling, however, is expected to scale to far larger volumes.

Research from Baidu gives a rough sense of how neural network accuracy scales with labeled training data. For classifying many object categories in images, accuracy roughly doubles each time the number of labeled examples grows tenfold: 10x the data doubles accuracy, 100x quadruples it, and 1,000x (10^3) yields an 8x (2^3) improvement. In general, 10^x times the data yields 2^x times the accuracy.

The gains for recognizing “free space/available space” may be even larger. Free-space recognition is binary: each pixel is either “free space” or not. Baidu, by contrast, studied choosing the correct object category from thousands of possibilities, measured against a lenient top-5 standard: a prediction counts as correct if the true class appears among the network’s five best guesses. Under that metric, every fourfold increase in training data doubles accuracy: 4x the data doubles it, 16x quadruples it, 64x yields an 8x improvement, and so on.
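The two scaling rules quoted above (tenfold data per doubling in the general case, fourfold per doubling under the top-5 metric) reduce to the same formula with different bases; a small helper makes the arithmetic explicit:

```python
import math

def accuracy_multiplier(data_multiplier: float, base: float = 10.0) -> float:
    """Accuracy gain implied by the scaling rule described in the text:
    every `base`-fold increase in training data doubles accuracy, so
    multiplier = 2 ** log_base(data_multiplier)."""
    return 2.0 ** math.log(data_multiplier, base)

# General rule from the text: 10x data -> 2x accuracy, 1000x -> 8x.
# Top-5 regime from the text:  4x data -> 2x accuracy, 64x -> 8x.
general_1000x = accuracy_multiplier(1000, base=10)  # ~8
top5_64x = accuracy_multiplier(64, base=4)          # ~8
```

Under the top-5 rule, far less data is needed for the same gain, which is the article's point about the binary free-space task potentially scaling even faster.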

Beyond recognizing “free space/available space”, Tesla may also use automatic labeling for other computer vision tasks. What happens if driving behavior is used to help label traffic lights?

When the driver proceeds through an intersection, the light is usually green; when the driver stops, it is usually red or yellow. Of course, these labels contain some “noise”, since drivers occasionally run red lights. However, researchers have shown that even such noisy labels can substantially improve accuracy. Automatic labeling also need not exclude manual labeling; on the contrary, it can serve as a complement to it.
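A toy version of this behavioral heuristic (the thresholds and names are invented for illustration; a real system would use far richer signals):

```python
def auto_label_traffic_light(ego_speed_mps: float, braking: bool) -> str:
    """Infer a noisy traffic-light label purely from driver behavior:
    driving through the intersection suggests green, a full stop
    suggests red or yellow. This is a sketch of the idea in the text,
    not Tesla's actual rule."""
    if ego_speed_mps > 2.0 and not braking:
        return "probably_green"        # driver is proceeding
    if ego_speed_mps < 0.5:
        return "probably_red_or_yellow"  # driver has stopped
    return "uncertain"                  # rolling/braking: leave unlabeled

label = auto_label_traffic_light(10.0, braking=False)
```

Leaving ambiguous frames unlabeled, rather than guessing, is one simple way to keep the label noise manageable.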

Clearly, manual labeling remains an important part of Tesla’s machine learning process, as Andrej Karpathy explained at the “Autonomy Day” event. Even for manual classification labeling of videos and images, the size of Tesla’s fleet is a great advantage.

Kyle Vogt said: “We need a massive amount of data and driving tests because we are trying to maximize the entropy of our current data set and ensure its diversity. Essentially, entropy here means the unpredictability, or novelty, of the data.”

A group of researchers has designed a method to discover new object categories in unlabeled raw video. This or similar techniques could help Tesla mine large numbers of rare edge cases from its fleet’s mileage. Even where images and videos are labeled manually, Tesla’s data set would have higher entropy and diversity.

Beyond computer vision, Tesla can apply automatic labeling to other areas. For predicting road-user behavior, automatic labels come naturally: the future labels the past. Tesla can use its massive driving mileage to improve prediction accuracy without any manual labeling. For imitating human driving, automatic labels are also abundant: drivers label the world simply by driving, and the vehicle’s onboard computer vision system records what they did. This is known as imitation learning, and Tesla uses fleet learning to study and mimic human driving behavior.
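The “future labels the past” idea is easy to sketch: from a logged trajectory, the state a few steps ahead becomes the training target for the current state, with no human labeling involved (the helper below is illustrative, not Tesla's pipeline):

```python
def make_prediction_pairs(track, horizon=5):
    """Turn a time-ordered log of states (e.g. an observed road user's
    positions) into (observation, target) training pairs: the state
    `horizon` steps later serves as the label for each earlier state.
    The labels come for free from the recorded future."""
    return [(track[t], track[t + horizon])
            for t in range(len(track) - horizon)]

# Toy track: eight logged states; each pair's target is 5 steps ahead.
pairs = make_prediction_pairs(list(range(8)), horizon=5)
```

The same pattern works for imitation learning: the logged human control action at each moment is the target for the observation at that moment.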

Because its fleet mileage is vastly higher than its competitors’, Tesla can use automatic annotation to improve computer vision, road-user prediction, and driving behavior (commonly called “planning”). If Baidu’s scaling results hold, Tesla’s use of automatic annotation for these machine learning tasks may give it data scale and accuracy beyond any competitor’s.

I believe the prospect of fully autonomous vehicles is still somewhat uncertain.

In June of this year, an internal Cruise report was leaked to the media. It projected that by the end of 2019, Cruise’s self-driving vehicles would reach only 5%-11% of human driving safety. Viewed one way, this is disappointing; viewed another, it is encouraging: if Cruise hits that target by year-end, it needs only a further 10- to 20-fold safety improvement to reach the average level of human driving.
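The implied improvement factors follow directly from the leaked percentages:

```python
# Leaked Cruise projection: 5%-11% of human driving safety by end of 2019.
low, high = 0.05, 0.11

# Factor still needed to reach parity (1.0 = human-level safety).
needed_at_low = 1 / low    # 20x if only 5% is reached
needed_at_high = 1 / high  # ~9x if 11% is reached
```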

As this article has argued, companies like Tesla that can exploit automatic annotation and large-scale fleet learning may dramatically expand the scale of data available for autonomous driving machine learning tasks.

Full Self-Driving Suite and Cybertruck Bring Opportunities to Tesla

The financial potential of fully autonomous vehicles is enormous.

Because software has near-zero marginal cost, consumer vehicles could be transformed into profitable self-driving taxis. McKinsey analysts predict that in Los Angeles alone, self-driving taxis could generate up to $20 billion in annual revenue, and that self-driving taxis plus fully autonomous private vehicles could reach up to $2 trillion annually in the Chinese market. However, partial autonomy should not be ignored either. Combining machine learning with human supervision and intervention can already provide safe and enjoyable driving. Today’s autonomous driving market is fiercely competitive, and humans and computers sometimes work best together, as in “cyborg chess”, where human-computer teams compete against one another. In the near future, we may see “cyborg driving”: a human-machine combination that exploits the strengths of both artificial neural networks and biological ones.

In practical financial terms, this means a higher take rate for Tesla’s Full Self-Driving Capability software option; given the option’s high price and the growing demand for Tesla vehicles, that translates into higher revenue and gross margin for Tesla.

In addition, the futuristic design of the Tesla Cybertruck has sparked controversy, with public opinion divided.

The Cybertruck’s styling recalls Blade Runner, and the first batch is not expected to be delivered until the end of 2021. By then, I believe advanced urban driving features may have surged, and they are sure to feel as futuristic as the truck looks. The Cybertruck makes the human-machine combination in driving more compelling, though this depends on its popularity. My personal guess is that Tesla may eventually launch a Cybercar and a CyberSUV.

Before we can enjoy autonomous taxi services or become “cyborg drivers”, Tesla still has a great deal of manual design and development work to do, and that takes time. Not everything in Tesla’s development process can be automated, and how long it will take is difficult to predict.

Today, all we can do is watch for the software updates and new features Tesla pushes to its fleet.

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.