Why can't Tesla achieve L5 autonomous driving?

In the past decade, autonomous driving has gone from “completely impossible” to “definitely achievable” in most people’s minds. Countless young people like me, full of faith in autonomous driving, have thrown themselves into this field. Yet there is still plenty of confusion about how fully autonomous driving will be achieved, how it will make money, and when it will finally become widespread.

In this article, I will therefore analyze these questions from the perspectives of technology, social acceptance, and commercial application, taking Musk’s 2020 autonomous driving vision as the starting point.

Musk’s 2020 Autonomous Driving Vision

Elon Musk, CEO of Tesla, said in a video released at the Shanghai World Artificial Intelligence Conference last month, “I think we are very close to L5 autonomous driving. I am confident that we will develop the basic functions of L5 autonomous driving this year.”

This statement sparked discussion across the industry, and many media outlets took it out of context and exaggerated it, claiming that Tesla would achieve “L5 fully autonomous driving” in 2020.

Let’s first discuss the “basic functions of L5 autonomous driving”.

I personally believe that in restricted scenarios, the basic functions of L5 autonomous driving can be developed quickly. Any top car manufacturer has the capability to produce an L5 autonomous driving car without a steering wheel and test it on a dedicated test field this year, but that’s it.

If Musk is only talking about the “basic functions of L5 autonomous driving,” the news holds little appeal for me, because Volkswagen’s Sedric concept, which incorporates L4/L5 autonomous driving technology, had already been developed and tested in Hamburg earlier this year.

“ENIAC,” the world’s first general-purpose computer, was born in 1946, but personal computers were not popularized until the 1980s. The key to a technology’s evolution from research to product, and then to a popular commodity, lies in cost and experience. The L5 autonomous driving prototypes available today meet neither requirement.

According to SAE’s classification of driving automation levels, an L5 autonomous car must be able to drive itself in any situation without human intervention; the human occupants are merely passengers who never need to take part in driving. A fully autonomous car therefore does not even require a steering wheel or a driver’s seat, and passengers can spend their time in the car on more productive work.

“We are very close to L5 autonomous driving” is not, in itself, an incorrect statement. It is quite plausible that Tesla will complete the basic functions of L5 autonomous driving by the end of this year. But does “basic” mean “complete and deployable”? Will governments and regulators allow such cars to operate on public roads? Musk did not address these questions, leaving plenty of room for interpretation.
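For reference, the SAE levels described above can be captured in a small sketch; the one-line descriptions below are my own paraphrases of the J3016 categories, and the helper function is purely illustrative.

```python
from enum import IntEnum

class SAELevel(IntEnum):
    """SAE J3016 driving automation levels (descriptions paraphrased)."""
    L0 = 0  # No automation: the human does all of the driving
    L1 = 1  # Driver assistance: steering OR speed support (e.g., adaptive cruise)
    L2 = 2  # Partial automation: steering AND speed support, driver must supervise
    L3 = 3  # Conditional automation: system drives, human must take over on request
    L4 = 4  # High automation: no takeover needed, but only within a limited domain
    L5 = 5  # Full automation: drives anywhere, in any condition, no human needed

def driver_must_supervise(level: SAELevel) -> bool:
    """At L0-L2 the human is still the driver and must monitor at all times."""
    return level <= SAELevel.L2
```

The gap Musk is glossing over is precisely the jump from “basic functions” near the top of this scale to something complete, deployable, and legal on public roads.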

In fact, in many engineering problems, especially in artificial intelligence, the last mile often takes the longest to solve. “Close to” is never the same as “equal to.” Being able to run on a test track does not mean being able to handle real-world roads.

More importantly, the popularization of autonomous driving is not just a matter of technology, but requires the whole society to be willing to accept the changes brought about by this technology.

The autonomous driving technology currently on the market, including Tesla’s, is at the L2 level, meaning “partial automation”: the driver must remain in control of the car at all times and keep their hands on the steering wheel while the system is active.

In fact, Tesla is gradually rolling out L3-level functions, such as traffic-light detection and navigating itself to a destination, on what it officially declares to be an L2 platform. This lets consumers experience features beyond other manufacturers’ L2 offerings, while Tesla avoids assuming the legal risks and ethical dilemmas of L3.

After all, consumers do not care what level it is; they only care whether the features are novel and reliable. If the price is also within reach, the car is certainly worth considering.

The purpose of many of Musk’s remarks is to make people believe in Tesla’s technological strength and in the Full Self-Driving capability of its pure-vision approach, so that more ordinary people are willing to pay for a Tesla. Musk is a great scientist, but that in no way prevents him from being a shrewd businessman.

Now let’s discuss what our engineers are interested in: “Can Tesla’s pure vision deep learning solution achieve L5 full autonomous driving in 2020 or even in the next few years?”

My answer is: No.

The analysis that follows looks at Tesla’s technology roadmap, the social acceptance of L5 autonomous driving, and the business model, to explain why I believe Musk’s 2020 L5 autonomous driving vision cannot be achieved.

Tesla’s Technology Roadmap

Musk has also stated in previous speeches that he believes Tesla cars can achieve L5 autonomous driving “solely through software improvements.”

Today’s L4 autonomous driving companies, such as Waymo and Uber, use LiDAR to build highly accurate 3D maps of the environment around the car, compensating for the limits of purely visual perception and adding safety redundancy.

Tesla, by contrast, relies mainly on camera-based vision algorithms to interpret traffic scenes: deep neural networks detect roads, cars, objects, and people in the video streams from the eight cameras mounted around the vehicle. The car also carries a front-facing radar and ultrasonic sensors for assistance, but their contribution is limited.
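As a rough illustration of the two sensing philosophies (the sensor counts for the LiDAR-centric suite are hypothetical placeholders, and this is not a model of either company’s actual software):

```python
from dataclasses import dataclass

@dataclass
class SensorSuite:
    name: str
    cameras: int = 0
    radars: int = 0
    ultrasonics: int = 0
    lidars: int = 0

    def has_native_depth(self) -> bool:
        # LiDAR (and, more coarsely, radar) measures distance directly;
        # cameras must infer depth from 2D images via neural networks.
        return self.lidars > 0

# Camera-first suite roughly matching the counts mentioned in the text;
# the LiDAR-centric counts are invented purely for comparison.
camera_first = SensorSuite("camera-first", cameras=8, radars=1, ultrasonics=12)
lidar_centric = SensorSuite("lidar-centric", cameras=9, radars=4, lidars=5)

for suite in (camera_first, lidar_centric):
    print(suite.name, "-> native depth sensing:", suite.has_native_depth())
```

The point of the comparison is simply that a LiDAR-centric suite measures depth directly, while a camera-first suite must infer it.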

Musk’s logic is that humans rely mostly on their own vision to recognize surrounding objects, so whatever the human eye can do, a camera should be able to do as well.

This logic is incomplete. First, the human eye is backed by the brain’s own 3D-mapping “hardware” for detecting objects and avoiding collisions, something cameras still lack, at least in 2020.

Secondly, today’s deep neural networks are at best a rough imitation of the human visual system, simulating only a tiny fraction of the neurons in the human visual cortex.

The limitation of deep learning is that it needs a large amount of training data to work reliably; when it faces new situations not covered by that data, it lacks the creativity and flexibility of a human.
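A toy sketch of this data dependence (pure NumPy, with a nearest-centroid “model” standing in for a real neural network, and the distribution shift standing in for a driving condition absent from the training data):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    """Two object classes separated along feature x0; `shift` moves the test
    distribution away from the training conditions."""
    x0 = np.concatenate([rng.normal(-2 + shift, 1, n), rng.normal(2 + shift, 1, n)])
    x1 = rng.normal(0, 1, 2 * n)
    X = np.stack([x0, x1], axis=1)
    y = np.array([0] * n + [1] * n)
    return X, y

X_train, y_train = sample(500)

# Minimal "learned model": class centroids, a crude stand-in for a trained net.
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

for shift in (0.0, 1.0, 3.0):  # 3.0 ~ a condition never seen during training
    X_test, y_test = sample(500, shift=shift)
    acc = (predict(X_test) == y_test).mean()
    print(f"distribution shift {shift:.1f}: accuracy {acc:.2f}")
```

Accuracy is fine near the training distribution and collapses toward chance once test conditions drift far enough from what was seen during training.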

Therefore, with existing deep neural networks, we cannot predict everything the way Rehoboam does in “Westworld.”

Finally, the human eye has evolved over millions of years, and the visual cortex is very sensitive to specific things such as object shapes, specific colors, textures, and motion tracking. Our traffic facilities and buildings such as cars, roads, sidewalks, road signs, and traffic lights are designed based on the preferences and sensitivity of the human visual system.

We consciously or unconsciously chose the colors, textures, and shapes of these objects based on the overall preference and sensitivity of the human visual system.

Artificial intelligence has to learn some of these characteristics from experience, whereas humans have long possessed this excellent recognition ability innately.

Perhaps one day the camera can also achieve the same effect as the human eye, but this day is certainly not today or this year.

Regarding deep learning’s dependence on data, Musk also mentioned in his speech that the reason Tesla Autopilot cannot be used as effectively in China as in the United States is that most of the training data for Tesla’s vision algorithms comes from the United States. The same reason explains why the Audi A8’s L3 autonomous driving function is heavily restricted in China.

This leads to a concept: the long-tail problem of deep learning.

The so-called long-tail problem of deep learning refers to the fact that the number of extreme situations, or “corner cases,” a model may face is unknown and quite possibly unbounded.

Human drivers, by contrast, can quickly adapt to new environments and conditions, such as an unfamiliar city or town, or weather they have never experienced before (snow, heavy fog, muddy roads, and so on).

When we encounter new situations, we use intuitive physics, common sense, and knowledge about how the world works to make rational decisions. We understand causality and can identify which events lead to others. We also understand the goals and intentions of other rational agents in the environment and can reliably predict their next actions. However, current deep learning algorithms do not have this functionality and require pre-training for each possible situation.

Even Tesla’s deep learning algorithms, well adapted to highway environments, struggle with situations outside their training scenarios, such as the Tesla that crashed into an overturned truck on a Taiwanese highway two months ago. Tesla can keep updating its deep learning models to address such “corner cases,” but the set of extreme situations is, by its nature, unknown and open-ended.
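A minimal simulation of this long-tail intuition (the Zipf-like frequency distribution and the scenario counts are arbitrary assumptions, not measurements from any real fleet):

```python
import numpy as np

rng = np.random.default_rng(42)

# Assume 100,000 distinct driving-scenario types whose frequencies follow a
# Zipf-like power law: a few scenarios are very common, most are rare.
n_scenarios = 100_000
freq = 1.0 / np.arange(1, n_scenarios + 1) ** 1.1
prob = freq / freq.sum()

for n_samples in (10**5, 10**6, 10**7):
    # Each sample is one scenario encountered by the data-collection fleet.
    counts = rng.multinomial(n_samples, prob)
    unseen = (counts == 0).sum()
    print(f"{n_samples:>10,} scenarios observed -> "
          f"{unseen:,} scenario types never seen "
          f"({unseen / n_scenarios:.1%} of all types)")
```

Each extra order of magnitude of data shrinks the unseen tail, but in this toy model even ten million samples still leave some scenario types completely unobserved, which is exactly the diminishing-returns pattern the long-tail argument describes.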

Musk stated in his speech: “I don’t think there are fundamental challenges to achieving level 5 autonomy, but there are many incidental challenges. The challenge for us is to solve all these small problems and then integrate the system, continually solving these long-tail problems. You’ll find that you can handle the vast majority of scenes, but every now and then there are weird, unusual scenes, so you have to have a system to identify and solve those weird and unusual scene problems. This is why you need real-world scenes. Nothing is more complex than the real world. Any simulation we create is a subset of the complexity of the real world.”

His confidence that level 5 autonomy poses no fundamental challenges comes from Tesla’s ability to collect real-world data from around the world to solve autonomous driving problems: “By continuously simulating virtual scenes and fine-tuning its algorithms, Tesla will be the first to approach the limits of deep learning’s long-tail problems in real-world scenarios.”

There are currently two ways to tackle long-tail problems. One is to train ever larger and more complex neural networks on large data sets, eventually reaching human-level performance on cognitive tasks. The other is to fit directly on massive data sets, finding data distributions that cover an ever larger share of the problem space.

Both of these methods are based on big data. If these theories are correct, then Tesla can indeed achieve level 5 fully autonomous driving in the foreseeable future by collecting and effectively using an increasing amount of car data.

However, some argue that existing deep learning theory is fundamentally flawed because it can only interpolate. Human perception of the world involves not just extracting information from the environment but also reasoning about cause and effect. Without learning causal models of its environment, a deep learning system cannot capture the subtle distinctions needed to solve different problems, and no matter how much data it is trained on, it cannot be fully trusted, because novel situations will still cause its reasoning to fail.

In contrast, the human brain does not require explicit training, but extracts high-level rules, symbols, and abstract concepts from each environment and uses them to infer new settings and situations.

Therefore, without abstraction and symbolic operations, and without incorporating common sense, causal relationships, and intuitive physics, deep learning algorithms will never reach the level of human driving ability.

Of course, there are many improvements to deep learning networks, such as:

  • Hybrid artificial intelligence, which combines neural networks and symbolic artificial intelligence to give deep learning the ability to process abstract information;

  • System 2 deep learning, which provides symbolic processing capabilities to deep learning using a pure neural network approach;

  • Self-supervised learning, which learns by exploring the world without the need for lots of help and guidance from humans;

  • Capsule networks, which create a quasi-three-dimensional representation of the world by observing pixels and establish connections between different components of objects.

These are promising lines of early research in deep learning, but none of them are ready to be deployed in self-driving cars or other production AI applications.

In summary, Tesla’s pure vision-based deep learning solution cannot achieve L5 fully automated driving in 2020 or even in the next few years.

Next, let’s discuss whether “society is ready to accept L5 fully automated driving”.

Social Acceptance of L5 Automated Driving

Standards and Regulations

In Tesla’s current L2 automated driving system, drivers are required to take responsibility for all their own driving behavior.

For accidents caused by human-driven cars, we have clear rules and regulations. Self-driving cars, however, remain in a gray area: no country has yet issued detailed legal provisions even for accidents caused by L3 automated driving systems.

For an L5 fully automated car, the driver bears no responsibility for accidents. So what responsibility falls on the manufacturers and providers of self-driving systems? What responsibility should insurance companies bear? Will Tesla, which has so far been unwilling to accept responsibility for Autopilot accidents, be willing to take on this potential liability?

Human Self-Direction

Advocates of autonomous driving AI often believe that human drivers make many mistakes, while the probability of errors in autonomous vehicles is much lower than that of humans, making driving safer.

I am skeptical of this. Firstly, humans do make many mistakes while driving, due to fatigue, carelessness, drinking, and other factors, but the current sample sizes and data distributions do not allow an accurate comparison of accident rates between human drivers and AI.

Secondly, compared with the AI algorithms behind autonomous vehicles, although humans make frequent mistakes, they rarely exhibit bizarre behaviors. For example, rational human drivers would not drive into a toppled truck.

Finally, humans are generally willing to understand and accept the consequences of their own deliberate actions, but they cannot accept having their lives controlled by something else and passively bearing all of the fatal consequences. People therefore expect autonomous driving to be far safer than they are themselves.

This leads to the next question: safety.

Safety and Public Trust

In his speech, Musk raised a question: “What is the acceptable level of safety for L5 autonomous driving? Double, triple, five times, or ten times what it takes to reach human level?”

There is no unified standard for evaluating safety, but merely matching human-level safety is not enough. Only exceeding expectations has value, and regulators will not consider an L5 system that is only as safe as a human driver to be sufficient.
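To make the “multiples of human safety” question concrete, here is a back-of-the-envelope sketch; the human crash rate below is a hypothetical round number chosen purely for illustration, not an official statistic.

```python
# Hypothetical baseline: assume human drivers average roughly one reported
# crash per 500,000 miles (a made-up round figure, for illustration only).
human_crash_rate = 1 / 500_000          # crashes per mile

for multiple in (1, 2, 3, 5, 10):
    target_rate = human_crash_rate / multiple
    miles_between_crashes = 1 / target_rate
    print(f"{multiple:>2}x human safety -> at most one crash "
          f"per {miles_between_crashes:,.0f} miles")
```

Note that statistically demonstrating such low rates requires validation mileage many times larger than the interval between crashes, which is one more reason why “close to L5” is so hard to prove.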

Judging from the current situation, however, the safety of deep learning algorithms still falls far short of that of an ordinary human driver.

Human reasoning not only involves extracting and analyzing information, but also reflects human thinking. The combination of these factors enables humans to make generally correct judgments.

In any case, people tend to blame technology rather than humans themselves, which leads to a lack of trust in technology. The lack of public trust will affect the entire automotive industry. In the future, there will inevitably be experiments similar to the Turing test to evaluate the safety of autonomous driving AI.

So, to approach human-level safety, besides improving AI vision algorithms, we can also add constraints that ensure the algorithms and hardware operate correctly and reliably. If we lay down “rails” for the AI to run on, the probability of derailment should become small enough, right?

This constraint is: Vehicle-to-Everything communication (V2X).

V2X

We can change roads and infrastructure to accommodate the hardware and software present in cars. For example, we can embed intelligent sensors in roads, dividing lines, cars, road signs, bridges, buildings, and objects.

This allows all of these objects to identify one another and communicate via radio signals, an approach known as V2X.
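As a rough sketch of the idea (a simplified, hypothetical message layout, not the SAE J2735 Basic Safety Message or any real DSRC/C-V2X schema), each connected object could periodically broadcast its state so that nearby vehicles can reason about things they cannot yet see:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class V2XStatus:
    """Simplified, hypothetical V2X broadcast -- not a real standard's schema."""
    sender_id: str        # vehicle, traffic light, road sign, ...
    kind: str             # "vehicle", "traffic_light", "pedestrian_beacon", ...
    lat: float
    lon: float
    speed_mps: float = 0.0
    heading_deg: float = 0.0
    state: str = ""       # e.g. traffic light phase: "red" / "green"
    timestamp: float = 0.0

def encode(msg: V2XStatus) -> bytes:
    """Serialize for broadcast over whatever radio link is available."""
    return json.dumps(asdict(msg)).encode("utf-8")

def decode(payload: bytes) -> V2XStatus:
    return V2XStatus(**json.loads(payload.decode("utf-8")))

# A traffic light announces its phase; an approaching car decodes it and can
# plan for the red light even if its cameras cannot yet see the signal head.
light = V2XStatus("tl-042", "traffic_light", 31.2304, 121.4737,
                  state="red", timestamp=time.time())
received = decode(encode(light))
print(received.kind, received.state)
```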

Computer vision will still play an important role in autonomous driving but will be complemented by all other intelligent technologies existing in cars and their environments.

With the expansion of 5G networks and the decreasing prices of smart sensors and internet connectivity, V2X-based self-driving solutions will become more common.

However, the cost of modernizing road infrastructure is not factored into most forecasts of autonomous driving development, and L5 autonomous driving that operates across wide regions may require substantial infrastructure investment before it can run reliably throughout an entire area.

Technological and supporting facility changes often require continuous huge investments and time input from companies and governments. But for local policymakers, the construction of autonomous driving supporting infrastructure needs to consider many factors.

Firstly, although local policies will play a core role in infrastructure spending and construction, different economic, political, cultural, geographical, and weather conditions across the country will affect the speed and quality of construction.

Secondly, beyond the advantage of being a technological first mover, local governments must also weigh the effective use of facilities, the return on investment, the balance of interests between different jurisdictions, policy incentives, talent training, and employment.

Finally, the privacy and security threats brought by the Internet of Things will also be one of the factors affecting the modernization of road infrastructure.

Geofencing

Geofencing is key to mass-producing autonomous driving and advanced driver assistance technologies, and it is one of the main engineering tasks for automakers deploying L2+ autonomous driving.

Geofencing means that the autonomous driving function operates only in areas where it has been adequately tested and approved, and where intelligent infrastructure and regulations tailored to autonomous driving are in place.

In other words, it specifies on which roads and in which areas the vehicle’s autonomous driving function can be enabled, and which roads are closed to it by default.
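A minimal sketch of how such a check might work (the polygon coordinates and function names are hypothetical): the autonomy feature is offered only when the vehicle’s GPS position falls inside an approved operating area, here tested with a standard ray-casting point-in-polygon routine.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)

def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray from p crosses."""
    lat, lon = p
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        crosses = (lon1 > lon) != (lon2 > lon)
        if crosses and lat < (lat2 - lat1) * (lon - lon1) / (lon2 - lon1) + lat1:
            inside = not inside
    return inside

# Hypothetical approved operating area (e.g., a well-mapped highway corridor).
approved_area: List[Point] = [(31.10, 121.30), (31.10, 121.60),
                              (31.35, 121.60), (31.35, 121.30)]

def autonomy_allowed(position: Point) -> bool:
    return point_in_polygon(position, approved_area)

print(autonomy_allowed((31.23, 121.47)))  # inside the geofence -> True
print(autonomy_allowed((30.90, 121.47)))  # outside -> False, driver must drive
```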

The setting of geofencing needs to consider the complexity differences between urban and highway driving, regional differences in infrastructure and driver behavior, and cases such as poor visibility or abnormal traffic facilities on certain road sections.

Geofencing is really a transitional measure. Given the current state of deep learning, the prospect of launching fully autonomous driving everywhere overnight is not realistic, so automakers’ main goal is to strike the best possible balance between the autonomous driving experience and safety.

As technology progresses, infrastructure develops and regulations adapt, these restrictions will gradually decrease, so that the existing advanced driver assistance will smoothly and gradually transition to fully automatic driving.

Although designing geofencing content is tedious work, I strongly recommend that autonomous driving companies build their own geofencing databases, as this will greatly speed up deploying their technology across different customers and regions.

The changes brought by new things often take a long time for society to absorb. Beyond the evolution of the technology itself, there are still long roads to travel in terms of standards, regulations, ethics, and public trust.

Achieving L5 autonomous driving not only means achieving it technically, but also means that the product can be sold and used by customers.

Therefore, let’s discuss whether L4/L5 autonomous driving has a mature business model.

The Business Model of L5 Autonomous Driving

Just as our roads developed with the transition from horse-drawn carriages to cars, with the emergence of software-driven and autonomous driving cars, urban transportation may undergo more technological changes.

It can be predicted that autonomous driving technology will disrupt urban transportation for a long time and lead to its deep transformation, but this transformation will not occur suddenly. At least for the next decade, fully autonomous driving will be limited to specific geographic and climatic regions.

Along with the development of automobile electrification, the Internet of Things, and cross-model service models, more and more automated travel systems will flourish in the next few decades.

This has created a new model for goods and service distribution, the physical Internet.

The market size of autonomous driving in the next decade is expected to be in the billions of dollars, but the transformation of urban transportation and labor upgrading that comes with it will bring trillions of dollars to the market.

The profitability of autonomous driving is not just about selling vehicle technology solutions, but also about participating in and providing diversified services in the “physical Internet”. Even providing hardware foundations and software platforms can generate revenue that matches or exceeds the sale of technology solutions themselves.

In the MIT Future of Autonomous Driving report, researchers divided the future business path of autonomous driving into four categories:

Autonomous Taxi Fleet

Waymo, Uber, and DiDi have all built their own autonomous driving fleets. Although this field has great commercial value and publicity appeal, for the foreseeable future it will remain limited to specific regions or routes, such as from the city center to the airport, and safety-driver supervision will still be essential.

Autonomous Shuttles and Buses

Currently, the driving of autonomous vehicles still needs to be restricted to strict geographic areas, so fixed-route public transportation is actually better suited to meet this requirement.

We can reshape the road infrastructure along fixed routes to reinforce the geofencing, making it easier for autonomous shuttles and buses to handle the driving scenarios along the line.

For governments, autonomous buses offer predictable environmental costs and benefits, fixed pedestrian travel patterns, and higher public transportation utilization. They can relieve congestion by extending coverage to areas not yet well served by transit, while also creating basic employment and enhancing a city’s image. This makes them the autonomous driving business model that governments are most likely to pilot and support.

Autonomous Long-haul Trucks

Autonomous long-haul trucks also have predictable environmental costs and benefits and fixed traffic scenarios (highways), which may become the first commercial use of autonomous driving technology.

Through remote monitoring, platooning in various configurations (several autonomous trucks following a lead truck controlled by a driver), and letting drivers sleep during parts of the journey, trucking companies can reduce the number of drivers needed on a route and shorten transport times.
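A toy sketch of the platooning idea (the gains, time gap, and vehicle model are invented for illustration; real truck platooning relies on V2V communication and far more sophisticated control): each following truck adjusts its speed to hold a constant time gap to the truck directly ahead.

```python
# Toy simulation of a three-truck platoon holding a constant time gap.
# All parameters (gains, time gap, initial spacing) are illustrative only.
DT = 0.1          # simulation step, seconds
TIME_GAP = 1.5    # desired time gap to the truck ahead, seconds
STANDSTILL = 5.0  # extra standstill distance, meters
KP = 0.5          # proportional gain on the spacing error

positions = [0.0, -30.0, -60.0]   # lead truck first, followers behind (m)
speeds = [25.0, 22.0, 20.0]       # m/s; followers start slower

for _ in range(600):               # simulate 60 seconds
    # Lead truck simply cruises at constant speed (driver-controlled).
    positions[0] += speeds[0] * DT
    # Each follower regulates spacing to the vehicle directly ahead.
    for i in (1, 2):
        gap = positions[i - 1] - positions[i]
        desired_gap = STANDSTILL + TIME_GAP * speeds[i]
        # Simple law: close the spacing error and match the leader's speed.
        accel = KP * (gap - desired_gap) + 0.8 * (speeds[i - 1] - speeds[i])
        speeds[i] = max(0.0, speeds[i] + accel * DT)
        positions[i] += speeds[i] * DT

print("final speeds (m/s):", [round(v, 1) for v in speeds])
print("final gaps (m):", [round(positions[i - 1] - positions[i], 1) for i in (1, 2)])
```

After the simulated minute, the followers have matched the lead truck’s speed and settled at the spacing implied by the chosen time gap.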

Therefore, autonomous trucks have strong economic appeal to customer companies and also great profit margins for autonomous driving companies.

Driver-assisted Personal Cars

In the next ten years, more active safety features will appear, and advanced driver assistance systems will continue to be the main commercial application of personal vehicle autonomous driving.

L2/L3 autonomous driving cars are gradually becoming widespread, but L4/L5 autonomy in personal cars will probably become popular only after the first few business models above have been realized.

The reason is that the operating area of a personal L4/L5 vehicle cannot be heavily restricted, the hardware cost of L4/L5 vehicles is bound to be high, and personal vehicles carry the strictest safety requirements. If a car cannot offer drivers autonomous driving across a wide range of scenarios, the feature is not cost-effective for ordinary consumers, and popularizing the technology in personal cars will remain a long-term goal.

In addition to the above vehicle commercial models, the construction of autonomous driving technology supporting infrastructure and new transportation services extended from autonomous driving are also the future commercial profit points in this field.

Even for a long time in the future, making money from autonomous driving infrastructure may be more profitable than selling autonomous driving technology solutions.

In the above autonomous driving commercial models, L4 autonomous driving trucks may be the first to become popular around 2030, while L5 fully autonomous driving, especially personal cars, is located at the intersection of many scientific, legal, social, and philosophical fields. It requires society as a whole to prepare and change for it. It may be realized around 2050 or may never be realized because the core of L5 is that all things on the road are predictable, while humans are always unruly creatures.

Finally, what I want to say is that L5 autonomous driving is actually an extension of L4 scenarios, and L4/L5 autonomous driving engineering is generally discussed together.

If autonomous driving can cover 99% of usage scenarios, its value is already sufficient. There is no need to pay a cost far exceeding that of the first 99% to chase the last 1%, and even less need to develop autonomous driving merely to satisfy a level on a standard. The purpose of autonomous driving has always been to make driving safer and smarter!

In conclusion, I don’t think Tesla is prepared, technologically, socially or commercially, to achieve L5 autonomous driving in 2020 or even in the coming years.

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.