Author: Liu Hong
Introduction
Tesla’s sensor fusion is widely regarded as among the best in the industry, even though the company has long dismissed LiDAR. On April 9, Elon Musk, Tesla’s CEO, revealed on social media that the upcoming upgrade to Autopilot and Full Self-Driving (FSD) Beta, version 9.0, is nearing completion. Tesla ultimately hopes to make the system fully camera-based, a purely visual approach, meaning that in the future its electric vehicles will navigate and execute driver-assistance functions without relying on components such as radar.
Tesla’s FSD Beta v9.0 has been highly anticipated by the market. It was originally expected to launch by the end of 2020, but the system is still in testing. Musk’s explanation is that the release has been delayed in order to improve the system further, so that users can be confident in using it.
Pure vision, no radar
Musk expressed the above views in response to a post by @WholeMarsBlog, a Tesla owner and FSD Beta user, who shared a clip of his Model 3 driving from a parking lot to its destination without any driver intervention. In his reply, Musk revealed that FSD Beta v9.0, highly anticipated across the electric vehicle industry, is almost ready.
FSD Beta v9.0 is about to be released. Incremental improvements are massive, especially for weird corner cases and bad weather. Pure vision, no radar.
- Elon Musk (@elonmusk) April 9, 2021
Musk further explained that the v9.0 update will improve the FSD beta’s adaptability in extreme conditions and adverse weather, while also improving the vehicle’s turning ability. He admits that these remain challenges for today’s advanced driver-assistance systems. But that is not all: Musk points out that the update will be “pure vision, no radar.” Looking further ahead, he admits that Tesla eventually plans to drop radar from its future vehicles entirely, even millimeter-wave radar.
As expected, Musk’s comments about Tesla abandoning radar have drawn criticism from a significant number of people, many of whom believe a pure vision approach is a step backward. This is especially noteworthy because, compared with General Motors’ Cruise and competitors such as Waymo, NIO, and XPeng, which rely on LiDAR and high-definition maps for navigation, Tesla already uses very few sensors in its driver-assistance system.
Musk responded to these concerns by emphasizing that vision may ultimately be far superior to radar. He stated, “When vision and radar disagree, which one do you believe? Vision has better precision, so doubling down on vision is better than trying to combine with sensor fusion.”
Sensors are a bitstream, and cameras produce several orders of magnitude more bits per second than radar (or LiDAR).
Radar must meaningfully increase the signal-to-noise ratio of that bitstream to be worth the complexity of integrating it, and as vision processing gets better, it simply leaves radar far behind.
- Elon Musk (@elonmusk) April 10, 2021
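To see roughly why the bitrate argument holds, consider a back-of-envelope comparison in Python. The figures are assumed, illustrative values rather than Tesla specifications: a single roughly 1.2-megapixel color camera at 36 frames per second versus a radar producing a couple of thousand detections per scan.

```python
# Back-of-envelope comparison of sensor data rates, using assumed,
# illustrative figures rather than Tesla specifications.
camera_bytes_per_s = 1280 * 960 * 3 * 36   # one ~1.2 MP RGB camera at 36 fps
radar_bytes_per_s = 2000 * 32 * 20         # ~2,000 detections of ~32 bytes, 20 scans/s

print(f"camera: {camera_bytes_per_s / 1e6:7.1f} MB/s")
print(f"radar:  {radar_bytes_per_s / 1e6:7.2f} MB/s")
print(f"ratio:  ~{camera_bytes_per_s / radar_bytes_per_s:.0f}x")
```

Even under these assumptions, one camera carries on the order of a hundred times more raw data per second than the radar, and a multi-camera suite widens the gap further.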
Vision is “Very Likely” Useful
Tesla’s approach to full self-driving is partly based on the idea that humans drive using vision alone, without any radar or LiDAR. Since Tesla’s Autonomy Day in 2019, the company’s executives have emphasized this point and unveiled a custom FSD computer. As for whether cameras can match radar’s level of safety in detecting multiple vehicles ahead, Musk argued that vision is very likely up to the job.
“It’s best to think of these things in terms of probability. There are 5 forward-facing cameras. At least one of them is very likely to see several vehicles ahead,” Musk said.
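Musk’s probabilistic argument is easy to make concrete. The sketch below assumes a hypothetical 90% chance that any single forward camera sees the vehicles ahead (Tesla has published no such figure) and that the cameras fail independently:

```python
# Minimal sketch of the probabilistic argument. The per-camera probability
# is an assumed, made-up figure; Tesla has not published one.
p_single = 0.90    # assumed chance that one forward camera sees the cars ahead
n_cameras = 5      # forward-facing cameras mentioned by Musk

# All cameras miss only if each one misses, assuming independent failures.
p_at_least_one = 1 - (1 - p_single) ** n_cameras
print(f"P(at least one camera sees ahead) = {p_at_least_one:.5f}")  # 0.99999
```

Under those assumptions, the chance that all five cameras miss is one in 100,000. In practice, however, camera failures caused by glare or heavy rain tend to be correlated rather than independent, which is exactly the bad-weather case v9.0 is said to target.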
Tesla Isn’t Alone
In fact, Tesla is not the only company on the market pursuing a vision-only approach. In May 2020, Intel released a video showing a Mobileye autonomous vehicle driving for about 20 minutes on the roads of Jerusalem, equipped with a single suite of cameras and nothing else. The brief video left a deep impression, showing the Mobileye vehicle passing through four intersections without traffic lights, which required it to avoid pedestrians and other vehicles on city streets.
What guarantees the safety of autonomous driving?
This article contains several statements about what is “possible” and “probable.” Can we really gamble with driving safety like this? Even setting aside whether a purely visual approach is a step backward, shouldn’t we still consider computing power and redundancy? And even if humans do drive entirely by vision, without any radar or LiDAR, that does not mean machines replacing humans should forgo capabilities beyond human reach.
Dr. Guo Jishun, an expert in intelligent driving, commented on Mobileye’s purely visual approach: “Although a camera-based visual approach can meet automotive regulatory requirements, it places much higher demands on the vision algorithms; development is difficult and the probability of failure is higher. Therefore, a good L4 autonomous-driving perception solution intended for factory pre-installation should preferably use an automotive-grade multi-sensor fusion suite (solid-state or hybrid solid-state LiDAR + vision + millimeter-wave radar, etc.).”
He believes that, even with the support of a multi-sensor fusion suite, L4 autonomous driving needs more intelligent perception and cognitive abilities. Current perception algorithms, however, mostly achieve only “target recognition”: they extract shallow attributes of the perceived scene, such as object categories, positions, speeds, and sizes.
When it comes to more abstract semantic information about targets, and to scene-level events that multiple targets may jointly produce, cognitive ability is limited: recognizing the hand gestures of a traffic police officer, for example, or children crossing the road. To achieve true “cognitive intelligence,” richer databases of social common sense and traffic rules must be established.
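To make the distinction concrete, here is a minimal sketch of the “shallow” per-target attributes such perception stacks typically output; the field names are illustrative, not any vendor’s actual interface.

```python
# Minimal sketch of the "shallow" per-target attributes described above.
# Field names are illustrative, not any vendor's actual interface.
from dataclasses import dataclass

@dataclass
class DetectedTarget:
    category: str                              # e.g. "car", "pedestrian"
    position_m: tuple[float, float, float]     # x, y, z in the ego frame, meters
    velocity_mps: tuple[float, float]          # ground-plane velocity, m/s
    size_m: tuple[float, float, float]         # length, width, height, meters

# What such a record does NOT capture is the deeper semantics the text calls
# "cognitive intelligence": e.g. that a pedestrian is a traffic police officer
# waving this lane through, which needs common-sense and traffic-rule knowledge.
```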
If it’s purely visual, why not use binocular vision?
Today, some luxury cars are equipped with binocular cameras, yet Tesla, a “luxury car” in many people’s minds, has not mass-produced vehicles with them. A monocular camera must first recognize a target to infer its distance, whereas binocular cameras can not only measure distance accurately but also identify brake lights, lane lines, roadside traffic signs, and so on. The catch is that binocular cameras require very high computing power to perform stereo matching on every pixel. Tesla has its own self-developed chip, so it should not be at a disadvantage in computing power.
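For readers unfamiliar with how binocular depth works, the sketch below uses OpenCV’s block matcher with an assumed focal length and baseline (illustrative values, not any production camera’s) to turn per-pixel disparity into metric depth via triangulation, Z = f × B / d. The per-pixel matching step is where the heavy computing cost mentioned above comes from.

```python
# Minimal sketch of binocular (stereo) depth estimation with OpenCV.
# The focal length and baseline are assumed, illustrative values.
import cv2
import numpy as np

FOCAL_PX = 1000.0    # assumed focal length, in pixels
BASELINE_M = 0.12    # assumed distance between the two cameras, in meters

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching compares each pixel's neighborhood across the two images;
# this per-pixel search is where the heavy computing cost comes from.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # pixels

# Triangulation: depth Z = f * B / d for every pixel with a valid disparity.
valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
print(f"median depth of valid pixels: {np.median(depth_m[valid]):.1f} m")
```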
According to Sun Lu, director of vision products at Bao Long Technology: “The problem with monocular cameras is that an exhaustive enumeration approach cannot cover every scenario, so some risk of misjudgment remains. Binocular cameras have a certain technical threshold, and high performance is not easy to achieve. The industry has no specialized chips and generally uses FPGAs, which makes the engineering difficult. In addition, high structural precision, durability, consistency, and temperature adaptability are required, and a large investment is needed, covering the automatic calibration (AA) algorithm, the static calibration algorithm, and storage of intrinsic parameters.”
Proponents of millimeter-wave radar and LiDAR argue that current mainstream cameras provide only 2D image information and lack depth. The main difficulty in using cameras as the primary sensor is depth recovery: path planning for autonomous driving requires 3D road information and 3D obstacle information, so if cameras are to become the primary sensor, they must deliver accurate depth perception.
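The reason depth matters so much for planning can be shown in a few lines: with a pinhole camera model and assumed intrinsics (illustrative values only), a pixel plus its estimated depth back-projects to a 3D point, and any error in the depth propagates directly into the 3D position handed to the planner.

```python
# Minimal sketch of pinhole back-projection: pixel + depth -> 3D point.
# The intrinsics below are assumed, illustrative values.
import numpy as np

fx, fy = 1000.0, 1000.0   # assumed focal lengths, in pixels
cx, cy = 640.0, 480.0     # assumed principal point (image center)

def pixel_to_3d(u: float, v: float, depth_m: float) -> np.ndarray:
    """Back-project pixel (u, v) with depth Z into camera coordinates (X, Y, Z)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# A 20% depth error moves the recovered obstacle by 20% along every axis,
# which is exactly the failure mode discussed next.
print(pixel_to_3d(900.0, 500.0, 30.0))   # -> [ 7.8  0.6 30. ]
```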
Judging from the information Tesla has publicly disclosed, its depth recovery is quite good, providing a solid foundation for perception, localization, and planning. However, this must be supported by a well-trained system. Although Tesla has massive amounts of data with which to train depth models, that still cannot guarantee correct handling of every scenario, which is what Musk calls the “weird corner cases.” If depth prediction is inaccurate, or an outlier appears that the training did not cover (a color-recognition error, for example), the positions of the road environment and obstacles may be estimated incorrectly, which can lead to accidents and fatalities.
Should we expect Tesla to equip its vehicles with binocular cameras (trinocular being ruled out by cost) while forgoing LiDAR and other radar? Would that provide additional safety assurance for drivers and reassure investors? We can only wait and see.
This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.