Author: Su Qingtao

When it comes to autonomous driving data, beyond sheer scale, “data quality” is what gets discussed most, and “high-quality data” usually means extreme working condition data. How to efficiently filter extreme working condition data out of a huge volume of scene data, and transmit only that portion back to the back end, has become one of the most critical measures of an autonomous driving company’s data processing capability.

Among various known methods for filtering extreme working condition data, the shadow mode proposed by Tesla is undoubtedly the most influential.

The shadow mode has been regarded as one of the key weapons for companies that follow the “progressive” route to fully utilize their data advantages. Many domestic automakers and autonomous driving companies are also talking about the shadow mode.

However, over the past year or so, after speaking with autonomous driving managers at automakers and CTOs of autonomous driving companies, the author has found that the seemingly impressive shadow mode may have been “mythologized”, and that it runs into many problems in actual operation:

  1. The definition of shadow mode is unclear, and most people’s understanding of the concept is incomplete.

According to the conventional understanding, the core of “shadow mode” is that while a human drives, the full system, sensors included, keeps running but takes no part in vehicle control; it only verifies the decision-making algorithm. In shadow mode the algorithm continuously simulates decisions and compares them with the driver’s actual behavior. Whenever the two are inconsistent, the scene is judged an “extreme working condition” and data transmission is triggered.
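As a rough illustration of this conventional understanding, the sketch below flags a scene whenever the shadow planner’s trajectory diverges from the driver’s by more than a threshold. Everything here (the 1.5 m threshold, the array shapes, the `queue_for_upload` helper) is a hypothetical assumption for illustration; no production implementation has been published.

```python
import numpy as np

# Assumed tuning constant -- a real system would calibrate this per scenario.
SHADOW_DEVIATION_THRESHOLD = 1.5  # metres

def queue_for_upload(snapshot: dict) -> None:
    """Hypothetical uplink helper: compress the sensor/decision snapshot
    and queue it for transmission to the back end."""
    ...

def trajectory_deviation(planned: np.ndarray, driven: np.ndarray) -> float:
    """Mean Euclidean distance between the shadow planner's trajectory and
    the trajectory the human actually drove (both Nx2 arrays of x/y
    waypoints sampled at the same timestamps)."""
    return float(np.mean(np.linalg.norm(planned - driven, axis=1)))

def shadow_tick(planned: np.ndarray, driven: np.ndarray, snapshot: dict) -> bool:
    """Runs once per planning cycle while the human drives. The shadow
    planner computes a trajectory but never actuates; if it disagrees with
    the driver beyond the threshold, the scene is flagged as an "extreme
    working condition" and queued for upload."""
    if trajectory_deviation(planned, driven) > SHADOW_DEVIATION_THRESHOLD:
        queue_for_upload(snapshot)
        return True
    return False
```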

However, Dr. Cui Dixiao, Chief Scientist of IMa Technology, said that shadow mode is one part of Tesla’s data closed loop and does not map directly to data filtering. Besides using trajectory differences on the control side to filter data, another application of shadow mode is verifying whether new functions work properly or have side effects.

Zhang Hongbin, Co-founder of CheRui Intelligent, explained that this is only a small part of the shadow mode.

Zhang believes that the autonomous driving systems likely to be commercialized in the near term are not end-to-end neural networks but pipelines of four main modules: perception, prediction, planning, and control, each largely implemented with its own neural network. A more practical shadow mode should therefore supply a wider range of extreme working condition data for all of these modules, including both labeled and unlabeled training data.

According to what Andrej Karpathy, then Tesla’s AI director, disclosed at CVPR 2021, Tesla has developed 221 triggers that work in shadow mode in order to harvest as much high-quality data from the fleet as possible. Note that not every trigger relates to human driver behavior. For example, on a radar-vision mismatch, where the millimeter-wave radar and the camera disagree (say, the radar detects a target but the camera does not), the shadow system triggers a data report. Likewise, when a visual detection’s bounding box jitters beyond a certain threshold, that data is also reported…
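A hedged sketch of those two example triggers follows: radar/vision disagreement and bounding-box jitter. The data structures and the 0.3 jitter threshold are illustrative assumptions, not Tesla’s actual values or code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Frame:
    radar_targets: List[Tuple[float, float]]               # (x, y) from millimetre-wave radar
    camera_boxes: List[Tuple[float, float, float, float]]  # (x, y, w, h) from the vision stack

def radar_vision_mismatch(frame: Frame) -> bool:
    """Fires when radar reports a target but vision reports nothing, or
    vice versa -- one of the shadow-mode triggers mentioned above. (A real
    trigger would associate individual targets, not just count them.)"""
    return bool(frame.radar_targets) != bool(frame.camera_boxes)

def bbox_jitter(prev_box: Optional[Tuple[float, float, float, float]],
                box: Tuple[float, float, float, float],
                threshold: float = 0.3) -> bool:
    """Fires when a tracked box's centre jumps between consecutive frames
    by more than `threshold` of its own width (assumed jitter metric)."""
    if prev_box is None:
        return False
    (px, py, pw, _), (x, y, w, _) = prev_box, box
    return abs(x - px) + abs(y - py) > threshold * max(pw, w)
```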

All of this data is sent directly to the relevant algorithm modules on Dojo; some of it arrives labeled, some requires additional labeling cost, and some may need no labeling at all… It all ends up in the training set.

As we can see, not all shadow-mode triggers depend on human driver behavior to mark and report data. Yet when domestic car companies talk about “shadow mode,” they still focus on “comparing the system’s decision-making algorithm with human driving behavior.”

Last year, when asked what methods they had besides shadow mode for collecting extreme condition data, the CTO of one unmanned driving company likewise mentioned “comparing the detection results of the millimeter-wave radar and the camera, and triggering data feedback if they are inconsistent.” Notably, they did not classify this as “shadow mode.”

Dr. Cui Dixiao added: “Besides shadow mode, there are a large number of data collectors that can work in autonomous driving mode (this data does not affect actual vehicle control).”

  2. By design, shadow mode should not consume much computing power or add latency to the autonomous driving stack. Today, however, the perception chip in L2 production cars is mainly Mobileye’s EyeQ4, and the closed Mobileye does not let car companies run a shadow system on its chips. Car companies would therefore need an additional chip dedicated to the shadow system.

Adding a shadow-system chip to every car would obviously be an unbearable cost burden, so car companies will most likely run shadow systems on only a handful of vehicles.

Of course, if the main chip comes from the “more open” NVIDIA, it can run shadow mode alongside the main stack. At present, though, only the XPeng P7 and the Lexus LS are production cars equipped with NVIDIA chips. Going forward, models equipped with the Horizon Journey 3 are the most likely to run shadow mode at scale.

  3. If the system’s driving behavior and the driver’s are inconsistent, the current scenario is judged an “extreme condition.” But this logic rests on the premise that “the driver’s way of driving must be correct,” and hence that “any system decision that differs from the driver’s means the decision algorithm is wrong.” The question is: is the driver always right? Moreover, different drivers handle the same scenario differently, so how do we determine who is right and who is wrong?

Therefore, shadow mode needs a “god’s-eye view”: if the driver’s behavior is correct, the decision algorithm should learn from it; if the driver’s behavior is wrong, the decision algorithm must be resolute enough not to be misled.

In December of last year, Wang Haowei, former chief architect of Forrtech, noted in a speech titled “Several Key Technology Design Strategies for the Next Generation of Autonomous Driving” that when the driver’s behavior diverges from the autonomous driving control behavior, two situations arise:

A. If the difference is large, the system judges that the driver’s driving may be at fault and warns the driver. For example, if the driver suddenly presses the accelerator beyond a certain rate and depth, the system, combining this with environmental detection, concludes the driver may have mistaken the accelerator for the brake pedal and issues a pedal-misapplication warning.

B. If the difference is small, it means the autonomous driving algorithm itself is still imperfect and should learn new control strategies from the actual driving. For example, in a given working condition the system decides to slow down and brakes hard, leaving a long gap behind the vehicle ahead; in subsequent training, the system should learn the depth and rate at which the driver actually presses the pedal in that condition, so that the commands it later sends imitate the driver’s rates of acceleration and deceleration.
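A minimal sketch of this two-branch logic might look as follows. The thresholds and the pedal-misapplication heuristic are invented for illustration; the speech gives no concrete numbers.

```python
# Assumed thresholds -- the source gives no concrete values.
LARGE_DIFF = 3.0        # m/s^2 difference between system and driver acceleration
PEDAL_RATE_LIMIT = 0.8  # fraction of full pedal travel per 100 ms

def handle_divergence(sys_accel: float, driver_accel: float,
                      pedal_rate: float, obstacle_ahead: bool) -> str:
    """Route a system/driver divergence to case A (warn the driver) or
    case B (log data so training can imitate the driver)."""
    diff = abs(sys_accel - driver_accel)
    if diff > LARGE_DIFF:
        # Case A: large difference -> suspect the driver. A sudden, deep
        # throttle stab while environmental detection sees a hazard suggests
        # the accelerator was mistaken for the brake.
        if pedal_rate > PEDAL_RATE_LIMIT and obstacle_ahead:
            return "warn: possible pedal misapplication"
        return "warn: driver behaviour anomalous"
    # Case B: small difference -> suspect the algorithm. Record the driver's
    # pedal depth and rate so later training can imitate the human's
    # smoother acceleration/deceleration profile.
    return "log: record driver pedal profile for imitation training"
```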

Dr. Cui Dixiao also gave an example: in some scenarios the driver coasts (neither accelerator nor brake), a “fuel-efficient mode,” but the algorithm would not trigger “fuel-efficient mode” there; the shadow system then triggers data feedback from the perception, prediction, and decision modules.
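Expressed as a toy trigger (the field names and the pedal dead-zone are assumptions for illustration):

```python
def coasting_mismatch(throttle: float, brake: float,
                      shadow_plans_coast: bool, eps: float = 0.02) -> bool:
    """True when the human coasts (neither pedal pressed beyond a small
    dead-zone) but the shadow algorithm would not coast here -- per the
    example above, this should trigger uploads from the perception,
    prediction, and decision modules."""
    driver_coasts = throttle < eps and brake < eps
    return driver_coasts and not shadow_plans_coast
```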

A judging standard with a “god’s-eye view” is invoked here to decide who is better, the driver or the machine, and to guide further optimization of the driving algorithm. In practice, however, ensuring this standard is effective remains a challenge: if the driver makes a mistake and the “god” does not notice, invalid data will be fed back, and driving-behavior parameters may even be adjusted in the wrong direction.

  4. The evaluation mechanism of shadow mode is not scientific: it cannot obtain data directly from the decision-making end, but instead “backtracks” the trajectory from the execution end (with the control algorithm running in “idle” mode) to judge whether the decision-making process erred. As a result, control-end issues can be misidentified as decision-making issues.

There is also the case where the system’s decision looks inconsistent with the driver’s, yet the decision algorithm itself is correct and it is the upstream perception module that erred and misled the decision. How do we distinguish this from a genuine decision-algorithm failure?

In practice, many companies use a “retrospective method.” Take an “unexpected emergency brake”: engineers trace back through the perception module’s data. If the perception module did not actually see an “obstacle,” but the system predicted that someone beside the vehicle was about to cut in and braked, and that person never cut in, then the prediction algorithm is at fault. If the emergency brake was caused by the perception module judging that “there is an obstacle ahead,” then the perception module is at fault.
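In code form, the retrospective attribution described above might be sketched like this. The log keys are hypothetical; no company’s real logging schema is implied.

```python
def attribute_phantom_brake(log: dict) -> str:
    """Replay the logged frames preceding an unexpected emergency brake
    and decide which module to blame."""
    if log["perception_saw_obstacle"]:
        # Perception hallucinated an obstacle -> perception-module problem.
        return "perception"
    if log["predicted_cut_in"] and not log["cut_in_happened"]:
        # Perception was clean, but the predicted cut-in never
        # materialised -> prediction-algorithm problem.
        return "prediction"
    # Neither upstream module explains the brake -> decision-algorithm problem.
    return "decision"
```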

Given that perception in autonomous driving remains unsolved, it is common for perception failures to mislead prediction and decision-making. The “prediction/decision-making failure” scene data that shadow mode currently collects is therefore particularly coarse-grained, and much of it is invalid.

Yet, having been “selected” by shadow mode, this invalid data is already deemed “valuable”: it is not discarded at the vehicle end but transmitted back, wasting bandwidth and storage.

Moreover, filtering out invalid data caused by perception failures through retrospective analysis undoubtedly requires manual effort and is costly.

Wang Haowei believes that processing all the data collected by shadow mode manually would consume too many resources. “One idea is to introduce reinforcement learning and let the system learn on its own, but that means fully trusting a decision-making algorithm perfected purely through ‘training,’ which is currently impossible. So we still have to rely on measurement and statistics methodology to process the collected data.”

  5. In some scenarios the perception module may have malfunctioned, yet at the control end there is no difference between the system and a human driver. Such data cannot be captured through shadow mode.

The reason is that the perception stack consists of a detection module and a post-processing module. If the detection module fails but post-processing can “smooth out” the traces of missed or false detections and still deliver smooth, accurate tracking, then decision and control are unaffected.

To catch this type of data, some companies watch for significant jumps between the detection output and the tracking output; a jump triggers data transmission. This, however, is a labor-intensive process.
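A sketch of that jump check, under assumed data shapes and an assumed threshold:

```python
import numpy as np

def detection_tracking_jump(raw_centres: np.ndarray,
                            tracked_centres: np.ndarray,
                            threshold: float = 2.0) -> bool:
    """raw_centres / tracked_centres: Nx2 per-frame centres of the same
    object from the raw detector and from the post-processing tracker.
    A large gap means the tracker "smoothed out" a detection failure the
    decision layer never saw -- the very data this check tries to surface."""
    gaps = np.linalg.norm(raw_centres - tracked_centres, axis=1)
    return bool(np.max(gaps) > threshold)
```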

Only when perception problems are largely solved and autonomous driving enters the “decision-making algorithm” phase will the value of shadow mode be truly highlighted.

  6. Now that the data has been transmitted back, do you have the ability to use it? The main way to extract value from data is simulation, but simulating with real road data is difficult. Currently only a few companies have this ability; most can only run simulations on algorithmically modeled data.

Without the ability to simulate with real data, the value of the data collected by shadow mode cannot be fully realized. Indeed, some car companies that initially claimed to use shadow mode later admitted that the data they collected had “not been activated.” Companies on the gradual route therefore need to invest more in simulation technology.

  7. Even with stronger simulation capability, companies on the gradual route, which train on data accumulated from L2 vehicles (usually with lower-grade sensor configurations), may not necessarily beat companies that develop L4 algorithms directly. According to many senior industry experts, one of the biggest challenges for the gradual approach is whether the data accumulated on L2 vehicles can be reused to train L4 algorithms.

To address this data-reuse issue, many car companies’ strategy is to pre-install (“embed”) higher-grade hardware on L2 production vehicles. But sensors are iterating rapidly, and whether hardware “embedded” today can be done in one shot remains a big unknown.

(This point is discussed in more detail in our article “Robotaxi companies install L2 in production, opportunities and challenges coexist”.)

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.