Author: Yin Wei

Testing for autonomous driving is a highly complex undertaking. In this article, we break it down step by step, starting from the basics.

Before we start, let’s ask ourselves a question: what level of testing is needed for autonomous driving?

According to international standards, the probability of a fatality per hour of human driving is approximately 1/10^6. Worldwide, about 1.25 million people die in road accidents every year. If autonomous driving is to become a reality, its fatality probability must be much lower than this. Research suggests that the socially accepted fatality rate for autonomous driving is no higher than 1/10^9 per hour. To demonstrate a rate of 1/10^9, the software would therefore need to be tested for 10^9 driving hours with every update to ensure the reliability of its functions. Obviously, testing on this scale with real vehicles is not feasible.
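
To make the scale concrete, the short sketch below estimates how long a test fleet would need to accumulate 10^9 driving hours. The fleet size and the hours driven per day are assumed values chosen only for illustration; they do not come from any standard.

```python
# Back-of-the-envelope estimate. Fleet size and daily hours are assumptions.
TARGET_HOURS = 10**9  # driving hours cited in the text

def years_to_accumulate(target_hours, fleet_size, hours_per_day):
    """Calendar years a fleet needs to log target_hours of driving."""
    fleet_hours_per_year = fleet_size * hours_per_day * 365
    return target_hours / fleet_hours_per_year

# Hypothetical fleet: 1,000 vehicles driving around the clock.
years = years_to_accumulate(TARGET_HOURS, fleet_size=1000, hours_per_day=24)
print(f"{years:.0f} years per software update")  # prints "114 years per software update"
```

Even under these generous assumptions, a single software update would take more than a century to validate on the road.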

In practice, testing systems usually take a hierarchical approach, combining methods with different costs and coverage angles to achieve results comparable to real-vehicle testing within controllable time and cost. Because each method has a different cost, the reasonable number of iterations for each also differs. An ideal testing system should catch more than 60% of potential problems in module logic testing, resolve a further 30% in simulation-based functional and performance testing, and leave no more than 10% for real-vehicle robustness testing. Each stage should find as many problems as its means allow, which limits the number passed on to later, more expensive stages. Paired correctly, the stages keep cost within a controllable range while ensuring high coverage. For example, if the simulation testing system is complete, planning development hardly requires on-vehicle verification, which saves a great deal of peripheral support resources.

Comparison of different testing methods

Multi-layered testing is not without cost. Building a specialized testing system often involves a long lead time and high initial costs. In both component testing and software testing, teams sometimes bypass testing stages to keep up with the schedule.

In fact, this is an economic calculation. An early (front-end) testing stage is worth setting up only if the resources needed to solve the problems it would have caught, once they reach the high-cost back-end stages, exceed the cost of setting it up; otherwise, the stage is not worth the effort or investment.
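
The trade-off can be written down as a simple comparison. The sketch below is only an illustration: the function name and all cost figures are hypothetical, and real projects would also account for schedule and risk.

```python
def frontend_stage_pays_off(setup_cost, issues_caught,
                            frontend_fix_cost, backend_fix_cost):
    """True if adding the front-end stage is cheaper than letting the
    issues it would catch escape to the back-end stage."""
    cost_with_stage = setup_cost + issues_caught * frontend_fix_cost
    cost_without_stage = issues_caught * backend_fix_cost
    return cost_with_stage < cost_without_stage

# Hypothetical figures in arbitrary cost units.
many_issues = frontend_stage_pays_off(500, 100, frontend_fix_cost=1, backend_fix_cost=10)
few_issues = frontend_stage_pays_off(500, 20, frontend_fix_cost=1, backend_fix_cost=10)
print(many_issues, few_issues)  # prints "True False"
```

With these made-up numbers, the stage pays for itself when it catches 100 issues but not when it catches only 20.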

Reasonable testing levels are, therefore, a balancing process.

But generally speaking, within a relatively mature and efficient R&D system, higher efficiency can often be achieved with more interlocking and successive testing systems.

Effective testing often involves using specific testing tools and specific test cases to examine specific dimensions of the object being tested. Any testing system that targets potential issues in a cost-effective manner with a coverage range greater than other means is a good testing system, regardless of its classification. Practicability is very important in test design.

The testing process in engineering practice

In addition, a testing pipeline is often part of a training pipeline. The work of the testing system in the past was mainly to eliminate potential product hazards due to human error.

Test-driven development (TDD) is the best-known example: test code is written before a specific function is coded, and then only the functional code needed to pass the test is written, so that testing drives the development process.
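
As a minimal sketch of the TDD cycle, the example below writes the tests first and then the smallest implementation that passes them. The helper `time_to_collision` and its behavior are hypothetical and chosen only for illustration.

```python
import math

# Step 1: write the tests first. They describe the wanted behavior
# of a hypothetical helper, time_to_collision().
def test_closing_vehicle():
    assert time_to_collision(gap_m=20.0, closing_speed_mps=5.0) == 4.0

def test_not_closing_means_no_collision():
    assert time_to_collision(gap_m=20.0, closing_speed_mps=0.0) == math.inf
    assert time_to_collision(gap_m=20.0, closing_speed_mps=-2.0) == math.inf

# Step 2: write only enough functional code to make the tests pass.
def time_to_collision(gap_m, closing_speed_mps):
    if closing_speed_mps <= 0.0:
        return math.inf
    return gap_m / closing_speed_mps

# Step 3: run the tests; new behavior starts with a new failing test.
test_closing_vehicle()
test_not_closing_means_no_collision()
```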

Now, autonomous driving is moving towards a self-supervising process, and we are seeing more interaction between machines. This includes machine-to-machine testing feedback and development adjustments, which is what we know as deep learning. For humans, testing is about ensuring product consistency with the goal. For machines, training is also about achieving similar goals.

These are some basic ideas about testing. Next, let’s take a detailed look at some typical testing processes for intelligent driving. As shown in the figure below, I believe that systematic sorting can be done from three aspects: different cooperation modes, specialized fields, and different technological sections.

Common testing methods for autonomous driving

From the perspective of different cooperation modes, testing can be divided into black box, white box, and gray box testing.

White box testing checks whether each internal structural path works as designed, and is generally used for the internal management of the product provider. Black box testing generally does not consider internal structure; it only checks whether product functions are implemented according to the technical requirements specified in the contract, and is generally used for acceptance by the product recipient. Gray box testing lies between the two. In addition to testing external functions, it confirms key internal links, and is generally used for provider release testing or recipient acceptance testing, depending on the specific cooperation.

From the perspective of different specialized fields, each domain has its own problems and corresponding testing dimensions.

From the perspective of software code, there are static testing and dynamic testing. Static testing analyzes the program for errors and improper usage in syntax, structure, and programming conventions. Commonly used tools include QAC and Coverity. Static testing accounts for a small proportion of the entire testing system and is usually the first step in software testing. Similarly, code review brings relevant experts together to evaluate the static design of the code. Dynamic testing runs the program, compares the results with expectations, and analyzes running efficiency and robustness. At present, most software testing for self-driving cars falls into the dynamic category, such as performance testing and the various types of in-the-loop testing.

The division by technical section is the most complex and important of the three.

First, let’s explain why sections are defined. When we face a complex system problem in which multiple factors are mixed together, defining a section isolates the influencing variables and reduces the complexity to a testable level. At the same time, it turns what was originally a serial troubleshooting task into parallel tasks and shortens the project schedule.

As shown in the figure below, the bottom layer consists of unit testing, module testing, and module integration testing. On the development platform (x86), the software functions and the inputs and outputs of single or multiple modules serve as the sections, and the focus is on verifying the correctness of the code logic. Tools such as VectorCAST and GTest inject a large number of erroneous inputs and a small number of correct inputs into the object under test and confirm that the feedback meets expectations. This process is generally open-loop.
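
The idea of feeding many erroneous inputs and a few correct ones to a module can be sketched in plain Python. The module `limit_target_speed`, its speed limit, and the input tables are all hypothetical; a production setup would use a framework such as the ones named above.

```python
import math

MAX_SPEED_KPH = 130.0  # assumed limit, for illustration only

def limit_target_speed(requested_kph):
    """Hypothetical module under test: returns a safe target speed.
    Invalid requests raise ValueError; valid ones are capped."""
    if isinstance(requested_kph, bool) or not isinstance(requested_kph, (int, float)):
        raise ValueError("speed must be a number")
    if math.isnan(requested_kph) or math.isinf(requested_kph) or requested_kph < 0:
        raise ValueError("speed out of range")
    return min(float(requested_kph), MAX_SPEED_KPH)

# Many erroneous inputs, a few correct ones (input, expected output).
ERRONEOUS_INPUTS = [None, "60", [], float("nan"), float("inf"),
                    -float("inf"), -1, -0.001, True]
CORRECT_INPUTS = [(0, 0.0), (60.5, 60.5), (500, 130.0)]

def run_open_loop_checks():
    """Feed every case to the module once and collect unexpected results."""
    failures = []
    for bad in ERRONEOUS_INPUTS:
        try:
            limit_target_speed(bad)
            failures.append(bad)  # an invalid input was accepted
        except ValueError:
            pass
    for value, expected in CORRECT_INPUTS:
        if limit_target_speed(value) != expected:
            failures.append(value)
    return failures

print(run_open_loop_checks())  # prints "[]"
```

The check is open-loop: each input is applied once and the output is compared with the expectation, without the output feeding back into later inputs.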

Module-level testing is also known as Model-in-the-Loop (MIL) testing. In addition to checking the correctness of part of the logic, it evaluates model performance indicators, such as the recognition accuracy of perception modules.

Software logic level testing methods

Software that runs stably on x86 may encounter a series of problems in the embedded environment, such as stack overflow, scheduling confusion, unstable timestamps, inadequate system call support, abnormal memory reads, and blocked execution. To troubleshoot these differences, as shown in the figure below, the dimension of the target hardware can be introduced above the software logic level. This is Processor-in-the-Loop (PIL) testing, which places part of the code on the target processor to verify the correctness of the code function while confirming whether its performance meets the requirements, for example the worst-case execution time of the software and the reliability of system calls. Software-in-the-loop testing generally evaluates correctness, while hardware-in-the-loop testing generally evaluates stability.
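
One of the indicators mentioned above, the longest execution time, can be illustrated with a small timing harness. This is only a host-side sketch in Python: a real PIL test runs the compiled code on the target processor with its own timers, and the workload and time budget here are assumptions.

```python
import time

def measure_execution_times(step, inputs):
    """Run step() once per input and return the per-call durations in seconds."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        step(item)
        durations.append(time.perf_counter() - start)
    return durations

def planning_step(n):
    """Stand-in workload; a real test would call the deployed module."""
    return sum(i * i for i in range(n))

BUDGET_S = 0.05  # assumed cycle budget, for illustration only

durations = measure_execution_times(planning_step, [1000, 5000, 20000])
worst_case = max(durations)
print(f"worst case {worst_case * 1e3:.3f} ms, within budget: {worst_case <= BUDGET_S}")
```

The point of the sketch is the evaluation criterion: the worst observed case, not the average, is compared against the budget.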

PIL testing methods

As shown in the figure above, all the tests mentioned so far are generally open-loop and do not verify interaction with the environment. When we add the interaction between software, hardware, and a virtual or real environment, we arrive at the concepts of SIL (Software-in-the-Loop) and HIL (Hardware-in-the-Loop) testing.

After introducing environmental factors, scenario libraries are also introduced as test cases. In addition to verifying basic logic, the testing process also evaluates some operational service indicators of intelligent driving.

SIL testing does not consider the target hardware, can be deployed on servers in large quantities, and has a low cost. Its core purpose is to verify the correctness of the closed-loop operation of intelligent driving functions. It can be divided into local closed-loop testing using semantic-level simulation systems and full-function closed-loop software testing using environment-rendering-level simulation systems.
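
A toy example may help show what "closed loop" and "scenario library" mean at the semantic level. In the sketch below, a hypothetical braking controller is run against two scenarios with a stationary obstacle; the controller, its parameters, and the scenarios are all invented for illustration and are far simpler than a real simulation system.

```python
DT = 0.1         # simulation step in seconds
MAX_DECEL = 6.0  # assumed full braking, m/s^2
MARGIN_M = 5.0   # assumed safety margin

def brake_controller(gap_m, speed_mps):
    """Hypothetical function under test: brake fully once the gap falls
    below stopping distance plus margin, otherwise hold speed."""
    stopping_distance = speed_mps ** 2 / (2 * MAX_DECEL)
    return -MAX_DECEL if gap_m <= stopping_distance + MARGIN_M else 0.0

def run_scenario(scenario, steps=200):
    """Closed loop: the controller output changes the ego state, which
    changes the next controller input. Returns the minimum gap reached."""
    ego_pos, ego_speed = 0.0, scenario["ego_speed_mps"]
    obstacle_pos = scenario["obstacle_distance_m"]  # stationary obstacle
    min_gap = obstacle_pos - ego_pos
    for _ in range(steps):
        accel = brake_controller(obstacle_pos - ego_pos, ego_speed)
        ego_pos += ego_speed * DT
        ego_speed = max(0.0, ego_speed + accel * DT)
        min_gap = min(min_gap, obstacle_pos - ego_pos)
    return min_gap

SCENARIO_LIBRARY = [
    {"name": "obstacle_far", "ego_speed_mps": 20.0, "obstacle_distance_m": 100.0},
    {"name": "obstacle_late", "ego_speed_mps": 20.0, "obstacle_distance_m": 20.0},
]

def evaluate(library):
    """Indicator used here: the ego vehicle must never reach the obstacle."""
    return {s["name"]: run_scenario(s) > 0.0 for s in library}

print(evaluate(SCENARIO_LIBRARY))
```

The first scenario passes because the controller has room to stop; the second fails because the obstacle appears inside the braking distance, which is exactly the kind of functional result that open-loop module tests cannot show.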

SIL is currently one of the most promising testing methods, so let’s take a closer look at it. Although methods such as unit testing and module testing have high automation rates, they cannot directly detect functional problems in intelligent driving systems. Hardware-in-the-loop testing and on-road testing make problems more visible, but they are more expensive. SIL strikes a good balance between these methods and is cost-effective.

Seen from the inside, the core requirement of a SIL system is repeatability. If a test cannot reproduce past experimental results, subsequent evaluations are severely affected. If repeatability cannot be fully maintained, for example because of multi-threading, the variance and stability must be confirmed over multiple experiments. Across the whole testing system, the closer a method is to the inside (such as unit testing), the easier repeatability is to control; the closer it is to the outside (such as on-road testing), the harder it is.

Seen from the outside, the core requirements of a SIL system are a high automation rate and the ability to deploy in parallel at scale, because SIL carries the largest test volume in the entire testing system and supports comprehensive analysis. Reducing manual intervention and increasing concurrent deployment capability effectively lowers testing costs and improves testing efficiency.

In the closed-loop system of intelligent driving, the SIL system has also begun to provide iterative training services for planning. The indicators and test cases used in simulation testing for safety, functional, regulatory, and comfort evaluation effectively act as the “loss functions” of the planning and control training process.

HIL testing differs from SIL testing in that it must take the target hardware into account, and because of its high cost it is generally not deployed in large quantities. Its results are closer to the real state than those of SIL, and it can additionally evaluate whether the overall performance of the software on the target hardware (runtime scheduling, memory access, and computing-power usage) meets expectations. In practice, it is advisable not to fixate on a full-function, long-period HIL test bench: twenty lightweight HIL test benches (PIL test benches) may cost less than one full-function HIL test bench, with little difference in effectiveness. Testing with partial physical I/O and partial functional simulation is often the more sensible choice. HIL test benches are generally used only for short-cycle closed-loop testing, because errors grow large in long-period testing.
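
The repeatability requirement for SIL described above can be checked mechanically. In the sketch below, all randomness in a stand-in test run comes from one seeded generator, so the same seed must reproduce the same result; when seeds differ, the spread of the metric is reported instead. The run itself and its noise model are hypothetical.

```python
import random
import statistics

def run_noisy_test(seed, steps=100):
    """Stand-in for one SIL run: a metric influenced by simulated noise.
    All randomness comes from one seeded generator, so a run can be replayed."""
    rng = random.Random(seed)
    gap = 50.0
    min_gap = gap
    for _ in range(steps):
        gap += rng.gauss(0.0, 0.2) - 0.1  # noisy drift, illustrative only
        min_gap = min(min_gap, gap)
    return min_gap

def is_repeatable(seed):
    """Same seed, same inputs: the result must be bit-identical."""
    return run_noisy_test(seed) == run_noisy_test(seed)

def spread_over_seeds(seeds):
    """When exact repetition is not available, report mean and spread."""
    results = [run_noisy_test(s) for s in seeds]
    return statistics.mean(results), statistics.stdev(results)

print(is_repeatable(42))
mean_gap, stdev_gap = spread_over_seeds(range(10))
print(f"mean {mean_gap:.2f} m, standard deviation {stdev_gap:.2f} m")
```

In a real system, sources such as thread scheduling are not controlled by a seed, which is why the text recommends confirming variance and stability over multiple experiments.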

SIL and HIL Test Methods

After the single-controller tests are completed, intelligent driving testing continues at the whole-vehicle level, as shown in the diagram below. The first method to introduce is the VIL (Vehicle-in-Loop) test, also called real-vehicle virtual injection testing. It configures section test interfaces within the software and shields some real perception inputs while the vehicle drives in a closed test area, so that any form of road environment can be simulated in the open space of that area. For example, non-existent vehicles can be added to the road, or a traffic light change can be simulated at an intersection. Since all other test elements are real, the test is highly credible and makes full use of the closed test area’s environmental resources.
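
The injection step can be pictured as a filter and merge on the perception output. The sketch below is a simplified assumption of how such an interface might look: the `TrackedObject` type, its fields, and the function name are hypothetical and do not describe any particular vendor's software.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrackedObject:
    object_id: str
    kind: str             # e.g. "vehicle", "traffic_light"
    x_m: float            # longitudinal position relative to ego
    y_m: float            # lateral position relative to ego
    virtual: bool = False

def inject_virtual_objects(real_objects, virtual_objects, masked_kinds=()):
    """Return the object list handed to planning: real detections, minus
    any kinds shielded for this test, plus the injected virtual objects."""
    kept = [o for o in real_objects if o.kind not in masked_kinds]
    return kept + list(virtual_objects)

# Real detections on an empty test track (illustrative values).
real = [TrackedObject("r1", "traffic_light", 80.0, 3.0),
        TrackedObject("r2", "cone", 15.0, -1.5)]
# Injected content: a vehicle that does not exist and a simulated light.
virtual = [TrackedObject("v1", "vehicle", 30.0, 0.0, virtual=True),
           TrackedObject("v2", "traffic_light", 80.0, 3.0, virtual=True)]

merged = inject_virtual_objects(real, virtual, masked_kinds=("traffic_light",))
print([o.object_id for o in merged])  # prints "['r2', 'v1', 'v2']"
```

Here the real traffic light is shielded and replaced by a simulated one, while the real cone remains visible to the vehicle.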

VIL Test Method

Another new form of VIL is the VTEHIL (Vehicle Traffic Environment HIL) test, which constructs a simulated surrounding environment and vehicle movement in an indoor field to test smart driving cars. Since the environment is completely controlled and not affected by weather changes, it can achieve 24-hour continuous testing and efficiently and completely simulate extreme conditions.

VTEHIL Test Method

Further down is the RIL (Road-in-Loop) test, or closed-field test. Everything is real except the environmental participants and drivers. This method is also routine in conventional automotive testing, but whereas targets used to be remotely controlled and placed by hand, automated solutions have now emerged. The latest humanoid robots and dummy vehicles are equipped with the necessary sensors, actuators, and communication devices and can be connected to cloud-based centralized command and dispatch. Test cases from the cloud can therefore be synchronized to and performed by these robots and dummy vehicles in the closed testing field, greatly improving testing efficiency.

RIL Testing Method

Compared with controller-level testing, whole-vehicle level testing is more concerned with experience-related indicators, such as takeover rate, robustness, etc. In addition to VIL and RIL, whole-vehicle level testing also includes LABCAR testing and large-scale on-road vehicle testing, which are conducted together with other traditional testing processes for the whole vehicle.

LABCAR Testing Bench

After the test of the individual intelligent driving controller is completed, the controller is handed over to the whole-vehicle department for testing within the entire electrical and electronic architecture. This is called LABCAR testing, which can also be understood as hardware-in-the-loop (HIL) testing with several controllers combined. It simulates peripheral sensor and actuator information to check whether the entire electrical and electronic system works normally. LABCAR can also inject faults (short circuits, open circuits, etc.) to check whether the reaction under abnormal conditions meets expectations.
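
Fault injection of this kind can be sketched as a small test matrix. Everything in the example is assumed for illustration: the sensor is a hypothetical analog line with a valid range of 0.5 V to 4.5 V, an open circuit is assumed to read 0 V, and the diagnostic function stands in for the controller's real reaction.

```python
SUPPLY_V = 5.0  # assumed supply voltage

def inject_fault(voltage, fault):
    """Simulated wiring fault on an analog sensor line (assumed behavior)."""
    if fault == "open_circuit":
        return 0.0        # assumes the input is pulled low when disconnected
    if fault == "short_to_supply":
        return SUPPLY_V
    return voltage        # no fault injected

def diagnose(voltage):
    """Hypothetical controller-side plausibility check."""
    if voltage < 0.5:
        return "FAULT_LOW"
    if voltage > 4.5:
        return "FAULT_HIGH"
    return "OK"

def run_fault_matrix(nominal_voltage=2.5):
    """Inject each fault and check the reaction against the expectation."""
    expected = {None: "OK",
                "open_circuit": "FAULT_LOW",
                "short_to_supply": "FAULT_HIGH"}
    return {fault: diagnose(inject_fault(nominal_voltage, fault)) == want
            for fault, want in expected.items()}

print(run_fault_matrix())
```

On a real bench the faults are applied electrically and the reaction is read from the controllers, but the structure of the check (fault, expected reaction, observed reaction) is the same.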

Compared with system testing, whole-vehicle testing often focuses not on individual functions but on common dimensions with comprehensive impact, such as vehicle noise performance. Many of these issues involve components whose behavior cannot be reproduced in single-unit or system-level bench tests and can only be observed under certain special conditions of the whole vehicle.

The general method for vehicle testing is road testing. First, the six basic performance attributes of the vehicle are assessed: power, fuel efficiency, braking, stability, and drivability are quantified with standard objective tests, while smoothness may involve engineer calibration and can vary with the vehicle’s style. In addition, comprehensive tests are conducted in various extreme environments (such as high altitude, extreme cold, and high temperature). This is often referred to as “going to Heihe in the winter and Hainan in the summer”. Generally, test conditions are more stringent than normal driving conditions, which effectively improves testing efficiency. Road testing also covers NVH performance, durability, and other tests that can only be performed on the complete vehicle.

Overall, the core logic of vehicle testing is similar to that of component testing. Because testing requires licenses, insurance, drivers, and many other human and support resources, its time and economic costs are high. Vehicle testing is therefore usually more concise and strictly planned: the number of test conditions and test runs is carefully estimated from theory and experience.

Another goal of vehicle testing is to obtain the government approval announcement. In China, there are about 40 mandatory inspection items for passenger cars, and a vehicle must pass them before it can be sold.

Large-scale real car testing

Large-scale road testing is also necessary for intelligent driving systems, although for slightly different reasons than traditional road tests. Real traffic is complex and unpredictable; driving simulators and controlled field tests can reproduce only a small part of it, so their evaluation results may deviate from reality. Large-scale road testing is therefore needed to verify how intelligent vehicles operate in the full traffic environment. In open road testing, functional data, behavioral data, and environmental data must be collected synchronously. Functional data usually comes from the intelligent driving system itself. Behavioral data mainly monitors driver reactions and comes from additionally installed in-cabin cameras, eye-tracking devices, and physiological measurement equipment. Environmental data comes from the vehicle’s own environmental sensors and from additional high-performance sensors, such as lidar, INS, or high-definition cameras. Increasingly, however, this method is being replaced by data closed-loop approaches.
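
Collecting the three kinds of data synchronously usually means aligning separately recorded streams on a common time base. The sketch below shows one simple way to do this, nearest-timestamp matching with a tolerance; the stream names, sample values, and tolerance are assumptions for illustration.

```python
from bisect import bisect_left

def nearest_sample(timestamps, values, t, tolerance_s):
    """Return the value whose timestamp is closest to t, or None if the
    closest one is further away than tolerance_s. timestamps must be sorted."""
    i = bisect_left(timestamps, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(timestamps[j] - t))
    return values[best] if abs(timestamps[best] - t) <= tolerance_s else None

def align(reference_ts, streams, tolerance_s=0.05):
    """Build one row per reference timestamp with the nearest sample
    from every stream (None where no sample is close enough)."""
    rows = []
    for t in reference_ts:
        row = {"t": t}
        for name, (ts, vals) in streams.items():
            row[name] = nearest_sample(ts, vals, t, tolerance_s)
        rows.append(row)
    return rows

# Hypothetical recordings: (sorted timestamps in seconds, samples).
streams = {
    "functional": ([0.0, 0.1, 0.2], ["acc_on", "acc_on", "braking"]),
    "behavioral": ([0.02, 0.31], ["eyes_on_road", "eyes_off_road"]),
}
for row in align([0.0, 0.1, 0.3], streams):
    print(row)
```

Rows with a `None` entry show where one stream has no sample close enough to the reference time, which is useful to flag before the data is analyzed.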

This concludes the introduction to the test methods related to intelligent driving systems. It is hard for any one person to come into contact with all of these tasks, but understanding the overall picture helps individuals see the significance of their own testing work within research and development.

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.

Understanding Autonomous Driving Test System in One Article
