Author: Xiao Meng
This series explores the software architecture of intelligent driving domain controllers. Intelligent driving requires collaboration among professionals from many disciplines, not all of whom have a background in software or automotive software.
To keep the material accessible to readers of varied backgrounds, the series uses plain language and supplements the text with diagrams.
It also avoids ambiguous terminology: every term is given an exact definition at its first appearance. The content is published as a series and will run to multiple installments.
Software Architecture Fundamentals and Issues
This first article mainly presents the issues and consists of two chapters. Chapter 1 introduces the “EPX-SA” concept model for intelligent driving and proposes that an intelligent driving software architecture must solve the mapping from physical reality to program reality. Chapter 2 discusses the software architecture of vehicle controllers for intelligent driving functions at Level 2 and below, relates that architecture to the concept model of Chapter 1, and explains why existing Level 2 software architectures struggle to support the development of Level 3 and above.
The Importance of Intelligent Driving Software Architecture
1.1 Simplified Concept Model of Intelligent Driving
The simplified concept model of intelligent driving reduces to three core questions:
- Where am I?
- Where do I want to go?
- How do I get there?
The first question, “Where am I?”, covers “environment perception” and “localization”: understanding the vehicle’s own position as well as the static environment around it (roads, traffic signs, traffic lights, etc.) and the dynamic environment (vehicles, people, etc.). This leads to a family of perception and localization solutions built on various sensors and algorithm systems.
The second question, “Where do I want to go?”, corresponds in autonomous driving to “planning and decision-making”. It brings with it terms such as “global planning”, “local planning”, “task planning”, “path planning”, “behavioral planning”, “behavioral decision-making”, and “motion planning”. Owing to linguistic ambiguity, some of these terms have different meanings yet are often used interchangeably, while others mean roughly the same thing but differ subtly across contexts.
Setting specific terms aside, this “planning and decision-making” problem is generally divided into three parts:
- Planning at a global level within a certain range (commonly called global planning, path planning, or task planning)
- Dividing the result of the first step into multiple stages (commonly called behavioral planning or behavioral decision-making)
- Planning each stage in further detail (commonly called local planning or motion planning)
These various planning aspects have given rise to many algorithm systems.

The third question, “How do I get there?”, generally refers to “execution control”: actually carrying out the lowest-level plan so that its goals are achieved. In cars this is typically embodied in various control algorithms, which are the subject matter of control theory.
Because the solutions to these three questions ultimately boil down to algorithms, the core of autonomous driving is, in a sense, algorithms. And software architecture, in a sense, exists to support these algorithms: without a good supporting system, even the best algorithms are useless.
1.2 Fractal Recursion of Basic Concept Models
For ease of reference, we represent the three questions of the basic concept model as E, P, X, representing Environment, Plan, and eXecute, respectively. Each E-P-X group has its own problem space.
For example, if we define problem space A as “driving from Beijing to Guangzhou”, then for the E problem the focus may be the current area within Beijing, without street-level detail; we also need to pay attention to weather conditions such as thunderstorms, and to road-structure information at the provincial level and above. For the P problem:
P-1: First design a global path, which highway, national highway, and provincial road to take.
P-2: Plan a series of actions based on the global path, such as which highway intersection to reach first, how many kilometers to drive, when to go to the service area for refueling or change to another road, and so on.
P-3: Plan the specific road path for each segment. For example, whether to take the Third Ring Road or the Fourth Ring Road when reaching the highway intersection, and which road to switch to.
X executes every step planned by P.
If we define problem space B as “safely passing through an intersection”, then for the E problem, we need to pay attention to current road information, traffic light information, vehicle conditions on the road, pedestrian conditions, and so on. For the P problem:
P-1: First plan a safe path through the intersection, including which lane to take to reach the current intersection based on traffic rules and road information, and which lane to enter the target intersection.
P-2: Second, based on the results of the first step, plan a series of action sequences, such as slowing down, switching to the target lane, stopping for red lights, accelerating when it turns green, passing through the intersection.
P-3: Third, for each action in step 2, design a specific trajectory that can avoid obstacles such as pedestrians and vehicles.
X is responsible for executing the output results of the P problem.
This problem space B is closest to the problem that planning algorithms usually solve. P-1 is often referred to as “global planning” or “task planning”, P-2 as “behavior planning” or “behavior decision-making”, and P-3 as “local planning” or “motion planning”.
As shown in the image below, E-P-X forms an abstract basic concept model, and problem spaces A and B are specific implementations within a certain range.
Problem spaces A and B have similar EPX structures, but they differ greatly in the temporal and spatial spans of the problems they solve. In the figure above, the task “complete entry onto the Fourth Ring Road” in A:X can itself be accomplished by a lower-level EPX loop. The EPX model shown in the figure below is therefore a fractal recursive structure: a higher-level X can always be further decomposed into finer-grained EPX loops for execution.
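To make the recursion concrete, here is a minimal Python sketch of the fractal EPX idea: each level’s X step may either actuate directly or delegate to a child EPX loop at finer granularity. All names, the plan contents, and the two-level decomposition are illustrative assumptions, not part of the model’s definition.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class EPXLoop:
    """One level of the E-P-X model. The X step may be realized either by
    direct actuation or by delegating to a finer-grained child EPX loop."""
    name: str
    perceive: Callable[[], dict]          # E: observe the environment
    plan: Callable[[dict], List[str]]     # P: turn observations into steps
    execute: Callable[[str], None]        # X: primitive actuation at this level
    child: Optional["EPXLoop"] = None     # the fractal part: a lower-level loop

    def run(self) -> None:
        env = self.perceive()             # E
        for step in self.plan(env):       # P
            if self.child is not None:    # X decomposes into a child EPX loop
                print(f"{self.name}: '{step}' handled by child loop")
                self.child.run()
            else:
                self.execute(step)        # X at the lowest granularity

# Two illustrative levels: a trip-level loop (problem space A) whose X is
# realized by a maneuver-level loop (problem space B).
maneuver = EPXLoop(
    name="pass_intersection",
    perceive=lambda: {"light": "green"},
    plan=lambda env: ["slow_down", "enter_target_lane", "cross"],
    execute=lambda step: print(f"  maneuver X: {step}"),
)
trip = EPXLoop(
    name="beijing_to_guangzhou",
    perceive=lambda: {"weather": "clear"},
    plan=lambda env: ["take_highway_entrance", "refuel_at_service_area"],
    execute=lambda step: print(step),
    child=maneuver,
)
trip.run()
```

The same `EPXLoop` structure is reused unchanged at every level, which is exactly the self-similarity the fractal analogy describes.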
“Fractal”, also called “self-similar fractal”, can be understood informally to mean that the local structure of an object resembles its whole, exhibiting similarity across scales. For example, a branch of a tree has a structure similar to the entire tree, and the veins of each leaf are similar structures again. The figure below lists some typical fractal structures.
These six figures share a common feature: any local part of each figure has the same structure as the whole, and magnifying any local part further reveals the same structure yet again.
Therefore, when we have a set of business processing logic applicable to the entire system, it can also be applied to its parts. It is like the propagation of some trees, where a cutting from a branch can be planted and grown into a new tree. Mapped onto software, this is “recursion”: not in the sense of recursive function calls, but of recursive architectural levels.
“Fractal” has a more academic formulation: “using a fractional-dimension perspective and mathematical methods to describe and study objective things, breaking through the traditional barriers of one-dimensional lines, two-dimensional planes, three-dimensional solids, and even four-dimensional space-time, approaching a more accurate description of the true properties and states of complex systems, and better conforming to the diversity and complexity of objective things.” When we find an appropriate mathematical expression for “physical reality” and then convert it into “programmatic reality”, we can arrive at a more concise, clear, and accurate software architecture and implementation.
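As a concrete anchor for the “fractional dimension” mentioned in that definition, the similarity dimension of a self-similar figure is:

```latex
% Similarity dimension: the figure consists of N copies of itself,
% each scaled down by a factor of s.
D = \frac{\log N}{\log s}
```

For the Koch curve, N = 4 and s = 3, so D = log 4 / log 3 ≈ 1.26: more than a one-dimensional line, less than a two-dimensional plane, which is exactly the sense in which fractals break through integer dimensions.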
1.3 Software Architecture Solves the Mapping from “Physical Reality” to “Programmatic Reality”
E-P-X Architecture and Software Design for Advanced Driver Assistance System
The E-P-X architecture is a structure abstracted from physical reality. It consists primarily of various algorithms, and the research and development of each individual algorithm can proceed independently against predefined input and output conditions. However, it is combining the algorithms, triggering them at the right time, and using their results appropriately that yields a practical application. Software architecture is the bridge that connects physical reality with program reality.
The complexity of the software architecture increases with the level of autonomous driving, i.e., from Level 1 to Level 5. Most Level 1 and Level 2 functions require only one layer of the E-P-X architecture. For instance, in an automatic emergency braking (AEB) system, the E-part is responsible for static recognition and classification of forward targets (vehicles or people), dynamic tracking, and detection of distance and speed. Because AEB performs only longitudinal control, the P-part can plan the speed control over a certain horizon, and the X-part executes that speed plan through the vehicle control unit. A single layer of EPX does not make AEB simple, however; producing the AEB function at scale remains challenging. Chapter 2 introduces the common software architecture for L2 functions.
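As a hedged illustration of this single-layer split, the sketch below reduces the AEB P-part to a time-to-collision (TTC) rule. The thresholds and the three-stage braking levels are invented for the example; production AEB logic is far more elaborate.

```python
def aeb_step(target_distance_m: float, closing_speed_mps: float) -> float:
    """One cycle of a single-layer EPX loop for AEB (longitudinal only).
    Returns a brake command in [0, 1]. All thresholds are illustrative."""
    # E: the perception part has already produced distance and closing speed
    if closing_speed_mps <= 0.0:
        return 0.0                                   # opening gap: no risk
    ttc_s = target_distance_m / closing_speed_mps    # time to collision

    # P: plan the speed control, here reduced to a braking-level decision
    if ttc_s < 0.6:
        brake = 1.0                                  # full emergency braking
    elif ttc_s < 1.6:
        brake = 0.4                                  # partial braking / pre-fill
    else:
        brake = 0.0                                  # warn only, or do nothing

    # X: the vehicle control unit would turn this into actuator torque
    return brake

assert aeb_step(30.0, 10.0) == 0.0    # TTC = 3.0 s: no action
assert aeb_step(10.0, 10.0) == 0.4    # TTC = 1.0 s: partial braking
assert aeb_step(5.0, 10.0) == 1.0     # TTC = 0.5 s: full braking
```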
1.3.1 What Is Scene-Based Scheduling?
Even at the lowest granularity of EPX, a single implementation cannot solve all problems at that level. Take a simple straight-line driving test: we can implement the X-unit as one vehicle control algorithm that handles flat, uphill, and downhill scenes alike. Alternatively, we can use a scheduling unit (S) that identifies the flat, uphill, or downhill scene from the E-unit’s information and switches to the corresponding lower-level EPX loop, with each lower-level loop implementing a single scene.
In fact, even if we use one control algorithm in the X-unit to cover all flat, uphill, and downhill scenes, the algorithm still differentiates these scenes internally; it is itself a small-granularity EPX loop. Scene scheduling (S) can exist at every level of EPX, and defining a “scene” is a matter of granularity. The EPX model should therefore really be the EPX-S model; in functions at L2 and below, the S-part is simply not obvious.
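A minimal sketch of the EPX-S idea for the straight-line driving example follows. The pitch thresholds and controller gains are assumptions made up for illustration; the point is only that S classifies the scene from E-unit output and dispatches to a per-scene lower-level loop.

```python
from typing import Callable, Dict

# Child EPX loops, one per scene: each is a longitudinal controller tuned
# for that scene. The gains and offsets are purely illustrative.
def flat_control(speed_err: float) -> float:
    return 0.5 * speed_err

def uphill_control(speed_err: float) -> float:
    return 0.5 * speed_err + 0.2        # feed-forward against gravity

def downhill_control(speed_err: float) -> float:
    return 0.5 * speed_err - 0.2        # allow engine braking

SCENES: Dict[str, Callable[[float], float]] = {
    "flat": flat_control,
    "uphill": uphill_control,
    "downhill": downhill_control,
}

def s_unit(pitch_deg: float) -> str:
    """S: classify the scene from E-unit output (road pitch)."""
    if pitch_deg > 2.0:
        return "uphill"
    if pitch_deg < -2.0:
        return "downhill"
    return "flat"

def scheduled_step(pitch_deg: float, speed_err: float) -> float:
    scene = s_unit(pitch_deg)           # S selects the scene...
    return SCENES[scene](speed_err)     # ...and dispatches to that scene's loop

print(scheduled_step(5.0, 1.0))   # uphill loop: 0.7
print(scheduled_step(0.0, 1.0))   # flat loop: 0.5
```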
1.3.2 How to be Compatible with L1 to L3+ in Software Architecture
The automated parking assistance function requires scene scheduling, e.g., vertical entry, vertical exit, horizontal entry, horizontal exit, diagonal entry, and so on, with the P and X parts of each scene implemented separately. However, the scene can be selected manually through the HMI, making this a “human-in-the-loop” S unit.
For functions at Level 3 and above, a long, continuous driving process must proceed without manual intervention, which means multiple different EPX levels must be scheduled automatically. This makes the software architecture different from that of functions at L2 and below, and it is one reason why many companies keep separate teams for L2 and L3+ functionality.
In fact, as long as the software architecture is consciously designed around a multi-level EPX-S model, with each EPX-S unit given a suitable embodiment, a single software architecture supporting automated driving from L1 to L3+ can be implemented. The S unit is merely weaker for functions at L2 and below, but its architectural position still exists.
Software Architecture of L2 and Below Single Function Controllers
Let us first take a look at the software architectures commonly used for L2 functions.
2.1 Solution using Smart Sensor for AEB/ACC/LKA
AEB, ACC, and LKA are the three most classic L2 driver assistance functions. In the typical system solution, the perception part mostly takes the target (vehicle, pedestrian) information output by the forward camera and fuses it with the target data from the forward millimeter-wave radar to obtain more accurate speed and distance. Visual and radar perception mostly use Smart Sensor solutions, so a Tier 1 can choose products from mature Tier 2 suppliers; the Tier 1’s main development work then consists of perception fusion, the function state machines, and the vehicle control algorithms.
2.1.1 Common hardware architecture
Option 1: Forward visual perception results are transmitted to the radar perception controller, which performs the perception fusion and runs the L2 function state machines.
Option 2: An independent L2 controller connects to the visual and radar Smart Sensors and itself performs the perception fusion and runs the L2 function state machines.
Two Typical Approaches
The industry adopts two typical approaches. In approach one, a high-performance radar controller is used, with a portion of its computing units allocated to the fusion algorithm and the function state machines. Many chip manufacturers already account for this computational headroom when designing radar processing chips.
For example, NXP’s S32R series, designed specifically for radar ECUs, has multiple cores sufficient for both radar signal processing and the L2 functions, since the most computation-intensive visual algorithms run on a separate device.
In approach two, a standalone controller implements the L2 functions and obtains the perception data from the two smart sensors over a private CAN bus. This approach can generally accommodate multiple L2 functions if necessary, and more smart sensors can be added later.
2.1.2 Typical Software Architecture
In a system architecture built on smart sensors, the forward intelligent camera and the forward millimeter-wave radar each provide their own semantic-level observations of target objects in the environment. These two sets of data are transmitted to the module responsible for perception fusion and the L2 functions, either directly over the CAN bus or through the IPC (inter-process communication) mechanism of the controller’s OS.
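The sketch below illustrates the spirit of this fusion step under simplifying assumptions: targets are associated by nearest-neighbour distance with a fixed gate, the object class is taken from the camera, and range/range rate are taken from the radar, which measures them more accurately. Real fusion modules use tracking filters and far more careful association.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CameraTarget:
    obj_class: str         # e.g. "vehicle" or "pedestrian"
    lateral_m: float
    range_m: float         # camera range estimates are comparatively coarse

@dataclass
class RadarTarget:
    lateral_m: float
    range_m: float         # radar gives accurate range...
    range_rate_mps: float  # ...and relative speed

@dataclass
class FusedTarget:
    obj_class: str
    lateral_m: float
    range_m: float
    range_rate_mps: float

def fuse(cams: List[CameraTarget], radars: List[RadarTarget],
         gate_m: float = 3.0) -> List[FusedTarget]:
    """Nearest-neighbour association with a simple distance gate.
    Class comes from vision; range and range rate come from radar."""
    fused: List[FusedTarget] = []
    for cam in cams:
        best: Optional[RadarTarget] = None
        best_d = gate_m
        for rad in radars:
            d = abs(cam.range_m - rad.range_m) + abs(cam.lateral_m - rad.lateral_m)
            if d < best_d:
                best, best_d = rad, d
        if best is not None:
            fused.append(FusedTarget(cam.obj_class, best.lateral_m,
                                     best.range_m, best.range_rate_mps))
    return fused

targets = fuse([CameraTarget("vehicle", 0.2, 41.0)],
               [RadarTarget(0.1, 40.3, -5.2)])
print(targets)  # class from the camera, range and speed from the radar
```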
Regardless of whether hardware approach one or two is used, the most common software architecture in the industry is built on Classic AUTOSAR. Classic AUTOSAR provides the functions common to vehicle ECUs, along with an execution environment and data input/output channels for the application software it hosts.
The perception fusion module and the ACC/AEB/LKA functions can be implemented as one or more AUTOSAR Software Components (SWCs). Whether and how to split these SWCs, and what their logical relationships are, is up to each developer, but the basic architectures are similar.
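To make the “function state machine” concrete, here is an illustrative ACC state machine in plain Python. The states, events, and transitions are assumptions for the example; in a real Classic AUTOSAR design this logic would sit inside an SWC runnable scheduled by the RTE.

```python
from enum import Enum, auto

class AccState(Enum):
    OFF = auto()
    STANDBY = auto()       # switched on, not yet regulating
    ACTIVE = auto()        # regulating speed / distance
    OVERRIDE = auto()      # driver presses the accelerator

def acc_transition(state: AccState, event: str) -> AccState:
    """Event-driven transitions of an illustrative ACC state machine.
    Unknown events leave the state unchanged."""
    table = {
        (AccState.OFF, "main_switch_on"): AccState.STANDBY,
        (AccState.STANDBY, "set_speed"): AccState.ACTIVE,
        (AccState.ACTIVE, "driver_accelerates"): AccState.OVERRIDE,
        (AccState.OVERRIDE, "driver_releases"): AccState.ACTIVE,
        (AccState.ACTIVE, "brake_pressed"): AccState.STANDBY,
        (AccState.STANDBY, "main_switch_off"): AccState.OFF,
    }
    return table.get((state, event), state)

s = AccState.OFF
for e in ["main_switch_on", "set_speed", "driver_accelerates", "driver_releases"]:
    s = acc_transition(s, e)
print(s)  # AccState.ACTIVE
```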
Of course, Classic AUTOSAR does not have to be used; another suitable RTOS can serve as the underlying system. Developing the generic vehicle ECU functions and meeting functional safety standards may then be harder, but for application function development the architecture differs little from the Classic AUTOSAR approach.
MBD Development Approach
The industry commonly uses the Model-Based Development (MBD) approach to implement the perception fusion algorithms, the planning and control algorithms, and the ACC/AEB/LKA function state machines. C code is then generated with tools and manually integrated onto the target platform. One advantage of MBD is that developers can debug and develop applications directly with model tools and devices such as CANoe, or connect to a simulation platform for development and debugging. The development team can then be split in two: one part develops application functionality with model tools, while the other develops the series of tedious capabilities that every in-vehicle ECU needs, and the two are then integrated.
2.2 Higher Integration Product Solutions
Fully automatic parking products generally adopt more highly integrated solutions. The visual algorithms (static obstacle recognition, dynamic obstacle recognition, pedestrian recognition, parking-line recognition), the ultrasonic radar algorithms (distance detection, obstacle detection), and the trajectory planning and control execution for the parking process are all integrated into one controller. The higher integration also lets the automatic parking controller support a 360° surround-view function, which requires surround camera image calibration, 2D/3D graphics rendering, video output HMI generation, and other supporting functions.
2.2.1 Schematic Hardware Topology
The schematic hardware topology is shown below.
The modules in the figure are distributed differently under different hardware selection schemes. Generally, the MCU used for real-time processing is implemented on a separate chip. Different chip manufacturers have their own AI unit solutions: some use high-performance DSPs, some use proprietary matrix computing units, and some use FPGAs. There are many options.
2.2.2 Typical Software Architecture
The following figure shows a typical software architecture for an automatic parking system (only a simplified illustration, the actual mass-produced system will be much more complex):
Compared with the architecture in Section 2.1, the most significant change here is the split between a “real-time domain” and a “performance domain”. Generally speaking, we refer to the software and hardware system on an MCU or other real-time core (such as Cortex-R, Cortex-M, or equivalent series) as the “real-time domain”. The large cores in the SoC (such as Cortex-A or equivalent series) and the Linux/QNX systems running on them are collectively referred to as the “performance domain”, which typically also includes the hardware and software that support visual algorithm acceleration.
Although the engineering difficulty of mass-producing automatic parking systems is much lower than that of L2 active-safety functions such as ACC/AEB/LKA, the software architecture of parking systems is more complex. This is mainly reflected in the following aspects:
- The division between the real-time domain and the performance domain brings systemic complexity: different hardware platforms must be selected for different computations according to their real-time requirements and computing-resource demands.
- When the performance domain’s OS is Linux, the OS-level execution environment is much more complex than that of an RTOS.
- A series of frameworks or middleware is required to support application development.
- The data paths become more complex.
- Processing paths for video streams are added.
- Data paths between the real-time domain and the performance domain are required.
- The various software modules within the performance domain have data communication requirements among themselves.
- After the architecture design is implemented on a specific chip platform, it must be integrated and coordinated with the chip’s various SDKs.
In addition, this software architecture reveals some of the problems introduced by Linux (or QNX/VxWorks). These are general-purpose computer operating systems, not specific to any industry. When they are used in automotive electronic controllers, a series of generic capabilities must be implemented that are unrelated to specific product functionality but required of any automotive ECU.
For example: diagnostics, clock synchronization, upgrading, and so on. These account for a very large proportion of overall controller development, often exceeding 40%, and are closely tied to the controller’s reliability.
In the field of network communication equipment, such capabilities are often called the management plane. Many of them are also basic capabilities provided by Adaptive AUTOSAR (AP). In fact, whether in Classic AUTOSAR (CP) or Adaptive AUTOSAR (AP), most modules apart from communication belong to the management plane.
2.3 Synergy of Multiple Single-function ECUs
How are multiple L2 functions coordinated on a single vehicle? The following figure shows a simplified example of a multi-controller topology.
This topology integrates six controllers. The functions provided by the “fully automated parking system”, the “forward intelligent camera”, and the “forward millimeter-wave radar” are as described earlier. The left and right corner radars are two mirrored devices that can independently provide functions such as “rear alert” and “door-opening alert”. The “driver monitoring system” detects the driver’s state: if fatigue is detected, it issues an alert; if the driver loses all ability to act, it notifies other systems to attempt to decelerate and park safely.
This topology has the following key points: a domain controller is introduced to connect the multiple independent driver-assistance controllers and is itself connected to the backbone network; and multiple CAN buses exist within the driver-assistance domain to avoid insufficient bandwidth on any one bus.

From a software architecture perspective, each driver-assistance controller runs independently, autonomously deciding when to activate and deactivate its own functions. Relevant control signals are sent to the domain controller, which forwards them to the power domain. The driver-assistance domain controller is responsible for judging the control outputs of the independent controllers.
As for the role the domain controller plays here, designs range from light to heavy. In a lightweight design, the domain controller only performs simple data forwarding: it screens data from the backbone network before passing it to the controllers within the domain, and sends the in-domain controllers’ control signals out to the backbone. This places low demands on the domain controller’s computing power.
If the domain controller takes on more responsibility, it can also take over the real-time computation of the other controllers in the domain; for example, the planning and control calculations of ACC/AEB/LKA would then run on the domain controller. The other controllers must send their sensing data to the domain controller for unified processing, which places higher demands on the domain controller’s computing power and on the bandwidth of the in-domain network. (A sketch contrasting the two designs appears at the end of this subsection.)
Furthermore, because the domain controller has access to all the sensing data, additional features, such as lane-change assistance or emergency obstacle avoidance, can be developed on top of the sensors shown in the figure.
As functions gradually centralize into the domain controller, the other controllers are progressively reduced to their perception roles, and their non-perception parts are weakened.
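The sketch below contrasts the light and heavy domain-controller roles described above. The frame format, the filtering criterion, and the crude headway rule in `centralized_plan` are all invented for illustration.

```python
from typing import List

class DomainController:
    """Illustrative light vs. heavy domain-controller roles."""

    def __init__(self, mode: str = "light"):
        self.mode = mode  # "light": forward only; "heavy": also plan centrally

    def forward_to_domain(self, backbone_frames: List[dict]) -> List[dict]:
        # Light role: screen backbone traffic before passing it into the domain.
        return [f for f in backbone_frames if f.get("relevant_to") == "adas"]

    def centralized_plan(self, targets: List[dict], ego_speed_mps: float) -> float:
        # Heavy role: take over ACC planning with all sensing data in one place.
        # Crude 2-second headway rule; returns a longitudinal accel request.
        if self.mode != "heavy":
            raise RuntimeError("centralized planning requires the heavy design")
        lead_range = min((t["range_m"] for t in targets), default=float("inf"))
        return -2.0 if lead_range < 2.0 * ego_speed_mps else 0.0

light = DomainController("light")
print(light.forward_to_domain([
    {"relevant_to": "adas", "id": 1},
    {"relevant_to": "body", "id": 2},
]))  # only the ADAS-relevant frame enters the domain

heavy = DomainController("heavy")
print(heavy.centralized_plan([{"range_m": 30.0}], ego_speed_mps=20.0))  # -2.0
```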
2.4 EPX-SA Conceptual Model
Arbitration Mechanism
As mentioned earlier, the implementations of ACC/AEB/LKA, fully automatic parking, and other single functions at L2 or below can all be understood as one or two layers of the fractal recursive EPX-S model.
In actual mass-produced products, the ACC/AEB/LKA functions can in fact be enabled simultaneously, each as a separate EPX-S loop. This means multiple EPX-S loops can run in parallel, and if several X outputs are produced at the same time, they must be resolved by an arbitration mechanism.
When multiple controllers are running simultaneously, the domain controller performs arbitration at a higher level.
Therefore, the EPX-S model should be extended to the EPX-SA model. As shown in the figure below, Scenario 1 and Scenario 2 are two EPX-S loops that run in parallel, and the X produced by them is output after arbitration.
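A minimal sketch of the A unit under assumed rules: each parallel EPX-S loop proposes an X output with a priority, the highest priority wins, and ties resolve to the most restrictive request. Both the priority scheme and the tie-break are illustrative, not prescribed by the model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class XOutput:
    source: str            # which EPX-S loop produced it, e.g. "acc", "aeb"
    accel_mps2: float      # requested longitudinal acceleration
    priority: int          # higher value overrides lower

def arbitrate(requests: List[XOutput]) -> Optional[XOutput]:
    """A unit: pick the highest-priority request; among equal priorities,
    take the most restrictive (strongest deceleration). Illustrative rule."""
    if not requests:
        return None
    top = max(r.priority for r in requests)
    candidates = [r for r in requests if r.priority == top]
    return min(candidates, key=lambda r: r.accel_mps2)

winner = arbitrate([
    XOutput("acc", accel_mps2=+0.3, priority=1),   # loop 1: keep cruising
    XOutput("aeb", accel_mps2=-6.0, priority=9),   # loop 2: emergency brake
])
print(winner.source)  # "aeb": the safety function overrides the comfort function
```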
Introduction to the Environmental Model Concept
Each controller that implements a single L2 function has some perception capability in a particular aspect. Together these controllers describe the properties of certain aspects of the vehicle’s internal and external environment, which can be divided spatially or by physical modality, such as visible light, ultrasound, millimeter waves, and lasers. If an ideal, accurate 3D model of the entire environment were established, each sensor would be equivalent to a filter over this model. As in “the blind men and the elephant”, each sensor expresses the properties of one aspect of the ideal model.
In fact, the perception part of intelligent driving uses as many sensors as possible, plus algorithms, to approximate this ideal model. The more sensors there are and the more accurate the algorithms, the smaller the deviation from the ideal model.
In the L2 domain controller, the perception data of all sub-controllers can be accessed if necessary. At this level, a concrete approximation of the ideal environmental model can be pieced together from all the perception results; even though it still differs greatly from the ideal, it is already an overall environmental model.
The information provided by the environmental model is not only used in various planning modules (P), but also in scheduling (S) and arbitration (A) phases.
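The toy sketch below captures the “sensor as filter” picture: each sensor passes through only the attributes of the ideal model it can measure, and the environmental model is pieced together from these partial views. The attribute lists and object fields are invented for the example.

```python
from typing import Dict, List

# An "ideal" environment model: the ground truth we can never observe fully.
IDEAL_MODEL: List[Dict] = [
    {"id": 1, "kind": "vehicle", "range_m": 40.0, "speed_mps": -5.0, "color": "red"},
    {"id": 2, "kind": "pedestrian", "range_m": 12.0, "speed_mps": 1.2, "color": "n/a"},
]

# Each sensor is a filter: it passes only the attributes it can measure.
SENSOR_FILTERS = {
    "camera": ["id", "kind", "color"],        # good at classification
    "radar": ["id", "range_m", "speed_mps"],  # good at range and speed
}

def observe(sensor: str) -> List[Dict]:
    keys = SENSOR_FILTERS[sensor]
    return [{k: obj[k] for k in keys} for obj in IDEAL_MODEL]

def build_environment_model(sensors: List[str]) -> List[Dict]:
    """Piece the partial views back together by object id: the more sensors,
    the closer the reconstruction gets to the ideal model."""
    merged: Dict[int, Dict] = {}
    for sensor in sensors:
        for view in observe(sensor):
            merged.setdefault(view["id"], {}).update(view)
    return list(merged.values())

model = build_environment_model(["camera", "radar"])
print(model[0])  # object 1 recovered with kind, color, range, and speed
```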
This is the end of the first article. The second article “Supporting Software Architecture and Product Architecture for L3+” will be updated in the near future.
This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.