Author: Delu
Recently, two interesting things have happened in the autonomous driving industry. One is that the KiWi, the first mass-produced car equipped with DJI's Lingxi Smart Driving system, has officially launched. The other is that MAXIEYE (Intelligent Driving Technology)'s MAXIPILOT driving-parking integration platform, based on a single TDA4 chip, is also about to enter mass production.
Why are they interesting?
The two systems share two features. First, both current mass-produced versions balance cost and performance to serve the mass consumer market. Second, both are billed as "integrated driving and parking" assisted driving solutions.
The first point is easy to understand. In 2022, the intelligent driving industry is developing along two parallel tracks. The self-developed faction, represented by Tesla, WeRide, and Huawei, is striving to bring high-level assisted driving into the city, while lower-priced cars are accelerating the adoption of L2-level solutions. Since such cars account for 80% of the automotive market, L2-level assisted driving, up to high-speed navigation, is reaching real scale.
This has created a huge demand in the mid-low-end car market. For mid-low-end cars, there are two core keywords for assisted driving: “low cost” and “high reliability.”
On this basis, the arrival of advanced and cost-effective “integrated driving and parking” schemes seems to have brought a breath of fresh air to the OEMs.
Meanwhile, how to obtain high-quality, high-value data from vehicles on the road, at low cost and high efficiency, in order to iterate the technology has become the focus of competition among players. Against this backdrop, we found MAXIEYE well worth discussing.
Therefore, we have identified a few issues to discuss further:
- What is the “integrated driving and parking” scheme?
- What unique solutions can MAXIEYE provide for passenger cars?
- What are the difficulties in implementing an integrated driving and parking scheme based on TDA4?
- As the solution moves toward front-installed mass production, what are MAXIEYE's technological strengths?
The War for Integrated Driving and Parking has Begun
Before discussing the “integrated driving and parking” scheme, let’s talk about some of the current situations in the intelligent driving industry.
First, let’s take a look at some data:
- From January to June 2022, the front-installed (factory-fitted) penetration rate of high-level assisted driving in China was 26.64% (L2/L2.5 with function upgrades).
- From January to July 2022, the penetration rate of basic L2 functions across China's full range of passenger car models was 22%.
- From January to July 2022, 2.839 million newly insured vehicles were equipped with L2-level intelligent assisted driving, up nearly 70% year over year. The average price of these models is between 150,000 and 200,000 yuan.
This set of data shows that market demand for L2/L2.5-level assisted driving is becoming more urgent.

However, on the intelligent driving side, apart from Tesla, Li Auto, and XPeng, which self-develop their systems on third-party hardware (Tesla's hardware is also self-developed), the top ten suppliers are still the traditional foreign Tier 1s, with Denso, Bosch, and Continental occupying the top three spots.

At present, among the mainstream single-camera ADAS solutions from these Tier 1s, Mobileye relies on high-performance products to focus on the mid-to-high-end market, while Bosch seizes share with low cost and performs well in the domestic market, though with a downward trend.

The reason is that although both have strong engineering capabilities, Mobileye has always delivered a closed, black-box solution and does not open its visual perception algorithms, so vehicle manufacturers have little say in defining the product, and it is gradually being abandoned. The advantage of Bosch's solution is "standardization," which minimizes development cost and works out of the box; the disadvantage is also "standardization": an intelligent driving system needs continuous updates, and Bosch finds it hard to iterate the system's capabilities together with its many customers.

This is the first pain point of assisted driving: the market needs a system that is "open" and can be "continuously iterated."

At the other end of the market are companies such as Huawei, Baidu, and Pony.ai, which focus on L4 research. Although their systems cover more scenarios and are more advanced, they still carry a dozen or so cameras, 5 millimeter-wave radars, 12 ultrasonic radars, several lidars, and chips with hundreds to thousands of TOPS of computing power. The hardware cost of such a system is at least RMB 20,000-30,000, and in the end it still delivers assisted driving that falls short of the industry's L4 standard.

Obviously, this is hard for the mainstream consumer market to accept, which leads to the second pain point: a system that is extremely "cost-effective."

Why say all of this?

Simply put, given the huge market demand, whoever can provide a low-cost, high-performance, highly open system will have the chance to capture more market share.

So what does this have to do with the "integrated driving and parking" solution?

We all know the industry has a consensus goal for autonomous driving: to drive the automation of the entire vehicle with a single system architecture, which means the technical architecture must evolve toward a centralized, integrated one. But this is a process that requires coordinated progress in underlying technologies, such as large-model neural network algorithms, higher-precision sensors, and large computing platforms. The industry has a name for this gradual upgrading process: the "progressive" autonomous driving route.
Simply put, use existing technology to solve the current problems and quickly iterate to achieve mass production of advanced features.
From a technical perspective, today's autonomous driving systems are still far from that goal, with multiple systems and pieces of hardware running in parallel. For example, although hardware has converged on the domain-controller architecture, most automakers still have 3 to 4 domain controllers, with separate chips for driving and for parking. On the software side, there is one stack for advanced navigation assisted driving, another that degrades to plain L2 wherever high-precision map coverage breaks off, and a separate parking stack, and in most solutions these come from different suppliers.

The result is that vehicles do get a unified set of functions, able both to drive and to park, but the technical architecture is still far from optimal, and the hardware and development costs for suppliers and OEMs remain high.
How to solve this?
In one sentence: integrate again.
This brings us to the “driving and parking integration” solution. The first stage of the industry’s approach is to use different chips to handle driving and parking. The second stage is to integrate the chips for driving and parking into one controller on the hardware side, running two sets of algorithms inside.
Currently, the industry is generally in the second stage. This form of driving-parking integration does not truly exploit the hardware's full performance, and hardware costs have not come down for automakers. Most current solutions use multiple chips, such as EyeQ4 + TDA4, J3 + TDA4, dual TDA4 (DJI Lingxi Smart Driving), and EyeQ4 + S32V (NIO 866).
The third stage is to run the driving and parking algorithms on a single chip, continuously optimizing the low-level software, reusing sensors, and working on other dimensions to reach a cost-effective, highly integrated driving-parking solution.
The core of the driving and parking integration solution is to “reduce costs” and “simplify the technical architecture.”
This approach has several advantages:
- Reducing the number of hardware components to reduce hardware and development costs.
- Unifying the adaptation of one chip can improve system operating efficiency and achieve performance optimization.
- Simplifying the technical architecture can improve standardization capabilities by involving fewer suppliers.
That's why Horizon, DJI, and WeRide all have driving-parking integration plans, and the competition in this field will only intensify. We have learned that MAXIEYE tackles the core challenge head-on: running two sets of algorithms demands a lot from the chip, yet using a high-performance chip such as Orin would defeat the cost goal. The algorithms must therefore be squeezed to the maximum out of a single J3, S32V, or TDA4.
MAXIEYE's approach is to improve algorithm efficiency and extract more chip performance at the embedded-software level, reaching mass production relatively quickly.
What makes MAXIEYE stand out?
Before discussing the technology, let's introduce the company. MAXIEYE (Intelligent Driving Technology) is a full-stack innovator and system solution provider in the fields of intelligent driving and smart transportation.
Their main products are ADAS and ADS system products and solutions, covering the technology and service loop of L0-L4.
According to MAXIEYE’s product plan, it can be divided into three versions based on the system:
1. MAXIPILOT 1.0 and MAXIPILOT 1.0 PLUS
This system uses Ambarella's CV2 series chip, and the perception hardware can be freely scaled from 1R1V to 5R1V according to the vehicle model's needs, supporting ADAS functions up to NOM high-speed navigation.
2. MAXIPILOT 2.0 and MAXIPILOT 2.0 PLUS
In this version, MAXIEYE has carried out integrated system design. MAXIPILOT 2.0 achieves driving-parking integration with one TDA4 chip, with functions up to NOA; the perception hardware scales up to 5R5V1D. Mass production is expected in 2023.
MAXIPILOT 2.0 PLUS is based on a 50 TOPS-level chip and will deploy a large model algorithm for BEV. It can achieve a series of intelligent driving functions and the driving and parking combination solution, including AEB/FCW/LDW/LKA/TSR/IHBC/ACC/ICA/TJA/ALC/NOM/AVP.
3. MAXIPILOT 3.0
This version will be developed on a 100+ TOPS computing platform supporting 5R11V3L fusion perception hardware, and will deliver a driving-parking integration solution that includes city navigation assisted driving.
The first mass-produced version of MAXIEYE's driving-parking integration solution will run the MAXIPILOT 2.0 system. Its defining feature is that it is designed around a single TDA4 VM chip. The TDA4 family currently comes in several versions (per official TI data):

- TDA4 VL: 4 TOPS of computing power
- TDA4 VM: 8 TOPS of computing power
- TDA4 VH: 32 TOPS of computing power

MAXIEYE uses the TDA4 VM.
As the industry knows, this chip's nominal computing power is 8 TOPS. However, because the TDA4 was originally designed with cockpit use in mind, the computing power actually available during driving-function development is constrained, and data-flow processing and task allocation can become congested. This hurts the detection performance of the system's core functions, for example causing missed detections in visual perception. So although many self-driving companies develop driving-parking integration solutions on the TDA4, the versions that actually reach mass production are mainly for parking, with limited driving capability.

Whoever wants to use this chip for driving, then, needs distinctive algorithm and engineering capability. That is why we consulted MAXIEYE's embedded-software experts.
What does MAXIEYE's embedded software solve?
To understand what the MAXIEYE embedded software is solving, you need to know what problems need to be solved first.
Because of cost, size, and power consumption, the GPUs or training clusters used for deep learning training generally cannot be moved into a car. The algorithms and software can therefore only be deployed onto embedded SoCs with deep-learning inference acceleration, which drastically cut cost, size, and power consumption.

For the same reasons, these embedded chips have limited computing power, memory bandwidth, and memory capacity. This scarcity is particularly acute in L2/L2+ assisted-driving products in the 1R1V to 5R5V range.
This creates two challenges:

- To get the best results during training, deep learning algorithm engineers often cannot account for the huge disparity in computing power between the training side and the deployment side. This puts great pressure on the engineers who deploy those networks onto embedded chips to preserve the networks' full accuracy.
- Because assisted driving is tied to vehicle control, its embedded software must be highly stable and respond quickly. Stability has two aspects: first, the software must not crash while delivering the claimed functions and performance indicators, even if the vehicle runs continuously for days; second, those performance indicators must remain consistent and stable over time.
Responding quickly refers to the target-recognition latency of the cameras connected to the product. If the embedded software recognizes targets with a long delay, control commands will lag behind the vehicle's motion at speed, which can easily create danger.

Maintaining excellent algorithm performance, high stability, and fast response is therefore a major challenge, and the true measure of an ADAS product's embedded capability.
The main reasons manufacturers choose the TDA4 VM are:

- Among traditional suppliers of automotive-grade chips, TI has deep R&D experience, which makes the chip relatively easy to work with.
- Automakers' evaluation cycles are long, and when automotive-grade chips were first being evaluated, few chips in this class came with a GPU, yet the image stitching and rendering required for the surround-view parking function need one. TI could meet that demand.
- The TDA4 VM is a multicore heterogeneous chip with a built-in ISP and an on-chip, physically isolated MCU whose two cores can run in lockstep. This saves the cost of an external ISP and a dedicated ASIL-D automotive MCU in the system hardware, and it also eases the MCU supply problems of recent years.
Despite these obvious advantages, the TDA4 VM still has shortcomings. Beyond the low effective computing power mentioned earlier, we consulted the head of autonomous driving at a new carmaker about the other issues.
In terms of deploying deep learning algorithms
The algorithmic leader of a certain car company told me:
TI's official deep-learning inference acceleration tool, TIDL, has insufficient operator support: it covers only twenty-some basic operators, many with significant usage restrictions. For example, softmax, sigmoid, and InnerProduct support only one-dimensional input and output.

Moreover, operators such as reshape and scale cannot be used standalone in a network, which constrains network structure and model design. It is even less friendly to the carefully designed private operators of our deep learning algorithm engineers.
Having evaluated many embedded ADAS chips with deep-learning inference acceleration, both in China and abroad, we found that the TDA4 VM's support for some mainstream embedded inference features is not very sophisticated. It offers no support for network sparsity and relies entirely on dense computation; on chips that support random or structured sparsity well, inference can be several times faster than dense computation when the network's sparsity is properly exploited.

TIDL also does not handle batch processing or mixed-precision quantization well and often throws errors.
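To make the operator constraint above concrete: before porting a network to an embedded accelerator, deployment engineers typically scan the model's operator list against the vendor's supported set. The sketch below is generic C++, not TI's actual tooling, and both operator lists are purely illustrative:

```cpp
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Generic sketch of an operator-compatibility check: any operator the
// accelerator does not support must be rewritten, split out of the
// network, or pushed to a CPU fallback path.
int main() {
    const std::set<std::string> supported = {       // illustrative subset
        "Conv", "Relu", "MaxPool", "Concat", "Softmax"};
    const std::vector<std::string> model_ops = {    // made-up model
        "Conv", "Relu", "Reshape", "Softmax", "MyPrivateOp"};

    for (const auto& op : model_ops)
        if (!supported.count(op))
            std::printf("unsupported operator: %s -> rewrite or CPU fallback\n",
                        op.c_str());
    return 0;
}
```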
SDK
The TDA4 VM uses a software architecture based on OpenVX, which adopts a node-based directed acyclic graph (DAG) processing model. This approach is widely used in ADAS software, especially on large multicore chips with coprocessors like the TDA4 VM, where each core runs its own operating system: some run an RTOS, some Linux or QNX, and the MCU cores may even run AUTOSAR. The OpenVX architecture hides the hardware details and lets software engineers focus on their business logic.
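For readers unfamiliar with OpenVX, here is a minimal sketch of that node-graph style using the standard OpenVX C API with generic image kernels (nothing TDA4-specific): processing steps are declared as nodes of a DAG, the graph is verified once, then executed per frame, and the runtime decides which core each node runs on.

```cpp
#include <VX/vx.h>

int main() {
    vx_context ctx = vxCreateContext();
    vx_graph graph = vxCreateGraph(ctx);

    // Graph input/output images; the intermediate image is "virtual",
    // so the framework decides where its storage lives (possibly on
    // another core or coprocessor).
    vx_image in  = vxCreateImage(ctx, 1280, 720, VX_DF_IMAGE_U8);
    vx_image tmp = vxCreateVirtualImage(graph, 1280, 720, VX_DF_IMAGE_U8);
    vx_image out = vxCreateImage(ctx, 1280, 720, VX_DF_IMAGE_U8);

    // Two nodes form a tiny DAG: Gaussian blur -> median filter.
    // The application never sees which core executes each node.
    vxGaussian3x3Node(graph, in, tmp);
    vxMedian3x3Node(graph, tmp, out);

    if (vxVerifyGraph(graph) == VX_SUCCESS)
        vxProcessGraph(graph);   // in practice, run once per camera frame

    vxReleaseImage(&in);
    vxReleaseImage(&out);
    vxReleaseGraph(&graph);      // virtual images are owned by the graph
    vxReleaseContext(&ctx);
    return 0;
}
```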
However, no abstraction covers everything. We found that in the TDA4 VM's OpenVX implementation, the communication latency between nodes is often large; across cores it can reach 3-4 milliseconds.

A bit of background: a camera running at 30 frames per second gives each frame a processing budget of only about 33 milliseconds. If a frame cannot be processed within 33 milliseconds, the next frame arrives before the last one is done; let that continue and the system falls over. Spending roughly 10% of that 33-millisecond budget on inter-node communication alone is a poor trade.
Long latency in the perception processing chain
In ADAS products, the deep learning perception processing for one camera consists of the following steps: ISP processing of the image, distortion correction, cropping/scaling/stitching, padding of the deep learning network input, network inference, network post-processing, perception target tracking, multi-sensor target fusion, and finally planning and control of the vehicle.
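Because these stages run one after another for each frame, their latencies accumulate, and any stage that queues behind a busy coprocessor pushes the whole chain toward the frame budget. A toy calculation, with purely hypothetical per-stage numbers, makes the arithmetic concrete:

```cpp
#include <cstdio>

// Hypothetical per-stage latencies in milliseconds for one camera frame;
// the real numbers depend on the networks and on coprocessor load.
struct Stage { const char* name; double ms; };

int main() {
    const Stage chain[] = {
        {"ISP processing",        2.0},
        {"distortion correction", 1.5},
        {"crop/scale/stitch",     1.5},
        {"input padding",         0.5},
        {"network inference",    15.0},
        {"post-processing",       3.0},
        {"target tracking",       2.0},
        {"multi-sensor fusion",   2.0},
    };
    double total = 0.0;
    for (const Stage& s : chain) total += s.ms;  // stages run serially per frame
    std::printf("end-to-end perception latency: %.1f ms "
                "(frame budget at 30 fps: 33.3 ms)\n", total);
    return 0;
}
```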
The TDA4 VM is a multicore heterogeneous chip, and the ISP processing, distortion correction, and crop/scale pre-processing of perception images all have dedicated hardware coprocessors. However, in the face of the varying demands of numerous deep learning networks, each coprocessor's capability is still limited, so each step incurs a long processing delay.
In other words, the limited processing capabilities of the TDA4 VM's coprocessors lead to long latency in the perception chain, which poses a huge challenge for single-TDA4 integrated products. When we put these problems to MAXIEYE's software director, Zheng Chaohui, his response was straightforward and surprised me.
Zheng Chaohui said that these problems do exist, and that no one or two isolated tricks will fix them; a systematic methodology is needed. It approaches the root cause in a spirit of "seeking truth from facts," as Hua Luogeng put it, "be good at retreating, retreat far enough to return to the most primitive place without losing what matters," and then solves the problem through innovation.
MAXIEYE's approach is that, on top of the software framework, for certain critical cross-core paths they abandoned the node-based OpenVX communication mode that the TDA4 VM provides.

In the middle layer below the application layer, they found a hardware cross-core communication mechanism and wrote their own client/server (C/S) communication method on top of it. This lets multiple deep learning networks and their post-processing run with a high degree of parallelism, greatly improving efficiency and cutting the latency to perceived targets.
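The article does not disclose MAXIEYE's implementation, but the general idea of a client/server channel built directly on memory shared between two cores can be sketched as follows. This is a minimal single-producer/single-consumer ring; the slot size and capacity are made up:

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Sketch (not MAXIEYE's code) of a lock-free channel over a shared-memory
// region visible to two cores: one core produces messages, the other
// consumes them, and the only cross-core cost is a cache-line transfer
// rather than a framework-level node hop.
struct SpscRing {
    static constexpr uint32_t kSlots = 64;       // power of two (indices wrap)
    static constexpr uint32_t kSlotBytes = 256;  // caller keeps len <= this
    std::atomic<uint32_t> head{0};               // advanced by the consumer core
    std::atomic<uint32_t> tail{0};               // advanced by the producer core
    uint8_t slots[kSlots][kSlotBytes];           // placed in shared memory

    bool send(const void* msg, uint32_t len) {   // producer side
        uint32_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == kSlots)
            return false;                        // ring full
        std::memcpy(slots[t % kSlots], msg, len);
        tail.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }
    bool recv(void* msg, uint32_t len) {         // consumer side
        uint32_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;                        // ring empty
        std::memcpy(msg, slots[h % kSlots], len);
        head.store(h + 1, std::memory_order_release);  // free the slot
        return true;
    }
};
```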
Just as important, they also independently developed their own SOA middleware.

According to Zheng Chaohui: "We not only built the system's application layer, we also independently developed some of the middleware. This middleware has been verified on the Ambarella platform over two product generations and several hundred thousand units. It offers high transmission bandwidth, efficiency, and stability, and its collaboration mechanism based on subscription and distribution keeps the various algorithm and software engineers loosely coupled, so integration and iterative updates of the whole system stay fast and efficient."
The paragraph above may be difficult to understand. Here is a simple explanation:
In an autonomous driving system, the division of labor between software and hardware is that the chip vendor provides the hardware and the basic kernel running on it, while the autonomous driving system sits at the top, in the application layer. Between the application layer and the kernel layer there is a bridge: the middleware. With middleware, the chip becomes a general-purpose part that anyone can build on; OEMs and solution companies only need to develop the application layer.

Beyond bridging software and hardware, middleware is also the hub for data scheduling, computing-power allocation, and cross-core communication: all the sensor data passes through the middleware layer. With third-party middleware, that data scheduling may not be especially efficient or precise.

MAXIEYE's advantage is that it has built this layer down to the operating-system level, so it can schedule and route its own data in real time. As a result, the computing power is used effectively, and targeted tuning can also be performed.
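As a rough illustration of the subscription-and-distribution mechanism described above, here is a toy publish/subscribe bus. The topic name and payload are invented, and a production middleware would add cross-core transport, queuing, and QoS on top of this pattern:

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy sketch of subscribe/distribute: producers publish to a named topic,
// consumers subscribe by name, and neither side knows the other exists,
// so algorithm modules stay loosely coupled.
class Bus {
    std::map<std::string,
             std::vector<std::function<void(const void*, size_t)>>> topics_;
public:
    void subscribe(const std::string& topic,
                   std::function<void(const void*, size_t)> cb) {
        topics_[topic].push_back(std::move(cb));
    }
    void publish(const std::string& topic, const void* data, size_t len) {
        for (auto& cb : topics_[topic]) cb(data, len);  // fan out
    }
};

int main() {
    Bus bus;
    // The planner subscribes to lane results without linking to perception.
    bus.subscribe("lanes", [](const void*, size_t len) {
        std::printf("planner got %zu bytes of lane data\n", len);
    });
    float lane_coeffs[4] = {0.1f, 0.0f, -0.02f, 1.5f};  // made-up payload
    bus.publish("lanes", lane_coeffs, sizeof lane_coeffs);
    return 0;
}
```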
Conclusion
From a market perspective, the real competition over driving-parking integration has only just begun. DJI has taken the lead in mass production and MAXIEYE is close behind, so both enjoy a degree of first-mover advantage. The difference is that MAXIEYE has brought L2++ capability online on a single TDA4 VM, while DJI still uses dual TDA4 VMs.

A solution at this level severely tests a supplier's cost control and ability to preserve performance. Whoever can control costs, deliver a system that meets the functional requirements, and mass-produce it will have the potential to pull ahead of competitors. Both MAXIEYE and DJI have that potential.
This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.