An Analysis of Pony.ai’s L4 Autonomous Driving Vehicle Central On-board Computing Platform

Author: Pony.ai

Introduction:

When people think of autonomous vehicle hardware, they usually picture the sensors mounted around the car, especially the full Velodyne 64 sensor suite that was once the hallmark of L4 autonomous driving development vehicles. The on-board computing platform, by contrast, is hidden inside the vehicle and is rarely mentioned in L4 autonomous driving solutions, yet it performs almost all of the data processing and algorithm execution and is the central computing unit of the PonyAlpha X hardware system. We have some distinctive ideas about how to build it.

Joint optimization of software and hardware

Currently, TOPS (trillion operations per second) is commonly used in the computing hardware industry to measure compute performance. There is a tendency to assume that higher TOPS better meets the needs of L4 compute, and some solutions even claim that reaching a certain TOPS figure is enough to realize L4 autonomous driving. We believe TOPS is more a framing that computing hardware vendors have imposed on L4 compute needs. In fact, the overall system capability of L4 depends on more than TOPS. More important is the joint optimization of software and hardware across factors such as latency, data throughput, computing efficiency (perf/watt), computing determinism, accuracy, and system stability.

Figure 1: Multiple factors to consider in L4 on-board computing platform

In developing the PonyAlpha X system, the Pony.ai software and hardware teams worked together to create a highly customized heterogeneous computing system. It fully meets the needs of Pony.ai’s application software and balances high performance, high efficiency, and high reliability in the L4 autonomous driving central on-board computing platform.

Figure 2: Appearance of the Pony.ai central on-board computing platform

Customized heterogeneous computing architecture

Although powerful CPUs and GPUs are still the main processors for L4 autonomous driving computations, a customized heterogeneous system can greatly reduce latency and improve performance, safety, and computing efficiency.

Data preprocessing and computing offloading

By introducing FPGAs, the computing platform preprocesses the massive, high-throughput, irregular sensor data and provides functions such as time synchronization, calibration, data compression, checksums, and repackaging. This cut the overall sensor data stream latency by a factor of six and reduced CPU/GPU usage by 20%. At the same time, the FPGA provides a hardware abstraction layer, which ensures that upgrading or replacing sensors does not affect overall system functionality.
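The repackaging and checksum steps can be illustrated in software. The following is a minimal Python sketch, not Pony.ai’s FPGA logic: the frame layout, the field names, and the choice of CRC32 are all assumptions made for illustration. It shows how payloads from different sensors could be wrapped in one uniform, timestamped, checksummed frame so that downstream software does not depend on any sensor’s native format.

```python
import struct
import zlib

# Hypothetical frame layout (an assumption, not Pony.ai's format):
#   sensor_id (uint16) | timestamp_ns (uint64) | payload_len (uint32) | payload | crc32 (uint32)
HEADER = struct.Struct("<HQI")   # little-endian, no padding, 14 bytes
TRAILER = struct.Struct("<I")    # 4-byte CRC32


def repackage(sensor_id, timestamp_ns, payload):
    """Wrap a raw sensor payload in a uniform frame protected by a CRC32."""
    header = HEADER.pack(sensor_id, timestamp_ns, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(crc)


def unpack(frame):
    """Validate length and checksum, then return (sensor_id, timestamp_ns, payload)."""
    if len(frame) < HEADER.size + TRAILER.size:
        raise ValueError("frame too short")
    sensor_id, timestamp_ns, length = HEADER.unpack_from(frame, 0)
    body_end = HEADER.size + length
    if body_end + TRAILER.size != len(frame):
        raise ValueError("length mismatch")
    (crc,) = TRAILER.unpack_from(frame, body_end)
    if zlib.crc32(frame[:body_end]) != crc:
        raise ValueError("checksum mismatch")
    return sensor_id, timestamp_ns, frame[HEADER.size:body_end]
```

Because consumers only ever see the uniform frame, swapping a sensor changes the code that fills in the payload but not the code that reads it, which is the idea behind the hardware abstraction layer described above.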

Safety Island

Safety has always been the key to high-performance autonomous driving hardware. An automotive-grade MCU serves as a safety island: it comprehensively monitors the health of the entire computing platform, detects faults in real time, and outputs error codes that trigger the fail-safe system.
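The monitoring loop of a safety island can be sketched as follows. This is a simplified Python illustration rather than MCU firmware, and the error codes, module names, heartbeat timeout, and temperature limit are hypothetical values chosen for the example, not figures from Pony.ai.

```python
from dataclasses import dataclass
from enum import IntEnum


class ErrorCode(IntEnum):
    # Hypothetical error codes, for illustration only.
    OK = 0
    HEARTBEAT_LOST = 1
    OVER_TEMPERATURE = 2


@dataclass
class ModuleStatus:
    name: str
    last_heartbeat_s: float   # time of the most recent heartbeat, in seconds
    temperature_c: float


def check_module(status, now_s, heartbeat_timeout_s=0.1, max_temperature_c=95.0):
    """Return the error code for one module (thresholds are assumed values)."""
    if now_s - status.last_heartbeat_s > heartbeat_timeout_s:
        return ErrorCode.HEARTBEAT_LOST
    if status.temperature_c > max_temperature_c:
        return ErrorCode.OVER_TEMPERATURE
    return ErrorCode.OK


def monitor(statuses, now_s):
    """Return {module name: error code} for every module that is not healthy."""
    faults = {}
    for status in statuses:
        code = check_module(status, now_s)
        if code is not ErrorCode.OK:
            faults[status.name] = code
    return faults
```

A non-empty result from `monitor` is what would be handed to the fail-safe system. In a real design this logic would run on the MCU itself, independently of the CPUs and GPUs it supervises.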

Communication Overhead

Heterogeneous computing often faces the additional cost of inter-chip communication. By using the industry’s most advanced IO communication protocol, the computing platform has tripled the bandwidth for chip-to-chip data exchange, making parallel computing with multiple chips possible.

Computing Efficiency

The appeal of heterogeneous computing lies in the significant improvement in overall system computing efficiency that comes from assigning each computing task to the chip best suited to it. After software and hardware optimization, the computing platform has an average power consumption of only one third of that of an equivalent data center server, which saves energy while significantly easing heat dissipation pressure and improving reliability.

Figure 3: Mechanical schematic diagram of the Pony.ai central on-board computing platform

Balancing Performance and Reliability

In automotive electronics, higher performance and greater power consumption typically reduce reliability. The engineering appeal of the Pony.ai central on-board computing platform lies in the balance it strikes between performance and reliability.

High Performance

To decouple hardware development from software development, the computing platform is designed not only to meet current performance requirements but also to anticipate the needs of L4 autonomous driving software over the next three years. In addition to the heterogeneous computing mentioned earlier, we use joint software and hardware optimization to find the best balance between chip and system performance and perf/watt, rather than focusing only on absolute performance. The figure below shows our multidimensional selection of the most efficient computing solution.

Figure 4: Multi-dimensional chart for selecting the computing scheme of the central on-board computing platform
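One simple way to express the idea of choosing by efficiency rather than by absolute performance is sketched below in Python. It is only an illustration under stated assumptions: the `Candidate` type, the throughput units, and the two constraints are hypothetical, and the actual selection described above considers more dimensions than these two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical description of one computing option.
    name: str
    throughput: float   # useful work per second on the target workload (arbitrary units)
    power_w: float      # average power draw in watts

    @property
    def perf_per_watt(self):
        return self.throughput / self.power_w


def select(candidates, min_throughput, max_power_w):
    """Pick the most efficient candidate that meets both constraints, or None."""
    feasible = [
        c for c in candidates
        if c.throughput >= min_throughput and c.power_w <= max_power_w
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.perf_per_watt)
```

Note that the fastest option is not necessarily selected: a candidate with the highest throughput can lose to a slower one that still meets the requirement at much lower power.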

High Reliability

One way we ensure automotive-grade reliability is to leave enough design margin. Only sufficient margin can ensure reliability against the stresses of the automotive environment, such as signal and power integrity problems, vibration, and temperature extremes. For example, we designed a power supply system that can deliver four times the normal operating power, so that no peak in power consumption causes a system failure.

Another approach to reliability is to give the high-performance computing system as comfortable an operating environment as possible. For example, even though automotive electronics must work across an ambient temperature range of -40°C to 105°C, our precise liquid-based heating and cooling design enables even consumer-grade chips to operate reliably in automotive environments. Similarly, with a well-designed shock buffer, even inherently fragile devices survive a 25 g impact without damage.

Through equivalent redundancy and degrading redundancy, together with real-time health monitoring, the system can keep operating normally or degrade gracefully even if a module fails, for example when one of the redundant power inputs is lost.
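The difference between the two kinds of redundancy can be sketched in a few lines of Python. This is an illustrative model only: the mode names, the idea of a primary and a backup compute path, and the safe-stop fallback are assumptions introduced here, not details given by Pony.ai.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical operating modes, for illustration only.
    NORMAL = "normal"
    DEGRADED = "degraded"
    SAFE_STOP = "safe_stop"


def select_mode(power_inputs_ok, primary_compute_ok, backup_compute_ok):
    """Choose an operating mode from the health of redundant modules."""
    # Equivalent redundancy: the power inputs are interchangeable, so losing
    # one of them does not change the operating mode.
    if not any(power_inputs_ok):
        return Mode.SAFE_STOP
    if primary_compute_ok:
        return Mode.NORMAL
    # Degrading redundancy: the backup path keeps the system running,
    # but with reduced capability.
    if backup_compute_ok:
        return Mode.DEGRADED
    return Mode.SAFE_STOP
```

For instance, `select_mode([False, True], True, True)` stays in `Mode.NORMAL` despite a failed power input, while `select_mode([True, True], False, True)` drops to `Mode.DEGRADED`.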

Figure 5: Liquid cooling external loop design of the central on-board computing platform

System Verification

Following the classic V model of hardware development, we verified the system along multiple dimensions after the design was implemented.

First, at the signal level, the power of critical signals was measured under different temperature and voltage conditions according to industry standards to verify that signal margins comply with the design.

Second, based on the DVT (design verification test) standard defined by Pony.ai, environmental and stress testing was conducted to ensure that the system operates normally under varying temperature, vibration, and humidity, and that it suffers no damage under conditions such as power loss, short circuit, and collision.

Next, at the system level, we developed a HIL (hardware-in-the-loop) simulation platform so that the computing platform can be verified offline around the clock (24/7) without interruption, in order to demonstrate an MTBF greater than 100,000 hours with statistical significance.
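To give a sense of the test volume such a claim implies, the Python sketch below uses a standard zero-failure reliability demonstration. It assumes a constant failure rate (exponentially distributed lifetimes) and no failures during the test, in which case the total unit-hours needed to demonstrate a target MTBF at confidence C is T = -MTBF × ln(1 - C). The source does not state which statistical method or confidence level Pony.ai used, so both the method and the 90% figure in the usage note are assumptions.

```python
import math


def required_test_hours(target_mtbf_h, confidence):
    """Total failure-free unit-hours needed to demonstrate target_mtbf_h.

    Assumes a constant failure rate and zero failures during the test.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -target_mtbf_h * math.log(1.0 - confidence)


def units_needed(target_mtbf_h, confidence, hours_per_unit):
    """Number of units that must each run hours_per_unit without failure."""
    total_hours = required_test_hours(target_mtbf_h, confidence)
    return math.ceil(total_hours / hours_per_unit)
```

Under these assumptions, demonstrating 100,000 hours at 90% confidence takes about 230,000 failure-free unit-hours, so `units_needed(100_000, 0.9, 8760)` gives 27 units running continuously for one year. This is why an always-on HIL setup that runs many platforms in parallel is a practical way to accumulate the required hours.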

Finally, we conducted extensive actual road tests to ensure the reliability of integration with the overall L4 autonomous driving hardware system.

Figure 6: Z-direction vibration testing of the central on-board computing platform

Mass Production Progression

The Pony.ai central on-board computing platform is not just an advanced research project; it is designed to support mass production of L4 autonomous driving hardware:

Every design detail fully considers DFM (design for manufacturing) and DFA (design for assembly), making batch machining and assembly possible and ensuring mass-production-level yield;

The computing platform is very easy to deploy and maintain, with a plug-and-play overall design and a modular structure for easy servicing;

Finally, the computing platform has a clear roadmap toward automotive-grade qualification and is expected to become an automotive-grade L4 computing platform in 2023.

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.