Algorithm porting and performance optimization for autonomous driving assistance based on QNX.

QNX Introduction and History

QNX was founded in 1980. It is the first hard real-time, microkernel operating system that complies with the POSIX standard and is UNIX-like. Over the past few decades it has been widely used in automotive, industrial automation, defense, aerospace, medical, nuclear power, and communications applications, providing middleware and basic software solutions built around an embedded operating system.

In the late 1970s, QNX’s founders Gordon Bell and Dan Dodge, building on ideas from their university days, wrote a system called Quick UNIX that could run on IBM PCs. It was later renamed QNX and officially released in 1980. Harman acquired QNX for $138 million in October 2004, and QNX operated as a Harman division for six years. In April 2010, BlackBerry acquired QNX from Harman for $200 million, together with Harman’s Vancouver-based Wavemaker sound division, the predecessor of the current QNX acoustics solution. Founded in Ottawa, Canada, the company thus returned to Canadian ownership after six years under the American company Harman. It became the most important part of BlackBerry’s core IoT technology solutions division, responsible for operating systems, automotive platform software, data security, IoT and cloud computing, and patents.

In the automotive domain where high-performance processing and functional safety intersect, QNX is the world’s largest commercial operating system provider. Since entering the automotive field in 1999, QNX has followed and led the trends and hot topics in automotive embedded software. It has placed forward-looking strategic products on multiple important software platforms and provided advanced basic software and network security technology to leading automotive suppliers and manufacturers worldwide. It is widely used in automotive electronic subsystems such as advanced driver assistance systems, virtualized intelligent digital cockpits, intelligent connectivity modules, intelligent gateways, high-performance computing platforms, and infotainment systems. According to statistics published in early 2022 by the independent research company Strategy Analytics, more than 215 million cars worldwide are equipped with BlackBerry QNX software, and on average 20 million smart cars running BlackBerry QNX basic software enter the global market every year.

So far, almost all automakers around the world have adopted software technology based on the QNX operating system, and 24 of the top 25 global electric vehicle manufacturers use QNX software. For example, the Xpilot3.0 and Xpilot3.5 assisted driving systems from XPeng Motors in China are based on the QNX operating system, a hard real-time operating system certified by TÜV Rheinland to ISO 26262 ASIL D functional safety. NETA S from Hozon Auto uses QNX Hypervisor to build its new intelligent cockpit and uses the QNX OS for Safety operating system in its full-stack, self-developed TA PILOT 3.0 intelligent driving system, enabling assisted driving in a variety of scenarios. Leapmotor’s third-generation high-end pure electric SUV, the C11, and its intelligent pure electric sedan, the C01, both use the QNX Neutrino real-time operating system and QNX Hypervisor, aiming to bring a more personalized and comfortable driving experience to Chinese consumers. In addition, the assisted driving platform for the luxury pure electric hypercar HiPhi Z, to be released by Human Horizons, uses the NVIDIA OrinX chip and the QNX embedded hard real-time operating system.

In 2016, Time magazine described QNX as being to the automotive industry “what Microsoft is to computers”, which reflects QNX’s position as a foundational operating system and its deep coverage in the automotive field.

QNX Features

QNX is an embedded hard real-time microkernel operating system

It is hard real-time, microkernel-based, modular, loosely coupled, and distributed. Since its inception in 1980 it has followed a service-oriented architecture (SOA) built on the client-server model. Specifically:

  1. Hard real-time: Context switch times and interrupt latency are short, and the response of every task is deterministic.

  2. Microkernel: Only scheduling, process management, interrupt handling, and the core functionality of the operating system reside in the kernel. Everything else, including drivers, protocol stacks, file systems, and functional modules, runs in user space.

  3. Modularization: Each functional unit of the operating system is a module that is memory-protected and isolated from the others. Modules can be dynamically loaded or unloaded as needed, and they communicate through message passing in a client-server arrangement.

  4. Loose coupling: Modules do not affect each other, and each runs in its own virtual address space.

  5. Distribution: From the user’s perspective, the QNX systems within a local area network can be treated as a single QNX system whose resources can be shared.

QNX is a UNIX-like operating system

QNX follows PSE54, the highest-level POSIX profile. (Note: POSIX defines four profile levels: PSE51, PSE52, PSE53, and PSE54. Among real-time operating systems, only QNX is based on the PSE54 standard, because QNX was originally developed as a UNIX-like system according to the POSIX standard.) As a result, open-source applications and some open-source middleware can be ported to QNX with minimal changes. The QNX microkernel and process manager together form procnto, the minimal QNX system, while other modules such as drivers, protocol stacks, file systems, and applications run as independent modules on top of it.

QNX is a functional safety and information security operating system

Other features of QNX

  1. QNX scheduling algorithms and policies

QNX has several scheduling algorithms, all built on priority-based preemption. A QNX thread priority is a number from 0 to 255, and a higher number means a higher priority. There are three basic scheduling policies, which can be used alone or in combination: the time-sliced Round Robin policy, the priority-preemptive FIFO policy, and the time-budget-based Sporadic policy. QNX also provides adaptive partition scheduling (APS), which ensures that low-priority tasks are still scheduled under heavy CPU load and are not “starved”.
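As a rough illustration of how an application selects one of these policies, the following C sketch prepares POSIX thread attributes that request an explicit scheduling policy and priority. It uses only standard pthread and sched calls. The helper name make_rt_attr is hypothetical, and the priority is clamped to the range the running system reports for the policy rather than assuming a fixed range.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: initialize `attr` so that a thread created with it
 * uses `policy` (for example SCHED_FIFO or SCHED_RR) at `priority` instead
 * of inheriting the creator's scheduling settings.
 * Returns 0 on success or an errno-style error code. */
int make_rt_attr(pthread_attr_t *attr, int policy, int priority)
{
    assert(attr != NULL);

    /* Ask the system for the valid priority range of this policy. */
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1)
        return EINVAL;

    /* Clamp the request into the reported range. */
    if (priority < lo) priority = lo;
    if (priority > hi) priority = hi;

    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the policy and priority set below
     * would be ignored in favor of the creating thread's settings. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, policy);
    if (rc == 0) {
        struct sched_param sp = { .sched_priority = priority };
        rc = pthread_attr_setschedparam(attr, &sp);
    }

    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

A caller would pass the prepared attributes to pthread_create(), for example after make_rt_attr(&attr, SCHED_FIFO, 20) for a FIFO thread or with SCHED_RR for a round-robin one, and release them with pthread_attr_destroy() afterwards. Creating real-time threads may require suitable privileges on some systems.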

  2. QNX IPC communication mechanism

In addition to native IPC mechanisms such as message passing and signals, QNX also provides POSIX-standard IPC methods such as message queues, pipes, and shared memory, so that users can choose what fits each application scenario.

  3. QNX IDE integrated development environment

QNX provides the Eclipse-based Momentics IDE, which lets users build code and debug it with GDB over Ethernet, and carry out system-level performance analysis with real-time graphical views of process resources, system logs, CPU utilization, memory usage, inter-process communication, and core dumps.

QNX’s Application in the Field of Autonomous Driving

Because of its real-time, deterministic behavior and functional safety characteristics, QNX meets the ISO 26262 ASIL D functional safety requirements for autonomous driving. Driven by the needs of domestic and foreign OEM projects, it is therefore widely used in autonomous driving as the basic software that hosts real-time, high-reliability applications. As chips and basic software are increasingly integrated into complete autonomous driving solutions, QNX is also included as a critical part of the mainstream basic software platforms that ship with high-performance autonomous driving chips.

Collaboration between NVIDIA and BlackBerry QNX

NVIDIA’s high-performance chips, such as Xavier, Orin, and Thor, are widely used in autonomous driving. As a top supplier of complete autonomous driving chip platform solutions, NVIDIA provides a basic software platform with DriveOS at its core. NVIDIA chose QNX as a close partner as early as five years ago. QNX is the only RTOS partner for the ISO 26262 ASIL D functional safety version of NVIDIA DriveOS, and the functional safety versions of DriveOS are built on QNX. On the Xavier platform, for example, because the overall platform software must reach ASIL D, DriveOS is provided only in the version based on the QNX OS for Safety kernel. NVIDIA attaches great importance to functional safety. As NVIDIA’s only RTOS partner, BlackBerry QNX is included in the overall DriveOS solution, and NVIDIA provides the one-stop solution and service support, with engineering support delivered through NVIDIA.

Collaboration between Qualcomm and BlackBerry QNX

As a leading chip solution provider in the IoT and mobile phone fields, Qualcomm holds the vast majority of the high-end intelligent cockpit market in automotive electronics. As part of Qualcomm’s overall Snapdragon cockpit chip solution, BlackBerry QNX is the only hypervisor partner, supporting nearly 100 automotive electronics customers worldwide. In advanced driver assistance systems, Qualcomm Snapdragon Ride has also been chosen by many leading global automakers, including Volkswagen, BMW, General Motors, and Great Wall Motors. As the basic software foundation of the Snapdragon Ride platform for autonomous driving, BlackBerry QNX forms part of the one-stop ISO 26262 ASIL D functional safety software platform solution provided by Qualcomm.

Cooperation between Chinese autonomous driving chip companies and BlackBerry QNX

In recent years, high-performance domestic chips have emerged in large numbers, and more and more promising domestic companies have entered the field of advanced driver assistance systems. BlackBerry QNX has already been adapted to chips such as the Horizon Robotics J5 and is delivered as part of the one-stop overall solution provided by the chip company. It is worth mentioning that many other leading domestic high-performance autonomous driving chip companies that attach great importance to functional safety will officially launch next year.

Problems Encountered by Chinese Autonomous Driving Platform Software

Autonomous driving has been a very popular field in recent years, and many domestic and foreign automakers have gradually developed and released L2+ features in mass-production projects. Looking back on this rapid development, most of the engineers in the field, especially algorithm engineers, come from Robotaxi companies, autonomous driving algorithm start-ups, or university research institutions. A notable feature is that most projects at these organizations start from an industrial PC + graphics cards (mostly NVIDIA GPUs, a few AMD) + an open-source operating system + algorithms from open-source communities. This setup has nothing to do with automotive electronics safety. Its only advantage is that it makes demonstrations and fundraising quick and easy.

After these algorithm engineers join automakers, they tend to use the development approach they know best in order to demonstrate results as quickly as possible: an NVIDIA SoC + an open-source operating system + algorithms from open-source communities. Meanwhile, in autonomous driving projects automakers generally outsource the controller platform (the hardware and platform software, comparable to a PC) to external Tier 1 suppliers, while developing the applications and algorithms themselves.

Automotive OEMs generally have a platform team responsible for integrating drivers and the middleware above the driver layer, a system team responsible for system design and coordination, a functional safety team responsible for overall functional safety, and an algorithm team responsible for developing and implementing algorithm applications. The difference is that, besides the pure algorithm team, most foreign OEMs also have a dedicated algorithm embedded engineering team that optimizes and implements algorithms for embedded (non-industrial-PC) environments and real-time operating systems. Such a team needs some understanding of algorithm architecture, embedded software development, and hardware characteristics, as well as a solid understanding of operating systems.

Many Chinese OEMs have no such team, and sometimes no such engineers at all. As a result, with tight development cycles and little experience in embedded development, many projects proceed much as a Robotaxi project would: processors resembling industrial PCs, the GPU inside the SoC, an open-source operating system, and various unoptimized open-source algorithms. Recommendations from functional safety teams are often ignored, because meeting short time-to-production targets and staying ahead in the arms race among Chinese OEMs takes priority, something unimaginable for European and American OEMs. That said, a number of OEMs in China do have sufficient talent reserves and capable algorithm engineering teams, and they are doing very well. We look forward to more OEMs paying attention to this issue in the near future, and to more engineers filling this gap in China.

Example of QNX Algorithm Porting and Performance Optimization

QNX provides ADAS reference platform products, including Sensor Framework, networking, open source modules, third-party SDKs, and some reference designs. The Sensor Framework provides some basic libraries for ADAS.

Algorithm Porting

Most autonomous driving algorithms are open source. Since QNX complies with the POSIX PSE54 standard and its APIs are largely compatible, open-source algorithms can be ported to QNX platforms easily and then compiled and run with the QNX toolchain. However, even though the APIs are consistent, the system behaves differently because of the characteristics of a real-time operating system, so optimization and adjustment are still required.

Sharing common QNX performance optimization techniques

  1. IPC optimization

QNX supports most of the IPC methods common to mainstream POSIX systems and also has its own native IPC method, message passing. When designing assisted driving solutions, some companies copy an architecture that uses UDS and DDS as the software communication bus directly from Linux to QNX. Functionally, this cross-platform approach allows code reuse and behaves the same. From a performance perspective, however, it is not efficient on QNX because of the kernel architecture. Unlike Linux, which has a monolithic kernel, QNX adopts a microkernel for security and real-time performance, and most system services, including the network protocol stack, run entirely outside the kernel as services (resource managers). UDS (Unix Domain Socket) is a network-based service: strictly speaking it does not go through the network protocol stack, but it still depends on QNX’s network service io-pkt. All datagrams sent over UDS therefore pass through the network service, which adds an extra IPC hop compared with direct communication and wastes system resources. The recommended optimization is to use more efficient IPC methods: in general, message passing for small to medium-sized data and shared memory for very large data. Some open-source software also makes extensive use of FIFOs, pipes, and similar IPC methods. QNX supports these, but we still recommend switching to message passing to reduce the cost of each IPC operation.

  2. Compilation option optimization

QNX’s toolchain is based on GCC and, for safety reasons, its compiler versions are updated more slowly than in the open-source community. For example, SDP 7.0 uses GCC 5.4.0, SDP 7.1 uses GCC 8.3.0, and the upcoming SDP Moun will use GCC 11.X. When the same algorithm library runs slower on QNX than on an open-source system, the cause is very likely a difference in compiler version or optimization options. On Linux systems the ARMv8 compiler optimization options are fully enabled by default, whereas QNX, for safety reasons, leaves certain options, including the ARMv8 optimizations, off by default. The relevant compiler options therefore need to be enabled explicitly at build time to obtain the best performance.

3. Driver-level Optimization

For network and storage devices, our experience is that most performance problems are bottlenecked at the driver level. With new hardware and new drivers in particular, attention should be paid to how the driver is adapted to the QNX system service layer. Driver quality is often the most important performance factor after the hardware itself. We have encountered many cases where redundant waits and busy-waiting in the driver ended up wasting system performance.

4. Network Protocol Stack Optimization

Besides network driver optimization, QNX’s network protocol stack, io-pkt, provides many parameters that can be tuned for a specific application scenario. In addition, users of QNX SDP 7.1 and later can use the newer network protocol stack, io-sock, which significantly improves multi-core CPU utilization and the handling of large numbers of concurrent small packets. Both protocol stacks have their own advantages. In practice, a large number of cases show that users have not actually reached the performance limit of io-pkt. The problems they see, such as packet loss caused by an undersized socket buffer or blocked transmission caused by an insufficient memory pool allocation, can be resolved through configuration and API settings.
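One of the issues mentioned here, an undersized socket receive buffer, can be addressed per socket through the standard SO_RCVBUF option. The C sketch below is a minimal illustration that uses only POSIX socket calls. The helper name set_rcvbuf and any size passed to it are assumptions, and because a stack may round, scale, or cap the request, the function reads the value back instead of trusting it.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: request a receive buffer of `bytes` on socket `fd`
 * and return the size the stack reports afterwards, or -1 on error.
 * The reported size may differ from the request. */
int set_rcvbuf(int fd, int bytes)
{
    assert(bytes > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) != 0)
        return -1;

    /* Read the effective value back so the caller can detect a cap. */
    int actual = 0;
    socklen_t len = sizeof actual;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) != 0)
        return -1;

    return actual;
}
```

A receiver that drops bursts of sensor datagrams could call set_rcvbuf(fd, 512 * 1024) right after creating the socket and log the returned value. If the returned size is smaller than requested, the system-wide limit in the stack configuration is the next thing to check.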

5. System API Optimization

For operations such as memory allocation and memory copy, QNX provides jemalloc, which can be chosen according to the actual application scenario and offers more features, including additional means of dealing with memory leaks. jemalloc is more efficient than the default malloc, especially in scenarios with a large number of threads and high concurrency.

6. API Usage Optimization

QNX provides low-level APIs with subtle differences, especially among some of its own interfaces and among similarly named calls such as sendmsg() and sendmmsg(). Users are usually more familiar with the former, which sends a single message through a socket, whereas the latter queues multiple messages in one call, improving overall throughput without adding IPC. Another example is mmap(), for which we provide QNX-specific flags to handle different memory mapping scenarios: combining MAP_ANON with MAP_PHYS requests a physically contiguous memory region, while MAP_LAZY defers the allocation of memory. Understanding the parameters of each interface and the intended use of similarly named interfaces is a great help during development. Our online documentation has a dedicated chapter that describes the parameters and usage of each interface completely and in detail.

QNX Provides Momentics IDE Environment for Performance Analysis

QNX provides the Momentics IDE for analyzing algorithm performance, including memory leak detection, application profiling, and kernel trace analysis. During a capture period it can trace event by event, showing the events and interrupt responses at each point in time and offering optimization suggestions. We also support custom kernel events so that users can see precisely how specific code fragments behave at run time.

QNX Provides Onboard Debugging and Supports GDB for Real-time Saving of Application Call Stacks

Finally, even as an ISO 26262 ASIL D safety-certified hard real-time operating system, QNX does not lag behind monolithic-kernel systems in system performance. Used and optimized sensibly, it performs equally well while consuming fewer system resources. QNX has extensive experience in algorithm porting and optimization, and provides a range of methods and tools for locating bottlenecks in algorithm performance.

The above shares some of our experience. More tips and optimization methods for BlackBerry QNX will follow, so please stay tuned.

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.