Author: Sun Xiaoshu

For Baidu, which has been laying out its autonomous driving business since 2013, intelligent vehicles are a must-win territory.

After missing the mobile Internet era, Baidu will not allow itself to miss artificial intelligence, and mobility is one of the most important scenarios for commercializing AI.

To compete in the intelligent vehicle market, Baidu’s trump card is the Apollo autonomous driving platform launched in 2017, as well as the “Apollo Intelligent Vehicle Solution” built on this platform.

Apollo Lite

Previously, Baidu’s commercialization focus in autonomous driving was AVP (Automated Valet Parking), for which it has established partnerships with WM Motor, GAC, Great Wall, and other manufacturers.

Among them, WM Motor’s W6, launched in April 2021, already carries the HAVP self-learning parking function and will be upgraded via OTA in Q4 this year to the high-precision-map-based PAVP function, enabling driverless parking in non-fixed scenarios.

In the second half of this year, Baidu’s key project is ANP (Apollo Navigation Pilot). Navigation-assisted driving has been a hot technology in recent years, with corresponding products from major manufacturers, such as Tesla’s NOA, NIO’s NOP, and XPeng’s NGP.

What sets Baidu’s ANP apart is that it brings the L4-level capability accumulated by Apollo down to L2+ vehicles. Beyond highway and urban expressway scenarios, it can also be used on ordinary urban roads. Moreover, Baidu ANP adopts a pure-vision perception solution.

At the second Baidu Apollo Eco Summit in December last year, Baidu released the “Apollo Intelligent Vehicle Solution”, one of whose highlights is Apollo Lite, the only L4-level pure-vision autonomous driving technology in China, based on a hardware configuration of 12 cameras, 12 millimeter-wave radars, and 4 corner radars.

Looking back, the Apollo Lite solution was already publicized in 2019. Around that time, autonomous driving perception split into two schools: LiDAR versus pure vision.

The LiDAR-based approach, represented by Waymo, Baidu IDG, and other early autonomous driving systems, heavily relies on LiDAR and high-precision maps.

The advantages of this approach are that it can approach L4 autonomous driving within limited areas, build system prototypes quickly, and rely relatively little on data and prior expertise. The disadvantages are relatively high cost and poor scalability.

The pure-vision approach relies mainly on cameras, with a lighter dependence on other sensors and high-precision maps. Building on data accumulation, it moves gradually from assisted driving toward autonomous driving.

Tesla announced in May 2021 that the Model 3/Y manufactured in North America will no longer be equipped with millimeter-wave radar, further evolving towards purely visual perception.

The advantages of the visual approach are lower cost and greater scalability; the disadvantage is higher technical difficulty, mainly in three respects:

High computational complexity. With ten 1080P cameras, image processing alone can generate on the order of 1 GB of data, requiring a highly parallel and efficient computing framework.
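The bandwidth claim can be sanity-checked with back-of-envelope arithmetic. The camera count comes from the text; the resolution, bytes per pixel, and frame rate below are illustrative assumptions, not figures from the article.

```python
# Rough estimate of raw image bandwidth for a multi-camera rig.
# Assumed (not from the article): 1920x1080 frames, 3 bytes/pixel (RGB), 15 fps.
CAMERAS = 10
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3
FPS = 15

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL   # one camera, one frame
bytes_per_second = bytes_per_frame * CAMERAS * FPS   # whole rig, per second

print(f"per camera frame: {bytes_per_frame / 1e6:.1f} MB")   # ~6.2 MB
print(f"rig per second:   {bytes_per_second / 1e9:.2f} GB")  # ~0.93 GB
```

Under these assumptions the rig produces just under 1 GB of raw pixels per second, consistent with the order of magnitude quoted above; higher frame rates or resolutions push it well past 1 GB/s.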

Difficulty of 2D-to-3D mapping. Recovering 3D obstacles from images involves an inherent passive-ranging problem, a classic ill-posed problem in computer vision.
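To see why monocular ranging is ill-posed, consider the standard pinhole-camera relation (a textbook illustration, not Apollo's method): an object's distance Z can be recovered from its pixel height h and the focal length f (in pixels) only if its real-world height H is already known, via Z = f * H / h.

```python
# Monocular passive ranging with the pinhole camera model.
# Textbook illustration only; all numbers below are hypothetical.

def pinhole_distance(focal_px: float, real_height_m: float, pixel_height: float) -> float:
    """Estimate distance (m) to an object of KNOWN height from its image height."""
    return focal_px * real_height_m / pixel_height

# Hypothetical values: f = 1400 px, a 1.5 m tall car rear imaged as 70 px.
print(pinhole_distance(1400, 1.5, 70))  # 30.0 (meters)
```

Since H is generally unknown for an arbitrary obstacle, a single image only fixes the ratio H/Z; that ambiguity is exactly what learned priors over large datasets must resolve.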

Large data scale. Appearance information is highly ambiguous: different viewpoints, lighting, textures, and colors all constitute entirely new samples, so training requires data at a very large scale.

Currently, only three major vendors persist with vision-based perception: Baidu, Mobileye, and Tesla. Tesla pursues pure vision exclusively, while Baidu and Mobileye run the two routes in parallel.

Baidu’s view on sensor selection for high-level autonomous driving is that LiDAR and cameras are neither mutually exclusive nor subordinate to each other; each has irreplaceable functions.

As for why Baidu’s pure-vision approach emerged, Wang Liang, chief R&D architect of Baidu’s intelligent driving business group, explained that Baidu opened up the pure-vision route to balance the development of the two approaches and to keep engineers from over-relying on LiDAR and other sensors under schedule pressure.

Autonomous Driving Implementation

Mass production is a necessary step for autonomous driving to succeed. Mass production means the technology can go on the road, and the large volume of data recovered will in turn support the iterative upgrading of the Apollo platform, forming a virtuous closed loop.

However, implementing autonomous driving on the road is a complex systems problem that involves coordination and integration across many parties. These parties can be viewed as different subsets, including:

  • Autonomous driving companies focus on the algorithms, covering the full stack of perception, planning, and control, to ensure the system performs well.

  • Traditional automakers see autonomous driving as a controller integrated into the vehicle’s electronic and electrical architecture, closely tied to drive-by-wire.

  • Tier 1 suppliers see autonomous driving as a mass-produced controller that must be safe and reliable.

  • Chip suppliers see autonomous driving as rising demand for high-performance chips, although in reality there is a gap between a chip’s nominal performance and what autonomous driving functions actually require.

Achieving mass production of autonomous driving is, in effect, the union of all these subsets. At Baidu, the ACU (Apollo Computing Unit) is the physical entity responsible for integrating and coordinating them to enable large-scale production of autonomous driving.

Take compute utilization as an example. Baidu’s ACU classifies autonomous driving algorithms clearly, in descending order of importance:

  1. Neural networks and deep learning;
  2. Image-related processing, such as ISP image-quality tuning, cropping and scaling, and stitching and fusion;
  3. Traditional computer vision, plus SLAM, which is not deep learning but remains an important algorithm;
  4. Other functions that are hard to categorize, such as math-based functions and matrix operations.

By subdividing algorithms in this way, specific computations can be routed to dedicated computing units, improving compute utilization, reducing the burden on the chip, and advancing automotive-grade mass production.
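As a rough illustration of this idea (the unit names and the mapping below are hypothetical, not Baidu's actual ACU design), routing each algorithm class to a dedicated compute unit can be sketched as a dispatch table:

```python
# Hypothetical sketch of class-to-unit dispatch, in the spirit of the
# four-way classification above. Not Baidu's actual scheduler.
from enum import Enum

class Workload(Enum):
    DEEP_LEARNING = 1   # neural-network inference
    IMAGE_PROC = 2      # ISP tuning, crop/scale, stitch/fuse
    CLASSIC_CV = 3      # traditional CV and SLAM
    GENERAL_MATH = 4    # matrix and other math routines

# Priority-ordered mapping from workload class to an assumed compute unit.
DISPATCH = {
    Workload.DEEP_LEARNING: "deep-learning accelerator",
    Workload.IMAGE_PROC: "ISP / vision pipeline",
    Workload.CLASSIC_CV: "DSP",
    Workload.GENERAL_MATH: "CPU",
}

def assign(workload: Workload) -> str:
    """Return the compute unit responsible for a workload class."""
    return DISPATCH[workload]

print(assign(Workload.CLASSIC_CV))  # DSP
```

The point of such a table is that the scarcest resource (the deep-learning accelerator) is never occupied by work a cheaper unit can do.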

Baidu’s ACU product roadmap covers three platforms, Zu5, TDA4, and Orin-X, further divided into five generations:

  • First generation: Zu5, with 1.5 TOPS of compute, mainly for parking-related computation;

  • Second generation: a single TDA4, 8 TOPS, implementing basic ADAS functions;

  • Third generation: dual TDA4, 16 TOPS, to achieve the ANP function in highway scenarios;

  • Fourth generation: Orin-X, 254 TOPS, to achieve autonomous driving functions in highway and some urban conditions;

  • Fifth generation: dual Orin-X, 500 TOPS, to achieve full autonomous driving functionality.

Intelligent Driving Cloud

From the very beginning of its entry into autonomous driving, Baidu started building the underlying cloud platform.

To date, Baidu’s autonomous driving cloud has accumulated millions of kilometers of test-mileage data, organized into a scenario-library system. These data, together with the evaluation system built on them, help Apollo measure and allocate R&D resources in subsequent development iterations.

On top of its simulation cloud platform, the Baidu Intelligent Driving Cloud can run millions of kilometers of simulated testing every day. On top of the autonomous driving cloud business platform, Apollo achieves weekly R&D iteration and weekly in-vehicle version releases via OTA.

Regarding the overall management of the underlying autonomous driving cloud computing power, Baidu’s design includes four parts:

  1. Overall integration of perception, 2D, 3D algorithms, and AutoML on the underlying cloud platform;

  2. Heterogeneous computing resources including CPUs, GPUs, and FPGAs;

  3. Aggregate massive data through the data platform and support data mining;

  4. Provide semantic-level and case-level scenario storage.

Through the Intelligent Driving Cloud platform, Baidu closes the loops on data, effect, and business, which ultimately translates into fast development, fast verification, and fast iteration.

Intelligent Cockpit

In addition to intelligent driving, the other main application scenario of automotive intelligence is the intelligent cockpit, whose impact on the in-car user experience is becoming increasingly prominent. Baidu hopes to enhance that experience along two dimensions: human touch and breadth of connection.

In terms of human touch, Baidu wants to make the connection between users and cars more natural by launching a personal virtual-assistant persona. Users can upload a photo to generate their own 3D avatar, which can be themselves, a loved one, a child, or even a pet. Generating 3D avatars from 2D image data is very similar to the approach taken by Apple and Facebook.

At the same time, Baidu lets users record 20 voice clips to create a personalized voice package, further enriching the virtual assistant’s persona.

The Baidu Intelligent Cockpit will also upgrade from single-modal speech-and-text understanding to multi-modal understanding that combines image and speech. Drawing on its search business and products such as Baike, Zhidao, Video, and WenDa, Baidu aims to create an encyclopedia-style voice assistant.

For example, when a family is driving past a uniquely styled house, a child can ask the voice assistant, “What is the name of that pretty house?” Baidu’s voice assistant answers by connecting to the cloud, a capability that voice assistants from other vendors have not achieved.

Furthermore, upgrades to Baidu’s voice assistant are deployed and distributed through the cloud rather than relying on device-side OTA upgrades.

With breakthroughs in voiceprint recognition, online and offline modes, multi-language, multi-scenario full-duplex conversation, emotional recognition, and emotional synthesis, the Baidu Intelligent Cockpit will also possess semantic rejection capabilities to distinguish between relevant and irrelevant instructions in the vehicle.

In conclusion, Baidu is a pioneer in autonomous driving, with more than 13 million miles of L4 testing, road tests in 27 cities across China, shared robotaxi operations in four cities, and commercial driverless operation in Beijing.

Baidu’s capabilities in intelligent driving, cloud computing, and the intelligent cockpit are beyond question. What matters most for Baidu now is to bring these capabilities into mass production and drive rapid iteration of the Apollo autonomous driving solution.

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.