Sun Xun’s “Apollo-Lite: Pure Visual Intelligent Driving Technology”
There are two different paths for the development of high-level autonomous driving:
- Whole-vehicle service, represented by Waymo. This approach comes close to L4 autonomous driving within limited zones, relying heavily on both LiDAR and high-precision maps. Its advantages are rapid system prototyping and relatively low reliance on data and specialist skills. Its disadvantages are cost and scalability.
- Tesla’s vision-dominant intelligent driving solution, which relies on data accumulation and progresses gradually from advanced driver assistance to autonomous driving. It mainly uses cameras as sensors and depends only lightly on maps. Its advantages are cost and scalability; its disadvantage is that pure-vision autonomous driving is inherently difficult.
In our assessment of sensors, LiDAR and cameras are neither mutually exclusive nor subordinate to one another; each plays an irreplaceable role in the sensor suite for high-level autonomous driving.
On LiDAR, Baidu’s attitude is to do what the market demands and to embrace partners. Cameras naturally carry a large amount of image information, but mining that information is difficult and places stringent demands on talent, algorithms, and data.
Baidu will continue to invest in vision-based intelligent driving and to push toward the peak of this technology, building up a reserve of low-cost, scalable technology to support the rollout of intelligent driving products.
Technical progress:
- Visual perception — environmental modeling
Obstacle modeling perceives the obstacles in the environment, achieving full 360-degree 3D obstacle estimation with redundancy mechanisms for detecting special obstacles. Scene semantics covers traffic lights, lane lines, positioning elements, and drivable areas. Scene geometry modeling covers the road surface model and road structure.
- Visual perception — technical challenges
- The computational cost is very high. The images are 1080P, and 10 cameras alone produce about 1 GB of input per second, a much larger input than the scan points of LiDAR data, so a highly parallel and efficient computing framework is required.
- Recovering 3D obstacles from images runs into the inherent problem of passive ranging, a classic and hard problem in computer vision.
- Data scale. Because visual features are ambiguous, learning requires a very large volume of training data covering different viewpoints, lighting, textures, and colors, each of which gives rise to new samples.
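As a rough check on the 1 GB per second figure above, the raw input rate can be estimated from resolution, pixel depth, frame rate, and camera count. The talk gives only the resolution and the number of cameras; the 3 bytes per pixel (uncompressed 8-bit RGB) and the 15 frames per second below are assumptions chosen for illustration.

```python
def camera_input_rate_bytes(width, height, bytes_per_pixel, fps, num_cameras):
    """Raw, uncompressed image bytes arriving per second across all cameras."""
    return width * height * bytes_per_pixel * fps * num_cameras


# 1080P and 10 cameras come from the talk.
# 3 bytes/pixel and 15 fps are assumptions, not figures from the talk.
rate = camera_input_rate_bytes(
    width=1920, height=1080, bytes_per_pixel=3, fps=15, num_cameras=10
)
print(f"{rate / 1e9:.2f} GB/s")
```

Under these assumptions the result is a little under 1 GB per second, which is in line with the order of magnitude quoted in the talk. A higher frame rate or pixel depth would push the figure up proportionally.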
Wang Yang’s “ACU: A Soft and Hard, Mass-Produced Autonomous Driving Solution”
Autonomous driving keeps raising its requirements for computing power, hardware, and specifications. Mass-produced solutions now boast processing power exceeding a thousand TOPS. On the robotaxi side, the number of sensors and computing units on new platforms has grown exponentially.
In this context, one question is unavoidable: where is the boundary of computing power? Can good products be built when computing power is limited?
In 1983, two groundbreaking products were introduced: the Nintendo Entertainment System and the Apple IIe, the star product of Steve Jobs’ first tenure at Apple. The former cost a little over $100 and the latter over $1,000, yet in game performance the cheaper machine did better. Why? The answer lies in heterogeneous computing architecture.
Recently, DSA (Domain-Specific Architecture) has been gaining popularity. It focuses on accelerating the computation of a particular domain, and the heterogeneous architecture mentioned above is one example.
Returning to autonomous driving: to accelerate its algorithms and deliver the best user experience at the lowest possible cost, the algorithms first need a clear and precise classification. The speaker divides them into several categories:
- Neural networks and deep learning, which are the most important.
- Tasks related to image processing, such as ISP image quality, cropping and scaling, stitching and fusion, and traditional GPU functions.
- Traditional computer vision processing. Traditional SLAM, for example, is not deep learning, but it is still an algorithm that has to be computed.
- Everything else, which cannot be subdivided much further and relies on general mathematical and matrix operations.
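One way to picture why this classification matters for a heterogeneous architecture is a table that routes each category to a different kind of compute unit. The sketch below is only an illustration: the talk does not say which unit runs which category, so the mapping (`PREFERRED_UNIT`), the unit names, and the task names are all hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Workload(Enum):
    """The four categories named in the talk."""
    NEURAL_NETWORK = "neural networks and deep learning"
    IMAGE_PROCESSING = "image processing (ISP, crop/scale, stitching)"
    CLASSICAL_CV = "traditional computer vision (e.g. SLAM)"
    GENERAL_MATH = "general math and matrix operations"


# Hypothetical mapping, NOT from the talk: which kind of unit
# each category might be sent to on a heterogeneous SoC.
PREFERRED_UNIT = {
    Workload.NEURAL_NETWORK: "NPU",
    Workload.IMAGE_PROCESSING: "ISP/GPU",
    Workload.CLASSICAL_CV: "DSP",
    Workload.GENERAL_MATH: "CPU",
}


def plan(tasks):
    """Group (task_name, workload) pairs by the unit assumed to run them."""
    by_unit = defaultdict(list)
    for name, workload in tasks:
        by_unit[PREFERRED_UNIT[workload]].append(name)
    return dict(by_unit)


# Task names are made up for the example.
tasks = [
    ("obstacle_detection", Workload.NEURAL_NETWORK),
    ("surround_view_stitching", Workload.IMAGE_PROCESSING),
    ("visual_slam", Workload.CLASSICAL_CV),
    ("pose_filter", Workload.GENERAL_MATH),
    ("lane_segmentation", Workload.NEURAL_NETWORK),
]
assignment = plan(tasks)
for unit, names in assignment.items():
    print(unit, names)
```

The point of the sketch is only that once every algorithm carries a category, the assignment of work to specialized units becomes a lookup rather than a case-by-case decision.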
The ACU product roadmap includes five generations of products divided into three platforms: Wu Ren, Si Xi, and San Xian.
- The first-generation product, Zu5, has 1.5 TOPS of computing power and can handle parking computation.
- The second-generation platform, with a single TDA4, has 8 TOPS and can handle parking-related applications and simple ADAS functions.
- The third-generation Si Xi Plus has dual TDA4VM chips with 16 TOPS and can handle high-speed ANP computing. Each generation adds new features on top of the previous one, so in addition to parking, the third generation can also handle high-speed ADAS.
- The fourth-generation Orin-X platform adds city driving functions on top of high-speed driving capabilities. However, due to computing limitations, some of the most demanding experiences have to wait for the fifth-generation platform, with 2*Orin-X and 500 TOPS of computing power. This generation is called the “Ultimate Version” and will be equipped with the highest level of sensors to achieve full autonomous driving capability.
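The roadmap above can be summarized as a small lookup table. The sketch below records only what the talk states: the fourth generation's TOPS figure is not given, so it is left as `None`, and the feature labels are shorthand chosen for this example rather than official product terms. Because each generation builds on the previous one, the features available at a given generation are the union of everything introduced up to it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AcuGeneration:
    generation: int
    chip: str
    tops: Optional[float]          # None where the talk gives no figure
    new_features: Tuple[str, ...]  # features first introduced here


ROADMAP = (
    AcuGeneration(1, "Zu5", 1.5, ("parking",)),
    AcuGeneration(2, "1x TDA4", 8.0, ("simple ADAS",)),
    AcuGeneration(3, "2x TDA4VM", 16.0, ("high-speed ANP",)),
    AcuGeneration(4, "Orin-X", None, ("city driving",)),
    AcuGeneration(5, "2x Orin-X", 500.0, ("full autonomous driving",)),
)


def features_up_to(generation):
    """All features available at a generation, including inherited ones."""
    features = []
    for gen in ROADMAP:
        if gen.generation <= generation:
            features.extend(gen.new_features)
    return features


def lowest_generation_for(feature):
    """First generation that introduces the feature, or None if unknown."""
    for gen in ROADMAP:
        if feature in gen.new_features:
            return gen
    return None


print(features_up_to(3))
print(lowest_generation_for("city driving").chip)
```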
The core highlights of the “Innovative Car Assistant Powered by Xiaodu” mini program solution are:
- More flexible
All product updates are delivered without OTA, which matters because OTA is very costly for car companies. It is often said that cars need frequent updates, but only Tesla truly achieves this; few other mainstream car companies do. The likely reasons are the strict procedures, standards, and safety requirements involved, and the chain of channels connecting the delivery, R&D, and sales teams. Only a handful of car companies ship OTA updates more than twice a year.
Advantages of Mini Programs in Cars
- More Responsive
Mini program services are essentially products that must iterate quickly around user needs, and they are always cloud-based.
- Lightweight
Mini programs do not consume storage on the user’s head unit. All services live in the cloud and do not occupy hundreds of megabytes the way installed applications can. For example, installing “Onmyoji” in a car takes up 2 GB.
- Faster
The mini program as a whole is controlled by the cloud-based service ecosystem, and the front-end display is also controlled from the cloud, so no adaptation or update is needed on the terminal. This solves the efficiency problem of supporting multiple car models and screens.
- More Cost-Effective
Through a set of tool-based components, mini program developers can quickly capture key signals and data from the car and use packaged SDKs to build AI-enabled CP/SP services that are tightly integrated with the full voice assistant.
Compared with its competitors, Baidu’s mini programs can achieve unified, full-scenario voice interaction. Beyond the traditional open/close commands, they support “what you see is what you can say” and one-step voice interaction with every mini program. This requires building voice capabilities together with co-creation teams and third-party CP/SPs. Traditional CP/SPs cannot develop voice capabilities themselves, but Baidu has accumulated many standardized semantic development spaces in its backend, allowing developers to quickly achieve one-step semantic interaction.
Multimedia is a traditional focus area that urgently needs to be disrupted. Many users regard listening to music as the main in-car activity, but their behavior has not been analyzed in depth. For example, in-car FM radio is a traditional one-way channel: users listen to whatever is played, with very poor program choice, content, and interactivity. To interact with a station, users often have to use WeChat or text messages, a very traditional approach that needs to change.
This year a core multimedia product upgrade was launched: the “Xiaodu With You” app for mini program management. It aims to make Xiaodu Assistant a personalized AI radio host for users in listening scenarios. From the moment the user gets into the car until they get out, it recommends programs the user loves based on their behaviors and preferences, and users can also interact with other drivers listening to the same program. This gives the feature a genuine social dimension. For example, you can ask what music all the commuters stuck on Wuhuan Houchangcun Road are listening to.
Behind it all lies Baidu’s core technology, the aggregation engine. Features in Baidu’s mobile app such as Feeds, recommendations, Haokan Video, and iQiyi all rely on the core capabilities of the aggregation engine. The engine draws on network-wide user data from Baidu’s mobile app to support storage, personalization, and recommendation.
Voice is undoubtedly an important topic. Baidu currently has better and more natural voice dialogue capabilities in multimedia scenarios than any of its competitors, which is one of its core advantages.
Commercialization:
This is an unavoidable topic for anyone in the connected-car industry. When To B products are validated with a To C product mindset and evaluation model, commercialization can take one of two routes: traffic monetization or value monetization. From an Internet perspective, traffic monetization is more common: Taobao, Baidu, and Douyin are all fundamentally monetizing traffic through advertising and distribution.
The trend in the car industry, however, leans toward value monetization: identifying the components that bring real improvements in user experience, as Tesla does with features, regular software services, and OTA upgrades. Whether through traffic monetization or value monetization, products are always driven by user needs rather than by pure operations or traffic.
🔗Source: Apollo
This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.