Is a High-end Intelligent Driving Boom Coming?

Advanced intelligent driving assistance is on the verge of an explosive breakthrough.

According to data released by the Ministry of Industry and Information Technology, the penetration rate of L2 advanced driving assistance in new passenger cars reached 34.9% in 2022; in the first half of this year, new passenger cars with advanced intelligent driving functions accounted for 42.4% of sales.

What’s more, by 2025, the penetration rate of advanced intelligent driving assistance for passenger cars is expected to reach 70%.

Against this market trend, balancing technology against cost has become a key question for autonomous driving companies as they move into mass production. Haomo, determined to secure an important position in the market for advanced driving assistance for passenger cars, made its move at its 9th AI Day.

Three HPilot Products Emphasizing Superior Cost-Performance Ratio

Compared with previous editions, one notable difference at this Haomo AI Day is that it emphasized not only breakthroughs in AI algorithms, data, and computing power at the level of autonomous driving technology, but also commercial deployment.

The most typical and most anticipated development is Haomo’s launch of three second-generation HPilot products, the Haomo HP170, Haomo HP370, and Haomo HP570, aimed respectively at the low, mid, and high price tiers of mass-production demand.

Of course, a common tag for these three products is “Superior Cost-Performance Ratio”. Here are some specifics:

The Haomo HP170 is a 3,000-yuan-level highway mapless NOH solution that integrates driving and parking functions. It offers 5 TOPS of computing power, and its sensor package includes 1 front-view camera, 4 fisheye cameras, 2 rear corner radars, and 12 ultrasonic radars, with 1 front radar and 2 front corner radars available as options.

In terms of scenarios, the Haomo HP170 supports mapless NOH on highways and urban expressways as well as short-range memory parking, and it has been certified to the E-NCAP five-star AEB safety standard.

The Haomo HP370 is a 5,000-yuan-level urban memory driving and parking solution that also integrates driving and parking functions. It offers 32 TOPS of computing power, and its sensor package includes 2 front-view cameras, 2 side-view cameras, 1 rear-view camera, 4 fisheye cameras, 1 front radar, 2 rear corner radars, and 12 ultrasonic radars, with 2 front corner radars available as an option.

In terms of scenarios, the Haomo HP370 supports assisted driving on highways and urban expressways, urban memory driving, teaching-free memory parking, and intelligent obstacle avoidance. On the product’s positioning, Haomo Chairman Zhang Kai commented, “Haomo’s memory driving can be seen as the minimal set of Haomo’s Urban NOH, and a strong complement to it.”

The Haomo HP570 is an 8,000-yuan-level urban NOH solution planned for deployment in more than 100 cities. For hardware, it offers a choice between 72 TOPS and 100 TOPS chips and comes standard with 2 front-view cameras, 4 side-view cameras, 1 rear-view camera, 4 fisheye cameras, 1 front radar, and 12 ultrasonic radars, with 1 LiDAR available as an option.

The HP570 supports urban NOH, assisted parking, intelligent obstacle avoidance, and multi-level memory parking. Of the Haomo HP570, Zhang Kai stressed, “The historical mission of the HP570 platform is to create the most cost-effective high-end urban intelligent driving product in the industry.”

Overall, the Haomo HP570 represents the latest achievement of Haomo’s intelligent driving solution for urban NOH. The Haomo HP370 and HP170, by contrast, target mid- to low-end passenger cars based on market insight, catering to different user needs.

For instance, Haomo launched the HP170 because it believes that by 2025 hardware designs that split driving and parking into separate units will be replaced, and integrated domain controller solutions will become the mainstream.

In an on-site interview, Haomo said that launching the HP570 on 72 TOPS and 100 TOPS chips would not affect the HP550, its previously announced solution based on 360 TOPS. Production of the HP570 is scheduled to start only in Q4 of next year.

Whether in terms of computing power or pricing, all three products carry a strong “cost-effective” label. Haomo’s own straightforward description is “lower the price while enhancing the performance”.
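One rough way to read the “cost-effective” label is price per unit of compute. The sketch below simply divides the price levels quoted above by the quoted TOPS figures. It assumes the HP570's 8,000-yuan level applies to both chip options, and it ignores the fact that each price covers a whole solution including sensors, so it is a crude indicator rather than a real cost breakdown.

```python
# Figures quoted above: (product, price level in yuan, compute in TOPS).
# HP570 appears twice because it offers two chip options; applying the
# same 8,000-yuan level to both options is an assumption.
products = [
    ("HP170", 3000, 5),
    ("HP370", 5000, 32),
    ("HP570 (72 TOPS)", 8000, 72),
    ("HP570 (100 TOPS)", 8000, 100),
]


def yuan_per_tops(price_yuan, tops):
    """Price level divided by compute: lower means cheaper compute."""
    return price_yuan / tops


ratios = {name: yuan_per_tops(price, tops) for name, price, tops in products}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} yuan per TOPS")
```

On these figures the ratio falls from 600 yuan per TOPS for the HP170 to 80 yuan per TOPS for the 100 TOPS HP570, so the higher tiers cost more in total but less per unit of compute.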

In a word: “involution”, the industry’s shorthand for ever-fiercer competition.

How Can Large Models Aid Autonomous Driving?

Beyond its progress in mass production and market planning, Haomo also continues to explore and advance AI technology for autonomous driving, which was highlighted at this Haomo AI Day.

The sole focus: large models.

At the event, Haomo CEO Gu Weihao delivered a keynote entitled “Autonomous Driving 3.0 Era: Large Models Will Reshape the Tech Roadmap for Intelligent Cars”. Here, “Autonomous Driving 3.0” refers to data-driven development characterized by big data and large models.

Compared to “Autonomous Driving 2.0”, the main difference in the technical framework of Haomo’s “Autonomous Driving 3.0” is large models. Here are the specifics:

  • First, autonomous driving will achieve breakthroughs in large perception and cognition models in the cloud, gradually unify the various small models on the vehicle side into perception and cognition models, and turn the control module into an AI model as well.
  • Next, the on-board autonomous driving system will evolve by progressively converting every link of the pipeline into models while also moving toward larger models, meaning that small models gradually merge into larger ones.
  • In addition, the on-board system’s perception can be enhanced by pruning and distilling the large cloud model. In areas with robust communication infrastructure, the large model may even control vehicles remotely through vehicle-cloud collaboration.
  • Finally, both the vehicle and the cloud are expected to host end-to-end large models for autonomous driving in the future.
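The article names distillation only in passing and gives no details of the method used. As a generic illustration, not the company's implementation, the sketch below shows the textbook distillation loss: a small “student” model is trained to match the temperature-softened output distribution of a large “teacher” model. The logits and class names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by T squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for three classes (say car, pedestrian, cone).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close < loss_far)
```

Minimizing this loss pulls the small on-board model toward the large cloud model's behavior, which is the sense in which distillation can transfer perception capability to the vehicle.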

So, based on this grand “EVERUS” concept, how has Haomo brought large models into the DriveGPT framework since launching DriveGPT in April?

We’ll dissect this primarily from perceptual and cognitive points of view.

Let’s explore the perceptual phase.

First, DriveGPT learns from the real physical world by building large-scale visual perception models. These model the real world as a 3D space and then add the time dimension to form a 4D vector space. On top of this 4D perception of the physical world, Haomo integrates an open-source multimodal large model that provides more general semantic perception. This allows text, image, and video data to be fused, aligning the 4D vector space with the semantic space.

Haomo’s breakthroughs in the perception stage fall mainly on two fronts:

The first is the continued evolution of the large vision model, or CV backbone. It currently uses a Transformer-based large model structure and large-scale data, with a self-supervised training paradigm based on video generation. This builds a 4D representation space that captures 3D geometry, image texture, and temporal information, enabling comprehensive perception and prediction of the physical world, an upgrade of existing perception capabilities.

The second is the construction of a more fundamental, general-purpose semantic perception model. Building on the large vision model, Haomo extends perception by introducing multimodal models that work across text and images. Aligning natural language information with visual information from images bridges the visual and linguistic feature spaces in autonomous driving scenarios. This can provide broad recognition capabilities for tasks such as object detection, tracking, and depth prediction, extending perception and understanding beyond images alone.

Moving on, let’s discuss the cognitive stage.

Building on the general semantic perception model, DriveGPT constructs a driving language to describe the driving environment and driving intentions. It combines this with navigation instructions and the vehicle’s historical actions, and draws on the vast knowledge of an external large language model (LLM) to help make driving decisions.

Notably, when introducing the large language model, Haomo gives it specialized training and fine-tuning so that it adapts better to autonomous driving tasks. As a result, the large language model can comprehend the driving environment, interpret driving behavior, and even make driving decisions. By aligning with the large language model, the cognitive large model allows autonomous driving decisions to draw on the common sense and reasoning of human society (that is, world knowledge), which improves the interpretability and generalization of the driving strategy.

After sharing the latest DriveGPT technical framework, Gu Weihao also presented seven major applications of the DriveGPT-based development approach: driving scenario understanding, driving scenario labeling, driving scenario generation, driving scenario transfer, driving behavior interpretation, driving environment prediction, and on-board model development.

Of course, what matters most is how DriveGPT performs in real on-board driving scenarios, which was also demonstrated at the event. For example, DriveGPT’s general-perception-based recognition of all object types has evolved: the original perception model could recognize only a few types of obstacles and lane lines, whereas it can now recognize all kinds of traffic signs, ground arrows, and even manhole covers, covering the full set of elements in a traffic scene.

The ultimate goal is to make Haomo’s intelligent driving solution low-cost and highly available, or, in other words, “cost-effective”.

One Hand on Technology, One Hand on Commercialization

For Haomo, this AI Day revealed its new choice in the current market environment: one hand on technology, one hand on commercialization.

On the technology side, Haomo has confirmed a progressive development route for autonomous driving with DriveGPT at its core, and it will continue to explore the deployment of large models for autonomous driving both in the cloud and on board. On the commercialization side, its accumulated technology will be turned into products and results based on market demand.

So, what has Haomo achieved in autonomous driving this year? Specifically:

First is the installation volume of its intelligent driving products. At the event, Haomo announced that it ranks “number one in mass-produced autonomous driving in China”. Its assisted driving product, HPilot, has been installed in more than 20 car models, and users’ assisted driving mileage has exceeded 87 million kilometers. The latest models equipped with Haomo’s HPilot include the Shanhai Cannon HEV Edition and the new Mocha Hi-4S.

In an interview, Haomo told us that, in addition to models from Great Wall’s brands, its designated-supplier projects with other carmakers are also advancing rapidly, and that this year has been quite busy.

The second concerns the MANA large model. Haomo says that in the roughly 200 days since DriveGPT’s release, it has accumulated 4.8 million high-quality test segments. Notably, it already has 17 ecosystem partners and has helped them improve efficiency by 90%.

The third concerns the City NOH “Hundred City Battle”. In April this year, Haomo announced that its City NOH would roll out successively in 100 cities by 2024. At this AI Day, Haomo stated that the Haomo HP550 (originally HPilot 3.0), which carries the City NOH navigation-assisted driving function, will be installed on the WEY Blue Mountain and enter mass production in the first quarter of 2024.

Interestingly, at the AI Day event, Haomo showed a test video of a WEY Blue Mountain equipped with the Haomo HP550 running City NOH: a full 12 km drive through downtown Baoding that took 35 minutes and required three manual takeovers.
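For a rough sense of scale, the figures reported for the demo drive can be turned into an average speed and a takeover interval. This is simple arithmetic on the three numbers quoted above; it assumes nothing about where along the route the takeovers happened, and a single video is not a statistically meaningful sample.

```python
distance_km = 12    # route length reported for the demo
duration_min = 35   # total driving time reported
takeovers = 3       # manual takeovers reported

# Average speed over the whole drive, in km/h.
average_speed_kmh = distance_km / (duration_min / 60)

# Average distance and time between manual takeovers.
km_per_takeover = distance_km / takeovers
minutes_per_takeover = duration_min / takeovers

print(f"average speed: {average_speed_kmh:.1f} km/h")
print(f"one takeover per {km_per_takeover:.1f} km")
print(f"one takeover per {minutes_per_takeover:.1f} min")
```

That works out to an average of about 20.6 km/h, consistent with dense downtown traffic, and one manual takeover for every 4 km driven.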

The fourth concerns Haomo’s last-mile autonomous logistics delivery. Haomo’s last-mile autonomous delivery vehicle, the “Little Magic Camel 3.0”, is priced at 89,999 yuan, making it the world’s first mid-size last-mile autonomous delivery vehicle priced under 90,000 yuan, and it can serve 9 major scenarios including logistics, supermarkets, and retail.

At the event, Haomo also announced that the Little Magic Camel has delivered more than 220,000 orders and is expected to become profitable in the supermarket delivery scenario in the fourth quarter of 2023.

Looking at Haomo’s AI Day as a whole, although technologies such as large models remain the core labels, the importance of mass production was emphasized more strongly than ever.

In the interview, Haomo also noted that autonomous driving technology has iterated rapidly over the past few years. Thanks to recent work on making algorithms more lightweight, solutions that once needed high computing power can now run on mid-range computing platforms. Chip iteration is also accelerating, so costs are falling as well.

Therefore, in Haomo’s view, as the industry goes deeper into autonomous driving and the prices of hardware such as chips fall, it is an inevitable trend for all-scenario NOH to drop to the 10,000-yuan level.

Of course, alongside the technology breakthroughs and commercialization progress announced at this AI Day, Haomo must also squarely face a core question that any technology startup has to answer: how to balance technology investment against commercialization and ultimately build a positive business cycle. As Haomo’s management said in an interview:

While Haomo is rapidly reducing costs, it must also preserve its financial health. Only in this way can Haomo, its customers, and its partners go far together.

This article is a translation by AI of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.