From first sight to real-world landing, Xpeng's technology crosses into a new cycle.

Author: Wang Pan

Editor: Wu Xianzhi

No matter how much manufacturers boasted in the past, the phrase “future travel” always came attached to the word “blueprint”: something to be viewed, never touched.

The slogan of last year’s “1024 Xpeng Motors Technology Day”, “Prioritizing Intelligence, Exploring Boundlessness”, emphasized the “viewing” quality by emphasizing “exploration”.

This year’s slogan for the “1024 Xpeng Motors Technology Day” has changed to “From foresight to more than just meeting”, signaling that He Xiaopeng intends to let users “touch” the technology this year. As a window onto the technological progress of Xpeng and the industry as a whole, whether assisted driving can enter the city, whether voice assistants can finally shed their “artificial stupidity”, and whether other new species have evolved are all points of industry concern.

During the conference, Xpeng “cut down” on modifiers and flashy animations and presented progress in four directions: an upgraded assisted driving system, the intelligent cabin, the intelligent robot horse, and the sixth-generation flying car.

Compared with the first-generation intelligent assisted driving system XPILOT, Xpeng’s second-generation assisted driving system XNGP differs most in that it moves from a single scenario to a full scenario. In terms of the cabin, in addition to improving the sensitivity and accuracy of voice interaction, Xpeng’s “user-defined” function will change the design of the cabin from “guessing what you want” to “you have the final say.”

Regarding XNGP, Wu Xinzhou, Vice President of Autonomous Driving at Xpeng Motors, revealed more details when answering questions from media colleagues.

Besides, the intelligent robot horse and the sixth-generation flying car are still some distance from commercialization. What really matters is that once the intelligent car lands, the companies involved will transfer the technology to other scenarios. That process will let car brands tear off the “blue-collar” label and become true technology innovators.

From “On the Road” to “Into the City”

Standardized roads, ample LiDAR coverage, and powerful algorithms have already solved the “on the road” problem for assisted driving. On the road into the “city”, however, it is blocked by roadblocks both visible and invisible.

Data shows that assisted driving that cannot enter the city is not really useful. City roads account for 71% of users’ total driving mileage and 90% of their driving time. In terms of frequency, 100% of drivers pass through city roads, while only 25% of users ever reach the “paradise” of assisted driving: highways.

In other words, assisted driving without entering the city is actually a niche market.

The reason for not entering the city is simple: it is hard. Beyond the complexity of urban roads themselves, high vehicle density, irregularly shaped vehicles such as tricycles and engineering vehicles, and pedestrians who ignore the rules all put up invisible roadblocks on the way into the city.

Add to that the large-scale urban construction of recent years, and there is no shortage of physical roadblocks either. In Guangzhou, Xpeng Motors’ home base, some 500 road-occupation construction notices had been published as of the end of September 2022, with an average of two added every day.
Patrick, chief perception engineer at Xpeng’s autonomous driving center, noted that the NGP (Navigation Guided Pilot) system has solved vehicle and pedestrian recognition on highways. However, “in urban environments, vehicles and pedestrians appear more densely, occlude each other more often, and move along far more diverse trajectories.” Xpeng therefore abandoned the object-detection neural network architecture used in NGP and built a new neural network architecture designed specifically for dense urban scenes.

In fact, the first half of the race focused on solving driving in isolated, single-point environments such as highways, cities, or parking lots; Xpeng’s first-generation intelligent assisted driving system XPILOT is typical. XPILOT can solve the math, language, and English problems one at a time, so to speak, but it lacks comprehensive ability and cannot handle switching between scenes.

The landing of the city scenario merely opens the second half of intelligent assisted driving, represented by Xpeng’s second-generation system XNGP (Xpeng Navigation Guided Pilot). XNGP links highways, cities, and parking lots into one continuous experience and can run with or without high-precision maps, connecting all scenarios. It is the ultimate technological form of assisted driving before fully unmanned driving.

The so-called full-scene intelligent assisted driving system is, simply put, one the user can keep engaged for the entire journey, from the starting parking space to the destination parking space, after setting navigation on an ordinary map. Whether the route involves highways, city roads, or parking lots, for the user the whole trip amounts to shifting out of P gear and back into P gear.

Regarding the naming, an Xpeng insider said: “X stands for XPENG and represents the full-scene experience. Keeping NGP not only carries over a widely recognized name; activating XNGP’s capabilities also still requires setting navigation on an ordinary map.”

As is well known, Xpeng’s long-term strategy is “vision-based, assisted by lidar”. The XNet deep visual neural network is therefore the real backbone of XNGP.

Patrick believes Xpeng’s technical moat is twofold. The first is a complete understanding and control of the hardware, which enables the design of more efficient network architectures. Thanks to its engineering capability, Xpeng optimized the computing power XNet requires from 122% of a single Orin-X chip down to just 9%, while also giving XNet the capacity for continuous iteration.
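As a rough illustration of what those two percentages imply (a back-of-the-envelope sketch, not Xpeng’s own accounting): the unoptimized network would not even fit on one chip, while the optimized one leaves most of the chip free.

```python
# Back-of-the-envelope check of the Orin-X figures quoted above.
unoptimized = 1.22   # XNet as first designed: 122% of one Orin-X chip's compute
optimized = 0.09     # after engineering optimization: 9% of one chip

print(f"reduction factor: {unoptimized / optimized:.1f}x")  # ~13.6x
print(f"headroom left on the chip: {1 - optimized:.0%}")    # 91%
```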

The other is the data closed loop, which lets Xpeng continuously uncover weak scenarios, feed them back into model training, and steadily improve visual perception. Urban scenes, for example, contain a huge number of corner cases; XNGP tackles them by collecting long-tail data on the vehicle and sending it back. The data closed loop is also one of the major advantages of Xpeng’s full-stack self-developed automotive technology.
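What “collecting and returning long-tail data from the vehicle end” might look like can be sketched as a simple vehicle-side trigger. The signals, names, and thresholds below are hypothetical, chosen only to illustrate the pattern:

```python
# Hypothetical sketch of a vehicle-side long-tail data trigger, the kind of
# mechanism a data closed loop relies on. Not Xpeng's actual implementation.
from dataclasses import dataclass

@dataclass
class Frame:
    detection_confidence: float  # perception confidence for this frame
    driver_took_over: bool       # a human takeover is a strong weak-scenario signal

def should_upload(frame: Frame, confidence_floor: float = 0.5) -> bool:
    """Flag frames where perception struggled, so they can be sent back
    for annotation and retraining."""
    return frame.driver_took_over or frame.detection_confidence < confidence_floor

# A low-confidence frame from a dense urban scene gets flagged for upload.
print(should_upload(Frame(detection_confidence=0.31, driver_took_over=False)))  # True
```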

XNet also starts from a higher baseline in data annotation efficiency. The industry has generally relied on manual annotation, which is labor-intensive and error-prone. Xpeng’s fully automatic annotation system is 45,000 times more efficient: the annotation volume XNet requires would take 2,000 people a full year by hand, while the automatic system finishes in 16.7 days.
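The two figures are roughly self-consistent, as a quick calculation shows:

```python
# Checking the annotation-efficiency claim with simple arithmetic.
manual_person_years = 2000   # claimed manual effort for XNet's data volume
speedup = 45_000             # claimed efficiency gain of the automatic system

manual_person_days = manual_person_years * 365
# ~16.2 days, roughly matching the quoted 16.7 days.
print(f"automatic system: ~{manual_person_days / speedup:.1f} days")
```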

On August 2, Xpeng launched “Fuyao”, China’s largest intelligent computing center dedicated to autonomous driving, in Ulanqab to train its autonomous driving models. Built on Alibaba Cloud’s intelligent computing platform, “Fuyao” delivers up to 600 PFLOPS (600 quadrillion floating-point operations per second), speeding up the training of Xpeng’s core autonomous driving models by nearly 602 times.

The breakthrough in technical capability directly expands Xpeng’s potential in the B-side market. It is reported that Xpeng has passed closed-course autonomous driving tests in accordance with Guangzhou’s requirements for testing the autonomous driving functions of intelligent connected vehicles.

Defining the Cabin by People, and People by the Cabin

On “1024 Tech Day”, the Xpeng cockpit saw new changes, including G9 Full-Scene Voice 2.0 and the Xpeng Intelligence Scene.

There is only one measure for cockpit interaction, which is to be smooth — both fast and accurate.

The HMI (Human-Machine Interface) for city NGP generally spans four feedback channels: visual (the SR environment-simulation display for intelligent driving assistance), auditory (voice broadcasts), tactile (seat-belt cues), and the bodily sense of acceleration and deceleration. All four exist to let users clearly understand the vehicle’s current state and what it will do next, while surfacing only the useful information.

In answer to “fast”, Xpeng Full-Scene Voice 2.0 responds at millisecond level, bringing the interaction closer to human conversation. Wake-word to interface feedback takes only 270 ms, Xpeng’s voice feedback takes under 700 ms, and the delay from voice command to execution stays within 1 second, all of which lead the industry.

In the past, people had to adapt to the car’s infotainment system during voice interaction: after speaking, they waited for recognition and a result, and during that wait the vehicle was in a “not listening” state. Full-Scene Voice 2.0’s streaming understanding lets the system listen, think, act, and answer at the same time, with faster online requests. The G9 also uses local computing power for on-device dialogue, stably controlling more than 600 vehicle functions even with a weak or absent network connection.
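A minimal sketch of what “listen, think, act, and answer at the same time” could mean in practice: the system parses partial transcriptions as they stream in and acts as soon as the intent is unambiguous, instead of waiting for the utterance to end. The function names and the toy intent check are assumptions for illustration, not Xpeng’s implementation:

```python
# Streaming intent parsing: act mid-utterance once the intent is clear.
import asyncio

async def microphone() -> asyncio.Queue:
    """Simulate partial transcriptions arriving word by word."""
    q: asyncio.Queue = asyncio.Queue()
    for word in ["open", "the", "driver", "window", "please", "thanks"]:
        await q.put(word)
    await q.put(None)  # end of utterance
    return q

async def main() -> None:
    q = await microphone()
    heard = []
    while (word := await q.get()) is not None:
        heard.append(word)
        # "Think while listening": execute as soon as the intent is
        # unambiguous, rather than after the whole utterance finishes.
        if "open" in heard and "window" in heard:
            print("executing: open driver window")
            break

asyncio.run(main())
```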

In addition, the one-dimensional interaction experience that could only follow linear dialogue logic has changed qualitatively with the new version. Reportedly, when a user utters four instructions in one breath, Xpeng’s assistant “Little P” can execute them immediately. If commands conflict, it determines a reasonable execution order and reply based on conditions such as actuator conflicts, logical contradictions, and unsupported vehicle models, with far less awkwardness in voice and semantics.
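How an assistant might accept several commands at once while rejecting contradictory ones can be sketched with a toy arbiter. The conflict table and command names below are invented for illustration and are not Little P’s actual logic:

```python
# Toy command arbiter: later commands that contradict an accepted one are rejected.
CONFLICTS = {("open_window", "close_window"), ("close_window", "open_window")}

def arbitrate(commands: list) -> tuple:
    """Return (accepted, rejected) given commands spoken in one breath."""
    accepted, rejected = [], []
    for cmd in commands:
        if any((cmd, prev) in CONFLICTS for prev in accepted):
            rejected.append(cmd)
        else:
            accepted.append(cmd)
    return accepted, rejected

# Four commands at once, two of them contradictory.
print(arbitrate(["open_window", "play_music", "close_window", "seat_massage"]))
# (['open_window', 'play_music', 'seat_massage'], ['close_window'])
```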

As for wake-up, the full-time dialogue function removes the need for a wake word. Even with an accent or unclear phrasing, the user only needs to say “Xpeng” and the system will review the preceding dialogue and give the corresponding feedback.

In-car voice interaction has an inherent complication: most of the time, there is more than one mouth in the car. Full-Scene Voice 2.0 introduces MIMO multi-zone technology to in-car voice systems for the first time. It processes audio signals from the four audio zones in the car and separates out the valid speech signals, forming four independent audio channels.

With MIMO multi-zone technology, when users in one or more zones speak to the assistant, it can accurately locate the zone and understand what each person is saying, serving all four occupants of the car.
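The mechanics of zone detection can be illustrated with a deliberately simplified sketch. Real MIMO systems use microphone arrays and beamforming to isolate each zone; here each zone is naively modeled as its own clean channel, and all names and thresholds are invented for illustration:

```python
# Naive per-zone speech-activity detection over four audio-zone channels.
import numpy as np

ZONES = ["driver", "co-driver", "rear-left", "rear-right"]

def active_zones(frames: np.ndarray, threshold: float = 0.01) -> list:
    """frames: shape (4, n_samples), one channel per audio zone.
    Return the zones whose signal energy suggests someone is speaking."""
    energy = (frames ** 2).mean(axis=1)
    return [zone for zone, e in zip(ZONES, energy) if e > threshold]

rng = np.random.default_rng(0)
frames = np.zeros((4, 16000))
frames[0] = 0.20 * rng.standard_normal(16000)  # driver speaking
frames[3] = 0.15 * rng.standard_normal(16000)  # rear-right speaking
print(active_zones(frames))  # ['driver', 'rear-right']
```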

In terms of output, the speaker in the driver’s headrest, the co-driver’s Bluetooth headset, and the car’s main audio system form multiple in-car voice output channels. Together with cross-zone contextual dialogue (the driver says “turn on the massage”, the co-driver says “me too”), multi-zone access control (preventing misoperations from other zones), and improved multi-speaker rejection accuracy, these upgrades put Xpeng’s cabin voice interaction well ahead of its peers.

From the iterations above, the biggest feature of Full-Scene Voice 2.0 is its comprehensive integration of software and hardware.

Developing a voice architecture entirely in-house was once considered a futile endeavor; the industry norm is to adopt third-party solutions. But Xpeng not only developed its second-generation voice architecture in-house, it started from the very fundamentals of speech technology.

As the preceding sections suggest, both the second-generation intelligent driving platform XNGP and the second-generation voice architecture are genuinely new things; XNGP in particular has no product to benchmark against. Suppliers, moreover, have little appetite for matching uncertain commercial prospects, and internal debugging and iteration chains are much shorter than external ones.

At the conference, Xpeng’s X-Combo unlocked customization of the intelligent cockpit, atomizing the vehicle’s perception and execution capabilities so that users can define many functions of the car themselves. Operable, editable, and shareable, X-Combo supports collaborative creation, social-media sharing, and even inheritance of configurations across vehicles.
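A hypothetical example of what an “atomized” user-defined scene could look like: atomic perception triggers combined with atomic execution actions. The schema and names are invented to illustrate the idea, not X-Combo’s actual format:

```python
# Invented schema for a user-defined cabin scene built from atomic capabilities.
nap_mode = {
    "name": "co-driver nap",
    "triggers": [  # atomized perception capabilities
        {"signal": "seat_occupied", "zone": "co-driver"},
        {"signal": "recline_angle", "op": ">", "value": 120},
    ],
    "actions": [   # atomized execution capabilities
        {"device": "co-driver_window", "command": "close"},
        {"device": "cabin_lights", "command": "dim", "level": 0.2},
        {"device": "media_volume", "command": "set", "level": 0.3},
    ],
    "shareable": True,  # can be published for other users to import
}
print(nap_mode["name"], "->", len(nap_mode["actions"]), "actions")
```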

Compared to XNGP and the second-generation voice architecture, X-Combo may shift car development from the automaker defining user needs to users defining them. Car companies provide only the platform and tools, leaving everything else for users to build themselves. Even users too busy to build can borrow other people’s solutions through collaborative creation.

Car Companies Take Off Their “Blue-Collar” Hats

This year’s “1024 XPeng Auto Technology Day” also saw iterations in the field of intelligent ecology, including intelligent robots and the sixth-generation intelligent flying car.

Once autonomous driving, AI, and human-machine interaction technologies are reused at scale in new species, no one should have to ask “why Xpeng wants to build robot horses and flying cars” anymore.

For example, intelligent robots also need compact and efficient “three-electric” systems (battery, motor, and electronic control); automotive power technology can help robots improve range, cost, battery life, and battery safety. The flying car is the same size as a conventional car and can travel freely on open roads; through its folding transformation system it switches between road and flight modes, and the road-driving side directly reuses intelligent car technology.

In fact, the “1024 XPeng Auto Technology Day” shows that the role of the OEM is quietly changing.

In the traditional sense, because of the discrete nature of car manufacturing, automakers have mostly acted as technology integrators: they were not the source of technological innovation but made cost-based choices among innovations. Innovation came from the industrial chain, for example from Tier 1 suppliers like Bosch, ZF, and Magna, or from even more basic component suppliers.

With the advent of intelligent manufacturing, industrial production and technological development have become deeply intertwined, and a mere technology integrator is simply unable to meet the challenge. Objectively, this pushes automakers to strengthen their own technological capabilities.

Therefore, we see new players expanding into intelligent manufacturing and new areas, such as NIO making mobile phones and Xpeng creating robots. Meanwhile, some forward-thinking traditional automakers are also undergoing transformation, like Great Wall, BYD and SAIC incubating their own self-driving, in-car systems, and intelligent cockpit suppliers. Even the four major processes in the workshop – stamping, welding, painting and assembly – will test the technical capabilities of automakers in all aspects with the digitization and integration of hard and soft processes.

The saying that “autopilot technology is 99% AI” already indicates that a technology-driven car ecosystem is the ultimate form of the next generation of automakers, and that full self-development of technology is a manifestation of the industry’s self-evolution: automakers are turning from technology integrators into technology creators and suppliers, as shown by the intelligent robot horse on this Tech Day and the sixth-generation flying car, both of which reuse technology that has already landed in cars.

New technologies catch the eye easily, but the depth of technological reconstruction is often overlooked. Careful observers will notice that XNet, the deep visual neural network, plays a pivotal role in XNGP, and one reason it can iterate so quickly is the change in annotation methods. Compared with Xpeng’s first-generation visual perception architecture, XNet replaces complex hand-written logic with neural networks and reconstructs the pipeline of data collection, annotation, training, and deployment. Without that leap in efficiency, even the best blueprint would stay at the “viewing” stage.

Therefore, at the “1024 Xpeng Automotive Technology Day” we should expect to see not only new species, but also the spectacle of OEMs themselves evolving into a new species.

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.