What is the real value of XPeng’s “Full Voice Car System”?

XPeng’s “Full Voice Car System” has been gaining popularity recently.

I am often asked, “Have you tried it? How is it?”

My honest answer: Although there are still some small issues and bugs, XPeng P7’s in-car voice interaction experience is currently the strongest in the industry.

What makes it strong? Why is it strong? What is the significance of its strength?

Voice technology looks simple at first glance, but it becomes very interesting once you examine it closely. Let’s dig in.

What makes it strong?

The most intuitive aspect is the experience.

This has been mentioned in many articles before, so I will briefly summarize two points:

First: It is becoming more and more like a real person.

Although many in-car voice systems have become quite good over the past two years, users still have a strong sense of “talking to a machine” when giving voice commands. The machine is not as smart as a person, and it often fails to understand what I actually mean.

For example, each time I wake up the voice system, I can give only one command.

Say I ask it to “help me navigate to XXX”. The system offers several routes and asks “which one to choose”; after I choose, it starts navigating. But if I also want to play my favorite song, turn down the navigation volume, and change the direction of the air conditioning that is blowing at me, the system is basically clueless.

This is where XPeng’s “Full Voice Car System” shows what it can do. It supports “continuous conversation” and “semantic interruption”, and can accept multiple commands in a row within 20 seconds. On the one hand, you don’t need to wake it up again by saying “Hey Little P” after each command; on the other hand, it can start receiving the next command while it is still executing the previous one.
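For readers who think in code, the “continuous conversation” behavior can be sketched roughly as a listening window that stays open after the wake word. This is only a toy model, not XPeng’s implementation: the `VoiceSession` class and its `hear` method are hypothetical names, and the rule that each accepted command restarts the 20-second window is an assumption (the article only says that multiple commands are accepted within 20 seconds). “Semantic interruption” is not modeled here.

```python
from dataclasses import dataclass, field

WAKE_WORD = "hey little p"   # wake phrase quoted in the article
LISTEN_WINDOW_S = 20.0       # window length quoted in the article


@dataclass
class VoiceSession:
    """Toy model of a continuous-conversation listening window (hypothetical)."""
    window_s: float = LISTEN_WINDOW_S
    listening_until: float = float("-inf")   # time at which the window closes
    accepted: list = field(default_factory=list)

    def hear(self, utterance: str, now: float) -> bool:
        """Return True if the utterance is taken as a wake-up or as a command."""
        text = utterance.strip().lower()
        if text == WAKE_WORD:
            # The wake word opens (or reopens) the listening window.
            self.listening_until = now + self.window_s
            return True
        if now <= self.listening_until:
            self.accepted.append(utterance)
            # Assumption: each accepted command restarts the window.
            self.listening_until = now + self.window_s
            return True
        # Window closed: the wake word is needed again.
        return False
```

In this sketch, after a single “Hey Little P” at second 0, commands spoken at seconds 3 and 10 are accepted without another wake-up, while a command at second 45 is ignored because the window has already closed.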

Many in-car voice systems also share another pain point, which is admittedly a hard problem. Suppose that, after you wake up the voice system and give an instruction, the front passenger suddenly answers a phone call and talks at length. The system is again clueless, because it cannot know which words it should recognize and which it should ignore. Knowing, as humans do, “which words are addressed to me and need my answer, and which words are none of my business” is difficult for machines.

However, during my experience, I found that XPeng is already very strong in this regard. For example, after waking up the voice system, if I chat with my friends in the car, it will recognize it, but won’t respond. If I suddenly insert “turn off the air conditioning” during the conversation, the car’s system will respond quickly and execute it.

(The "nonsense" in the upper left corner: XPeng recognizes it but does not execute it)


One more important point: XPeng’s voice capabilities already support a certain degree of “vague” requests. Previously we had to give the car system specific commands; on the XPeng P7, I can express my needs more freely. For example, if I want to change the driving mode, I don’t have to name “Sport, Comfort, Energy Saving” specifically. I can just say “I want to adjust the driving mode”, and XPeng understands my intention and offers the options. I can even adjust the position of the audio source inside the cabin by voice.

(Adjusting audio source position through voice commands)

There are many similar examples, and we will make a video to help everyone better understand how XPeng’s car system voice commands are “more human-like”.

Second: It does better than humans.

This is reflected in some precise control. For example, I can say:

“XPeng, open the car window to 1%, open to 81%”;
“XPeng, move the song forward 20 seconds, back 1 second”;

This is amazing; we cannot control these things so precisely ourselves. Still, it may look like technical overkill: what is the use of such precise adjustments? Don’t worry, we will discuss this later.

Why is it strong?

First, let’s answer the second question: why can XPeng’s “Full Voice Car System” be so strong?

There is not much to say on the technology side. In fact, the technical differences between companies are not that great. Investment by car companies, the growing capabilities of suppliers, and the flow of talent across the industry have dramatically improved core technologies such as voice recognition, semantic understanding, and machine learning over the past two or three years. Objectively speaking, then, it would not be impossible for other automakers to build something like XPeng’s “Full Voice Car System” if they wanted to.

So, why XPeng?

I think it is, first of all, a matter of attitude.

In terms of brand and product positioning, XPeng is betting on “intelligence”. As a result, teams tied to the intelligent experience, such as the voice team, have enough say when it comes to integrating and mobilizing internal resources.

This matters because the strength of the voice technology itself does not determine how good the consumer experience is. Every voice command maps to one or several functions, and the experience of those functions determines the consumer’s overall perception. To optimize any part of the user experience, XPeng’s voice team has to communicate and cooperate across departments; if a company does not take voice seriously from the top down, the other teams will give those requests a lower priority, and in the worst case the project reaches a dead end. Anyone who has worked in a large company will understand this.

To my knowledge, XPeng’s voice capabilities have already settled the question of “can it be done”: put simply, voice control can be added to any operable hardware or software in the vehicle, as long as the team wants it. XPeng’s voice team is now focusing more on the question of “should it be done”: should a given feature be controlled by voice, to what degree, and is it actually user-friendly?

For example, XPeng’s voice control currently does not support adjusting the rearview mirrors. This is not because it cannot be done, but because the team weighed how convenient the experience would be and the problem of standardizing the angle data behind it, and so has not prioritized the feature. Obviously, XPeng is already leading in this respect.

Second, there is the company’s overall capability. As mentioned above, a good voice experience requires that the services and functions controlled by voice are themselves done well. This depends not only on the company’s internal teams, but also on its ability to manage and mobilize external suppliers for navigation, music, video, and so on. It is not as simple as moving the mobile app straight onto the car; the in-car version has to be optimized and then matched with appropriate voice invocation logic.

Two to three years ago, XPeng was just starting to build its brand and products, and its sales were low, so its ability to attract suppliers was limited. Now, in addition to He Xiaopeng’s ability to mobilize resources in the Internet industry, rising sales and a rising stock price have given the company more bargaining chips in these partnerships. “Do you think I have a future? If so, let’s collaborate and work well together.”

What is the significance?

Many people may question:

Is XPeng’s “Full Voice Car System” just showing off technology? After all, users are very unlikely to insist on opening a window to exactly “81%”, let alone closing it by “1%”.

So, are these features overly flashy and without practical value? I am more inclined to consider this issue from an overall perspective.

I have asked some friends and group members who either have the XPeng P7 on their shopping list or have already bought one. They share a common trait: they are young at heart, open to new things, and sensitive to new technology. In terms of user profile, they are closer to prospective Tesla Model 3 buyers than to BYD Han EV users, who tend to be more pragmatic.

For this group of people, XPeng’s “Full Voice Car System” is like Face ID when the iPhone X first launched.

When friends, especially the girl you are pursuing, sit in your XPeng P7, you say to it, “Activate ‘transform mode’, open the window to 81%, and direct the air conditioning away from people.” Then you watch as it performs each task one by one. It feels like demonstrating Face ID login in front of a group of curious onlookers three years ago. You could log in with a fingerprint or a password, but the joy of logging in with facial recognition is something only those who understand it can appreciate.

Isn’t the sense of satisfaction and superiority that products bring to people based precisely on these details?

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.