In November 2022, when OpenAI announced ChatGPT on its official website, no one anticipated what would come next. Astonishingly, within just two months of its launch, ChatGPT achieved 100 million monthly active users, thereby becoming the fastest-growing consumer application in history.
Less than a year later, in November 2023, ChatGPT received a major upgrade at OpenAI’s first developer conference, DevDay. OpenAI announced a ‘GPT Store’ and let individual users build custom GPTs with their own specialized skills. That pace of development and deployment once again put ‘large models’ at the center of attention.
In fact, as ChatGPT exploded in popularity, the large model technology behind it sent a shockwave out from the AI sector into society at large and began to reshape the business of one industry after another. The automotive industry, which is in the middle of a major technological transformation and is fully embracing intelligence, is no exception.
In the year since ChatGPT arrived, automakers have eagerly embraced large models. Beyond making strategic investments, they are exploring ways to put models to work in intelligent cockpits, smart driving, and other user scenarios, hoping to use large model capabilities to improve their customers’ day-to-day experience.
Hence, a question worth pondering arises: How should large models really be implemented in cars?
Embracing Large Models: A Quickly Established Consensus
Before discussing the automotive industry’s embrace of large models, one premise needs clarifying. Every field, the automotive industry included, is talking about ChatGPT and large models, and large model technology did break into mainstream awareness because of ChatGPT. The two are closely related, but they are not interchangeable.
In reality, ChatGPT is merely a typical application of OpenAI’s deep learning large model GPT, and GPT is just one of the many well-known and breakthrough large models in the industry. For instance, when ChatGPT made a sensational debut in November 2022 and attracted widespread attention in the industry, it was based on one version of GPT, specifically, the GPT-3.5 Series.
Nevertheless, as a representative work of large model technology, ChatGPT indeed served as a critical entry point for the automotive industry to pay attention to large model technology.
Among industry figures, Musk, who spans both the AI field and the electric vehicle industry, was undoubtedly the biggest trailblazer. Just days after ChatGPT’s release, he repeatedly expressed his admiration for it on social media, stating that “ChatGPT is scary good.” Such praise immediately drew attention in China, and the ‘large model’ technology behind ChatGPT quickly piqued the automotive industry’s interest.
Nevertheless, the focal point of Chinese automakers’ intensive engagement with large models was Baidu’s “Wenxin Yiyan” (ERNIE Bot).
On February 7, 2023, Baidu officially announced its new large model project, “Wenxin Yiyan”. It is a generative conversational product that Baidu built on its own Wenxin large model, in effect a “Chinese version of ChatGPT”.
In the period that followed, although “Wenxin Yiyan” had not yet officially launched, companies across industries announced that they were joining its ecosystem. In the automotive industry, companies including “VOYAH”, “Hong Qi”, “Great Wall Automobile”, “NISSAN”, “ALWAYS”, “LEAPMOTOR”, “Geely”, “HM”, “HOZON” and others joined the “Wenxin Yiyan” circle. Relatively speaking, though, these automakers’ embrace of “Wenxin Yiyan” is better understood as riding the momentum and signaling their determination to transform toward intelligence.
By March, some leading new-energy vehicle manufacturers began to express their thoughts on ChatGPT and large-scale model technology in public, and these thoughts were more in-depth.
For instance, in early March 2023, at the LI Auto spring media sharing meeting, Li Xiang systematically discussed the logic and thinking behind the development of LI Auto, mentioning that “AI can change the physical world”, and from here he talked about his views on ChatGPT and large-scale models. He said that “ChatGPT is difficult, but I’m not afraid of its mistakes”; whereas when talking about large-scale models, he stressed the influence of large-scale models on the development of autonomous driving business, believing that “only large-scale models can achieve the AI 2.0 that I really want”.
Following that, at the Xpeng 2022 Q4 and Annual Financial Results Conference Call in March 2023, He Xiaopeng stated that AI applications represented by ChatGPT have allowed hundreds of millions of users to see the tremendous potential of generative AI models, signifying that in the future, such general abilities of machine brains will enter a new stage. Furthermore, this can be deployed locally, providing a new interpretation and efficiency boost to autonomous driving and accelerating the path from L4 to L5.
Compared to LI Auto and Xpeng, NIO’s statement on ChatGPT and large-scale models came later. At the end of May, Li Bin publicly stated that “The best application scenario for large-scale models is in the car.” After the NIO ES6 launch, he stated that NIO is working to integrate large-scale models into its voice interaction system, NOMI.
Within this wave of automakers embracing large models, one player that cannot be ignored is the tech giant Huawei.
As early as April this year, at a Huawei press conference, Richard Yu announced that the Harmony 4.0 system would bring Pangu large model technology to the AITO M9, due at the end of the year. Huawei says the Pangu model will take AI capabilities on the AITO M9 to a new level. No further details have been released, but given Huawei’s accumulated experience in large models, the model’s real-world performance on the AITO M9 is highly anticipated.
From June onward, virtually every automaker used public events to announce its embrace of, or plans for, large AI models.
It is fair to say that after the shock wave from ChatGPT and the enthusiastic backing of leading new carmakers such as NIO, Xpeng, and LI Auto, the major players have all stated their positions on applying large models. As a result, a consensus has emerged in China’s fiercely competitive auto industry, which is collectively searching for a path to AI transformation: whatever the means, automakers must embrace large models quickly.
Under this impetus, questions on how large models can be practically applied have started to preoccupy companies. Right now, intelligent cockpits and autonomous driving are the two primary areas where large models are being implemented.
From AI Interaction to AI Operating Systems
From a practical standpoint, ChatGPT’s conversational and assistant abilities map naturally onto existing AI voice assistants. Automakers’ exploration of large model applications has therefore focused on areas where users notice the difference most, such as the intelligent cockpit and, above all, the in-car voice assistant.
In this respect, the work of “NIO, Xpeng, and LI” is closely watched.
On June 17, at the LI Auto Family Tech Day, LI Auto unveiled its large model plans with the launch of Mind GPT, its self-developed cognitive model.
Specifically, LI Auto’s Mind GPT is more akin to ChatGPT; bolstered by its knowledge storage abilities, it’s designed to make “LI Buddy” smarter. Functionally speaking, Mind GPT allows chat generation, language comprehension, knowledge Q&A, logical reasoning, and other abilities to become safer, more precise, and more logical. It also possesses a built-in memory network, enabling users to let LI Buddy remember personalized preferences and habits based on historical conversations, thus understanding the users better.
According to official statements, with the general capabilities of Mind GPT, LI Buddy can be a mentor and friend who accompanies users around the world, a professional car housekeeper, or an expert who teaches users drawing and programming. The aim is AI that empowers every single user. Overall, the capabilities that large models bring through LI Buddy lean toward entertainment and knowledge, which fits LI Auto’s family-car usage scenario.
For now, however, Mind GPT is still under development. According to the company, the large model-powered LI Buddy will be pushed to users via an OTA update before the end of 2023.
Apart from LI Auto, Xpeng, a key player in the intelligent cockpit field, announced at its 1024 Tech Day that its self-developed XGPT model has been integrated into its voice system. The brand-new AI Xiao P has over 800 skills, with significantly improved perceptual understanding and reasoning capabilities. Xpeng officially stated that AI Xiao P would ship in the XOS Tianji intelligent cockpit system and enter mass production first on the Xpeng X9 MPV.
NIO, for its part, has applied for trademarks such as “NOMIGPT” and “NIOGPT”, and announced at a recent small-scale meeting that NOMI is now integrated with an in-house GPT model with up to a hundred billion tokens, which will enhance NOMI’s understanding and reasoning capabilities. How this self-developed GPT model will add functionality to NOMI remains to be seen, as NIO has not released exact details.
The top three new-age car companies, NIO, LI Auto, and Xpeng, all chose to start with the in-car voice assistant, using large AI models to give it dialogue, understanding, and creative capabilities. The value for users centers on knowledge interaction, information, and entertainment. In building their large models, all three emphasize self-development.
Outside of the new force companies, some traditional car companies dedicated to transformation are not sparing any efforts in embracing large models, be it choosing to independently develop or collaborating with partners.
For instance, Geely announced in July that it has a full-stack, self-developed large AI model technology. According to Geely’s official statement, its full-scene large AI model includes a drawing model, a music model, a language model, and an autonomous driving model, offering AI intelligent interaction, AI music MV, AI children’s picture book, WoW wallpaper, and other functions.
Moreover, at the Galaxy L6 launch in September, Geely announced that its self-developed, industry-first full-scene large AI model would be introduced into the Galaxy L6. In the car as it ships today, however, only the WoW wallpaper function, the feature users notice most readily, has actually been implemented.
Additionally, in mid-August, Chery Automobile and iFLYTEK announced that iFLYTEK’s “Starfire Cognitive Model” would make its in-car debut in the “Star Epoch ES”. Under this collaboration, “Star Epoch ES” aims to provide a warm and caring voice assistant that offers more thoughtful services. Aside from planning travel itineraries and recommending favorite movies, it can also provide health consultation based on a user’s individual situation. Its capabilities resemble those of ChatGPT. To experience this more caring voice assistant, users will have to wait until “Star Epoch ES” launches on November 30.
Of course, besides the aforementioned manufacturers, there are many other car companies actively trying to implement large model technology through voice assistants.
However, looking at the big picture, the implementation of large models in intelligent cockpits and voice assistants is still in the exploratory stage. To truly broaden the user base, it still needs a disruptive scenario.
In a previous conversation with Garage No.42, He Xiaopeng told us that with a large model behind it, AI Xiao P is smarter and more interesting. It can handle tasks with high entertainment value, such as sustained dialogue, drawing, and poetry. However, this is not enough to make it a major competitive advantage, because it solves minor problems rather than major ones.
LI Auto’s cockpit team told Garage No.42 that their exploration of large model implementation is still grounded in users’ actual driving experience. They are trying to find realistic scenarios that match users’ real needs, and that search is ongoing. Li Ke, director of SenseTime’s intelligent vehicle cabin product, expressed the same view: to make a distinct impact on users, the large model needs to fit the scenarios they use most often.
Despite this, empowering vehicular voice assistants with large models is a tactic no automobile company can afford to miss.
Chen Ye, the founder and CEO of Tigerobo Technology, who is currently working with a car company on a large model for the intelligent cockpit, told Garage No.42 that automotive products are already highly homogeneous. Many companies are putting large models into vehicles primarily to differentiate themselves through intelligence. However, he believes that doing this well will take time, because a genuine killer app or breakthrough application has yet to emerge, and many companies are exploring in that direction.
An AI industry expert, meanwhile, told us that car companies start with in-car voice assistants when implementing large models because doing so is comparatively simple and highly visible to users. In the long run, large models are likely to serve as a vehicle’s ‘AI operating system’, closely linking AI capabilities with different user scenarios and opening up additional possibilities.
Smart Driving, The Promised Land of Large Models
Apart from intelligent cockpits, intelligent driving, another critical component of vehicle intelligence, is also seeking breakthroughs and improvement opportunities within the large model’s technical framework.
Here we need to clarify that the underlying technology of large models like GPT is the Transformer architecture. Since it was proposed in the 2017 paper ‘Attention Is All You Need’, the Transformer has been applied continually in natural language processing. Tesla, however, brought the Transformer into its autonomous driving algorithms and presented the approach at Tesla AI Day in August 2021, after which BEV+Transformer became a new paradigm for mass production across the autonomous driving field.
Hence, with the Transformer as the connection, there has always been an underlying technological link between large models and autonomous driving. This is why some companies, when discussing the BEV+Transformer perception architecture they have adopted, use terms like ‘Transformer large model’ or ‘large model era’.
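For readers who want to see what the Transformer’s core operation looks like, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration only: the function names and the toy vectors are made up for this example, and production systems use optimized tensor libraries with multiple heads and learned projections rather than nested lists.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over small lists of vectors.

    Each query is compared with every key; the resulting weights
    blend the value vectors into one output vector per query.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        blended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
    return outputs

# Toy input: three "tokens" with 2-dimensional embeddings (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(tokens, tokens, tokens)
```

The same operation works whether the “tokens” are words in a sentence or features from camera images projected into a bird’s-eye view, which is why it serves as the link between language models and perception stacks.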
Undoubtedly, the significant impact of large models on autonomous driving primarily occurred in 2023.
With the emergence and burgeoning popularity of ChatGPT, the versatility and generalization ability manifested by GPT have triggered new thoughts on model construction methods in the autonomous driving field, resulting in some novel experiments.
At present, the most notable advances appear to come from Haomo.ai (Haomo Zhixing) and Xpeng.
In April this year, Haomo.ai, which is closely associated with Great Wall Motors, released DriveGPT, billed as the industry’s first generative large model for autonomous driving, at its AI Day. The underlying model draws inspiration from GPT, but unlike ChatGPT, DriveGPT takes as input a text sequence assembled from perception results and outputs text sequences describing autonomous driving scenarios. Driving scenarios are tokenized into a “Drive Language”, which the model uses for tasks such as vehicle decision-making, obstacle prediction, and producing the chain of reasoning behind a decision.
By mid-October, Haomo.ai had further demonstrated its exploration and breakthroughs in advancing autonomous driving AI through large models. In perception, for example, DriveGPT builds a large visual perception model to learn about the real physical world. The real world is modeled as a 3D space and then extended with a temporal dimension to form a 4D vector space. Haomo.ai then introduced an open-source multimodal large model to build a more general semantic perception model that integrates text, image, and video information, completing the alignment from 4D vector space to semantic space.
In the cognition stage, meanwhile, DriveGPT constructs a driving language to describe the driving environment and intent, and combines it with navigation guidance and the vehicle’s historical actions. The broad knowledge of external large language models (LLMs) then assists in making driving decisions, which is akin to giving an autonomous driving system the common sense and reasoning ability of human society.
Overall, Haomo.ai’s approach is to introduce large model methods into perception, cognition, decision-making, and other stages, aiming to make the autonomous driving system more versatile, better at generalizing, and smarter overall.
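To make the idea of turning driving scenarios into a token sequence more concrete, the sketch below discretizes a short trajectory into integer tokens and decodes it again. This is a hypothetical illustration, not DriveGPT’s actual scheme: the bin size, the vocabulary layout, and the function names `encode` and `decode` are all assumptions made for this example.

```python
# Hypothetical sketch: none of these names or constants come from DriveGPT.
BIN_SIZE = 0.5          # metres per bin (assumed)
NUM_BINS = 21           # bins per axis, covering -5.0 m to +5.0 m (assumed)
HALF = NUM_BINS // 2    # index of the "no movement" bin

def _to_bin(delta):
    """Map a displacement in metres to a bin index, clamped to the vocabulary."""
    index = round(delta / BIN_SIZE) + HALF
    return max(0, min(NUM_BINS - 1, index))

def encode(trajectory):
    """Turn a list of (x, y) positions into one integer token per step."""
    tokens = []
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        tokens.append(_to_bin(x1 - x0) * NUM_BINS + _to_bin(y1 - y0))
    return tokens

def decode(tokens, start=(0.0, 0.0)):
    """Rebuild an approximate trajectory from a start point and its tokens."""
    x, y = start
    points = [start]
    for token in tokens:
        bin_x, bin_y = divmod(token, NUM_BINS)
        x += (bin_x - HALF) * BIN_SIZE
        y += (bin_y - HALF) * BIN_SIZE
        points.append((x, y))
    return points

# A made-up path: the vehicle moves forward while drifting left.
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.5)]
tokens = encode(path)
```

Once motion is expressed as a sequence of discrete tokens like this, a GPT-style model can in principle be trained to predict the next token, just as a language model predicts the next word.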
Similarly, Xpeng introduced large model capabilities into its newly released XNet 2.0 perception architecture at its 1024 Tech Day in October. With large models behind it, XNet 2.0 can read the text on traffic signs, grasp the concept of time, and understand the semantic elements of traffic that vary from city to city, all with the aim of improving generalization at the perception level.
In fact, the robust capabilities of large models also serve as a significant advantage to Xpeng in widening the applicability of urban NGP in more cities.
It’s noteworthy that end-to-end has evolved into another paradigm in the advancement of autonomous driving technology. A prime example is that of Tesla, which incorporates the concept of end-to-end into its approach to ‘large models’.
In August this year, Elon Musk gave the first live demonstration, on social media, of Tesla’s FSD V12, an early end-to-end AI driving system. Musk stated that FSD V12’s driving is handled entirely by AI, without a single line of hand-written code for recognizing roads or pedestrians. All of this is done by a neural network.
Taking the same ‘end-to-end’ approach as Tesla is ‘Planning-oriented Autonomous Driving’, the paper that won the Best Paper Award at CVPR, a top conference in computer vision. Published jointly by Shanghai AI Lab, Wuhan University, and the SenseTime team, it presents UniAD, a unified framework for autonomous driving algorithms. Specifically, it brings multiple driving modules, such as perception, prediction, and planning, into a single planning-oriented, end-to-end framework built on the Transformer.
Insiders have described the paper’s contribution as ‘a beacon for autonomous driving’. As an academic paper, however, it remains far from real-world application. When SenseTime introduced UniAD, it used the term ‘universal large model of autonomous driving’.
Although different players interpret and apply the ‘large model’ in different ways, end-to-end is gradually becoming a clearer direction in the autonomous driving sector. For instance, Haomo.ai, the developer of DriveGPT, stated on multiple occasions at its AI Day that end-to-end autonomous driving is its future goal. As Zhang Yaqin, dean of Tsinghua University’s Institute for AI Industry Research, has suggested, the arrival of large models has shifted AI from discriminative to generative, indicating that the ultimate goal of reliable and safe autonomous driving can only be achieved through end-to-end implementation.
However, ordinary users still struggle to tangibly grasp the clear changes that large models can introduce to intelligent driving. They might require a ‘ChatGPT moment’ of sorts similar to what Elon Musk referred to for Tesla: a sudden realization that millions of cars can now drive autonomously.
An industry observer who has long studied large models told us that the industry’s current definition of the concept is still chaotic and blurred. But one thing is clear: the philosophy and methods of large models are disruptive to the conventional way AI operates. Given how AI has steadily reinforced autonomous driving over the past decade, the evolution of large models could revolutionize the field and play a significant role in steering its progress.
A Transition from Quantitative to Qualitative Change Has Just Begun
Whether in terms of intelligent driving or intelligent cockpit development, automakers’ embrace of large models points to one fundamental truth: amidst the highly competitive landscape, all automakers face considerable anxiety pertaining to the trend of intelligent transformation, fearing that they may fall behind in this race of AI tech applications.
In 2023, the ‘large model’ concept went viral and became a crucial tool for automakers in this period of transformation.
A researcher working on large AI models told Garage No.42 that the large model concept is indeed a pivotal point in the evolution of AI, and its significance cannot be overstated. Now that the auto industry is popularizing the concept, consumers will inevitably become more aware of it. Seen from another angle, automakers’ emphasis on and exploration of large models also stems from the increasingly close ties between the automotive and AI fields, a trend that cannot be stopped.
However, the utilization of large models undoubtedly presents certain technical difficulties.
Technically, large models inherently require substantial computing power and data, and the natural environment for developing and optimizing them is the cloud. Computing power and storage space in vehicles are both limited, which makes implementing large models highly challenging in terms of both research and development and deployment.
In other words, applying large model technology first requires a substantial cloud computing hardware system. Second, the model must be adapted to specific in-vehicle scenarios through ‘edge-cloud deployment’ capabilities. This is a demanding test for traditional automakers and relatively easier for emerging companies that have already built a computing foundation. For companies like Huawei, with strong deployment capabilities from cloud to edge, it is a clear advantage.
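One way to picture ‘edge-cloud deployment’ is as a routing decision made for each request: run a small model in the car, or call a larger model in the cloud. The sketch below is a simplified illustration under assumed conditions. The `Request` fields, the `choose_backend` function, and the 400 ms round-trip figure are hypothetical and do not describe any automaker’s actual system.

```python
from dataclasses import dataclass

@dataclass
class Request:
    text: str
    latency_budget_ms: int       # how long the caller can wait for an answer
    needs_world_knowledge: bool  # e.g. trip planning versus "open the window"

def choose_backend(request, network_up, cloud_round_trip_ms=400):
    """Return 'edge' or 'cloud' for a single voice request.

    All thresholds here are illustrative assumptions.
    """
    if not network_up:
        return "edge"    # no connectivity: the in-car model is the only option
    if request.latency_budget_ms < cloud_round_trip_ms:
        return "edge"    # the cloud would miss the response deadline
    if request.needs_world_knowledge:
        return "cloud"   # larger model with broader knowledge
    return "edge"        # default to the cheaper local path

command = Request("open the window", latency_budget_ms=200, needs_world_knowledge=False)
question = Request("plan a weekend trip", latency_budget_ms=2000, needs_world_knowledge=True)
```

Here `choose_backend(command, network_up=True)` picks the in-car model because the cloud cannot answer within 200 ms, while `choose_backend(question, network_up=True)` sends the open-ended question to the cloud. In a tunnel with no signal, both fall back to the edge.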
Therefore, as they embrace and explore large models, automakers need to think carefully about user value and show sufficient strategic patience.
In the long run, the large model, as a recent achievement of the AI era, will certainly bring real value to the automobile industry. Chen Ye, founder and CEO of Tigerobo Technology, sees the large model as a gateway to the future. It can open up countless opportunities for vehicle intelligence, in both the intelligent cockpit and intelligent driving, which means automakers must invest in it to seize those opportunities. Those who do not invest will have none.
A staff member in the large model research and development department of a leading new carmaker told us that automakers’ technical departments are also researching large models in large part out of a pursuit of artificial general intelligence, an ultimate goal for people in technology.
Overall, the real user-experience value that large AI models can deliver is still in an early, exploratory phase. For the technology to truly make its way into vehicles and realize its full value, automakers will need to keep working on aligning technological advances with user demand. In the long term, however, its influence on automotive products and the automobile industry will be profound. This is a certainty that all car manufacturers will have to face. From this standpoint, large models empowering cars is a long transition that will eventually reach a critical point of qualitative change.
So, how long will this process take?
Everything is yet to be determined. It may take three to five years, or as long as ten. But one thing is very clear: willingly or not, car companies have been drawn, passively or actively, into the era of large AI models, and they will keep competing on this long track whose end is unknown. Before reaching that critical point, they first have to survive this brutal battle.
This article is a translation by AI of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.