Exploring the Future of Intelligent Driving: Horizon's New BPU Nash Architecture for Transformers and Large Models

In 2023, the development of intelligent driving reached a critical point.

Specifically, thanks to the efforts of major automakers, urban navigation-assisted driving functions have begun to appear for testing on some mass-produced models, and commercial models are being explored. This has given the whole industry a glimpse of the real possibility of deploying intelligent driving at scale.

However, the current user experience of intelligent driving still has many problems, the technology still has huge room to develop, and today’s systems still fall into the L2 category. Upstream and downstream players across the industry are therefore still exploring how intelligent driving should develop, both technically and commercially.

Of these players, Horizon, which sits upstream in the intelligent driving industry chain, is of particular interest to us.

On the one hand, building on its existing commercial practice, Horizon has gradually found a path for bringing high-level intelligent driving to large-scale deployment. On the other hand, looking to the future of intelligent driving, Horizon has not stood still: it continues to seek the optimal solution for intelligent driving computation, starting from breakthroughs in fundamental core technology.

The technological breakthrough that Horizon has chosen to pursue is a new-generation BPU architecture.

The New Generation BPU Architecture, Built for Transformer and Large Models

On April 18th, the first day of the 2023 Shanghai Auto Show, Horizon held a themed event called “Journey and Cooperation, Alongside Each Other” to officially release its latest generation of BPU intelligent computing architecture, named BPU Nash.

From a technological standpoint, BPU Nash has the following characteristics:

  • Uniquely designed with a three-level on-chip storage architecture, the cores can efficiently cooperate with each other to optimize the bandwidth bottleneck under large parameters;
  • Equipped with a multi-pulse cube acceleration engine, flexible data flow between engines can achieve high energy efficiency and low bandwidth utilization;
  • The data transformation engine flexibly supports Transformer’s smaller operators;
  • The floating-point vector acceleration unit has general, flexible features to meet the accuracy requirements of key operators;
  • The tightly coupled heterogeneous computing units efficiently accelerate different types of data processing;
  • Efficient and flexible multi-directional data flow within and between cores and chips achieves dynamic calculation scheduling and flexible tuning;
  • Virtualization technology transparently improves multi-task parallel processing capabilities;
  • Data-driven power optimization that exploits the dynamic-range characteristics of neural networks reduces power consumption by 30%.

Taken together, BPU Nash’s overall design philosophy emphasizes continuous improvement in bandwidth, energy efficiency, versatility, parallel computing capability, and architectural flexibility, while also reducing power consumption.

According to the official description, BPU Nash is designed specifically for large-parameter Transformer models and for interaction and game-theoretic decision-making between road users, and it is optimized for cutting-edge algorithms to achieve the best algorithm efficiency. It also uses AI-assisted design to greatly enhance the programmability of the architecture.

It also has a super-heterogeneous computing architecture that can significantly increase the diversity of computing power.

Regarding the BPU Nash architecture, Horizon Robotics co-founder and CTO Huang Chang told 42Garage that it was developed for the next five to ten years. It is a scalable architecture that includes computing units, storage units, and other components, and it can cover both single-core and multi-core systems.

As a result, chips based on the BPU Nash architecture can in theory deliver computing power ranging from dozens of TOPS to 1,000 TOPS, and the transistor count can scale accordingly.

Huang Chang said that BPU Nash responds to an important trend in the computing chip industry: a new style of computing built on large-scale data, large-parameter models, and a move away from hand-written rules. In the future, more and more workloads will be data-driven and based on large-parameter, large-scale models, and in his view these should run on a BPU rather than on a CPU or GPU.

In 42Garage’s opinion, BPU Nash was also born to serve the latest technical trend in intelligent driving.

After all, as of 2023 the intelligent driving industry has converged on a very clear technical trend: solutions based on BEV perception and large Transformer models have become the common approach promoted by many players, and this approach will keep improving as data volume grows, computing power increases, and algorithms are refined.

In this context, the arrival of BPU Nash can be said to be timely.

Of course, seen against the overall development of Horizon’s BPU, BPU Nash is the latest iteration of the company’s dedicated computing architecture for intelligent driving. It reflects the fusion of algorithms, compilers, and architecture design, and it embodies the “intelligent evolution” philosophy of Horizon’s BPU as applied to the latest neural network architectures and high-level autonomous driving applications.

Through this “intelligent evolution,” Horizon hopes to create the most powerful brain for intelligent driving.

BPU Architecture Has Completed Three Generations of Mass-Production Verification

Behind the latest BPU architecture lies not only Horizon’s understanding of future technology trends but also another important premise: since its inception, the BPU has gone through three iterations and has strongly supported Horizon’s commercialization.

When it was first created, the BPU architecture was driven mainly by technical advancement and business exploration. In August 2019, Horizon officially released Journey 2, China’s first automotive-grade intelligent chip, based on BPU Bernoulli 1.0. It can process multiple AI tasks more efficiently and flexibly, detect and precisely recognize multiple targets in real time, and be used in intelligent driving scenarios such as visual perception for autonomous driving, crowdsourced high-precision mapping and positioning, visual ADAS, and intelligent human-machine interaction.

Seven months later, in March 2020, through cooperation with Changan Automobile, Horizon’s Journey 2 chip was first installed on the UNI-T model to implement DMS functions in the intelligent cockpit. Although this first step with an automaker did not go directly into ADAS, the cooperation helped Journey 2 shipments exceed 100,000 units in 2020.

It also showed that Horizon had won recognition from mainstream automakers and that its investment in automotive intelligent chips was beginning to pay off.

Also in 2020, Horizon released the Journey 3 chip, based on BPU Bernoulli 2.0 and built on a 16nm process. It offers 5 TOPS of computing power at a typical power consumption of 2.5W, and it supports a variety of applications, including advanced driver assistance, intelligent cockpit, automatic parking assistance, high-level autonomous driving, and crowdsourced high-precision map positioning. In terms of computing power, it is a chip that can compete directly with Mobileye EyeQ4.

For Horizon Robotics, the commercial significance of Journey 3 was even more crucial: it was selected by Li Auto and installed on its star model, the 2021 Li ONE. As a result, more and more automakers have sought cooperation with Horizon Robotics on intelligent driving chips, and the company’s commercialization has continued to accelerate.

In July 2021, Horizon Robotics released its third-generation automotive-grade product, Journey 5, based on BPU Bayesian. It can be regarded as a true domestically produced high-compute chip, with a maximum computing power of 128 TOPS and support for 16 camera inputs, meeting the multi-sensor fusion, prediction, and planning-and-control requirements of advanced autonomous driving. It is worth mentioning that Journey 5, based on the BPU Bayesian architecture, is the only high-compute intelligent driving chip in the world that has achieved mass production.

In September 2022, the Horizon Robotics Journey 5 chip was installed on another Li Auto star model, the Li L8 Pro.

Since its first mass-production debut with Li AD Pro in September 2022, Journey 5 shipments have exceeded 100,000 units, and the chip has been designated for nearly 20 mass-production models from 9 automakers. These include Li Auto, BYD, NIO, and Hylera, new-force brands such as Aion, and foreign-funded and joint-venture automakers. More models using Journey 5 will enter mass production this year.

Looking at the evolution of the BPU as a whole, since the first mass-production launch in March 2020, shipments of Horizon’s Journey series chips have exceeded 3 million units. Horizon has reached mass-production cooperation with more than 20 automakers on more than 120 models, and its chips have accompanied users over billions of kilometers of driving.

In other words, the technology path Horizon Robotics has chosen, built on the continuous evolution of the BPU, has been thoroughly validated in commercial deployment.

Over the past three years, Horizon Robotics has brought the Journey 2, Journey 3, and Journey 5 chips into mass production in succession, and single-chip computing power has grown from 4 TOPS to 128 TOPS. With Journey 5’s real-world performance of processing 1,718 image frames per second, it has driven the leap from ADAS to highway NOA. To raise the average MPI (miles per intervention) of autonomous driving 1,000-fold within five years, and to keep up with continually evolving algorithms and growing model sizes, the computing power and bandwidth available for autonomous driving must keep increasing. Horizon Robotics believes they need to grow by at least one to two orders of magnitude.
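To put the figures in this paragraph in perspective, the short Python sketch below works through the arithmetic. It assumes smooth compound growth toward the 1,000-fold MPI target, which is an illustrative simplification and not something Horizon has stated. All function and variable names are hypothetical.

```python
import math

def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`.

    Assumes smooth compound growth (an illustrative simplification).
    """
    return total_factor ** (1.0 / years)

# Figures quoted in the article.
JOURNEY_2_TOPS = 4        # single-chip computing power of Journey 2
JOURNEY_5_TOPS = 128      # single-chip computing power of Journey 5
JOURNEY_5_FPS = 1718      # image frames processed per second on Journey 5
MPI_TARGET_FACTOR = 1000  # targeted MPI improvement
MPI_TARGET_YEARS = 5      # time frame for that improvement

tops_ratio = JOURNEY_5_TOPS / JOURNEY_2_TOPS      # overall compute increase
tops_doublings = math.log2(tops_ratio)            # same increase, in doublings
frame_time_ms = 1000.0 / JOURNEY_5_FPS            # average time per frame
mpi_yearly = annual_growth_factor(MPI_TARGET_FACTOR, MPI_TARGET_YEARS)

print(f"Compute increase, Journey 2 to Journey 5: {tops_ratio:.0f}x "
      f"({tops_doublings:.0f} doublings)")
print(f"Average time per frame on Journey 5: {frame_time_ms:.3f} ms")
print(f"Yearly MPI multiplier for 1000x in 5 years: {mpi_yearly:.2f}x")
```

Under that assumption, a 1,000-fold MPI gain in five years works out to roughly a fourfold improvement every year, while the step from 4 TOPS to 128 TOPS is a 32-fold increase, or five doublings.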

It is worth mentioning that this year Horizon proposed an end-to-end algorithm framework based on BEV + Transformer. The framework has been verified in a closed loop on Journey 5, and technologies including pure-vision BEV perception of static and dynamic environments are close to mass production. This trend is clearly closely tied to the forward-looking design of BPU Nash.

Overall, the continuous evolution of the BPU is the product of Horizon’s technology roadmap combined with its commercial practice in intelligent driving. It shows both strong technological foresight and clear commercial continuity.

Idealism and Realism of Horizon

The launch of Horizon’s new-generation BPU has to be seen against its background: intelligent driving is moving toward large-scale mass production and deployment.

Against this background, 42Garage believes some new trends are emerging in how intelligent driving is deployed. For example, in mass-produced cars, simply stacking computing power is no longer the mainstream approach, because it rarely delivers user value that matches the hardware investment. Instead, once the demand for computing power is met, many players in the industry now pay more attention to continuously optimizing algorithms and software.

This strongly echoes the views of Yu Kai, Horizon’s founder and CEO, on intelligent driving. Yu Kai previously stated at the China Electric Vehicle 100-People Forum:

The chip processing power is not completely proportional to the user experience, and stacking processing power on chips alone cannot create a better intelligent driving system. The processing power of current intelligent driving systems ranges from tens of TOPS to 1,000 TOPS, but the difference in final user experience is not that great. So what we need to do now is to improve the L2+ user experience through continuous algorithm optimization and more data, and approach the limit of system engineering to create value for consumers.

It is precisely for this reason that the latest generation of Horizon’s BPU builds a unified computing architecture around large-parameter Transformers and optimizes computing efficiency and power consumption at the architecture level. In essence, this reflects Horizon’s persistence in combining software and hardware along its technology path.

It is worth emphasizing that Yu Kai’s attitude toward the future of autonomous driving is also very sober. He believes autonomous driving faces the most severe system-level technological challenges: open natural scenes, system randomness and uncertainty, and game-like interaction among dynamic systems and multiple agents. This may be the most challenging systems project in the history of human industry. In his view, intelligent driving will therefore remain at the L2+ stage for the next ten years, and fully autonomous driving will be achieved only on some dedicated roads.

For now, then, the industry can only keep improving users’ advanced driver assistance experience. What Horizon Robotics is doing is steadily advancing intelligent driving on the basis of software-hardware integration, while continuing to push forward computing power, model scale, algorithms, data, infrastructure, and other areas.

In 42Garage’s view, this is clearly a more pragmatic stance, and one closer to how the industry actually develops.

At the same time, facing the overall industry development of intelligent driving, Horizon Robotics’ self-positioning is also very pragmatic and open. Specifically:

Horizon positions itself as a Tier-2 supplier and adheres to a “flexible and open, rich and frugal” business model. Through approaches such as open software IP licensing and BPU IP licensing, it aims to create the “ARM + Android” model of the intelligent-vehicle era. Based on its open technical solution of “chip + toolchain + reference algorithms”, it helps automakers and industry partners efficiently develop and deploy differentiated intelligent driving solutions.

Even licensing of the underlying BPU IP is included, which is enough to show how thorough Horizon Robotics’ open strategy is.

However, while adhering to a rational and pragmatic deployment strategy, Horizon Robotics still holds ideals and firm beliefs about the future of intelligent driving: within the next ten years, electric vehicles will come with autonomous driving systems as standard, a takeover will be needed only once every 100,000 kilometers, commuting will be 10% more efficient than with human drivers, comfort will reach a five-star level, and the commuting range will cover 99% of roads.

Regarding this, Dr. Yu Yinan, Vice President of Horizon Robotics and President of the Software Platform Product Line, also said:

Ultimately, driven by end-to-end giant model algorithms, computing power close to the human brain and ultra-large-scale cloud computing platform, and data, we are very confident that in 10 years, the technology of autonomous driving will move from the current stage paradigm to unified expression of the physical world. By modeling the diversity of the world, we can produce driving models containing world knowledge. By combining slow cognitive systems with fast instinct systems, we can achieve the ten-year vision of autonomous driving.

Interestingly, the prediction that L2+ will still dominate the market for the next decade shows Horizon’s realism about commercial deployment, while the ten-year vision that autonomous driving systems will become standard on electric vehicles shows its idealism about technology. From this perspective, the continuous evolution of the BPU architecture can be seen as strong evidence of how Horizon has chosen to balance idealism and realism.

Of course, at the intersection of realism and idealism, Horizon’s role as an important promoter of large-scale commercial deployment of intelligent driving in China is also becoming clearer and more specific.

This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.