A wave of homegrown Chinese products is sweeping across the automotive supply chain, and it has now reached the chip race.
The Li Auto L8 and L7 were launched on September 30, with both Pro versions carrying China’s first high-performance automotive-grade chip, the Journey 5 from Horizon Robotics.
The L8 breaks with the traditional pattern of “unveiled this year, delivered next year” and will begin deliveries after only two months. Even more striking, the domestically made high-performance chip, Journey 5, has kept pace.
Designing, fabricating, and testing a chip requires a high level of technical expertise, and keeping up with Li Auto, a new automaker known for its fast iteration, is no easy feat.
With this, Horizon has officially entered the World Cup final of high-performance automotive intelligence chips. Journey 5 must compete with big players such as Nvidia Orin and Qualcomm Ride, while also guarding against Mobileye EyeQ6, which is waiting to enter the field.
Now let’s talk about high-performance chips, Journey 5, and Horizon Robotics.
The era of high-performance chips: Journey 5 enters the field to reshape the game
Looking back, automotive chips for intelligent driving have gone through roughly three periods:
Before 2016, the rise of the industry.
The earliest intelligent driving solutions in the automotive industry, more accurately called ADAS, were dominated by Mobileye and relied on the low-power Mobileye EyeQ series chips. Horizon was founded during that period, focusing on edge computing chips and targeting the automotive field.
From 2016 to 2021, the warming period.
As the concept of autonomous driving took off, Tesla, led by Musk, became the top player: the most revolutionary and the fastest to iterate on autonomous driving features.
As computational demands rapidly increased, Tesla successively abandoned both Mobileye and Nvidia and developed a self-designed FSD chip.
In the Chinese market, Tesla’s followers NIO, Xpeng, and Li Auto (collectively nicknamed “Wei Xiaoli”) took the same path, but lacking chip development capabilities, they built their autonomous driving systems on Mobileye EyeQ4 and Nvidia Xavier. This was when Chinese autonomous driving chips began to emerge. In 2020, Horizon Robotics’ Journey 2 entered mass production in the Changan UNI-T, and in 2021, Journey 3 entered mass production in the Li Auto ONE.
At this point, the industry had split into three camps:
- Mobileye EyeQ4 L2 system forming its own camp;
- Nvidia Xavier and Orin, as well as Huawei MDC, occupying the high-end market;
- Journey 2 and Journey 3 occupying the mid-range market.

# The Fierce Competition in the Intelligent Driving Industry and the Rush to Put High Computing Power Chips into Vehicles in 2022
As we enter 2022, the competition in the field of intelligent driving is becoming increasingly fierce. High computing power chips for higher-level autonomous driving are on the rise.
Representative phenomena include the mass production and delivery of high computing power chips from NVIDIA, Horizon Robotics, and Qualcomm. The three companies’ high computing power chips are respectively NVIDIA Orin, Horizon Robotics Journey 5, and Qualcomm Ride.
Each chip has its own characteristics:
- NVIDIA Orin offers 256 TOPS of computing power. It is expensive to develop for and is used in many mass-produced vehicle models;
- Journey 5 offers 128 TOPS of computing power at half the purchase cost of Orin. Its first mass-produced model is the L8 from Li Auto, and Horizon Robotics has signed cooperation agreements with no fewer than five automakers this year;
- Qualcomm Ride has the highest computing power of the three, reaching 360 TOPS. At present, only Great Wall has announced its adoption in China.
NVIDIA Orin, with its openness and the launch of high computing power chip products, has taken away many customers from the closed Mobileye. Qualcomm, the newcomer, seized the opportunity based on the high performance of its chip products and won over Chinese automakers such as BAIC and Great Wall.
Chinese player Horizon Robotics relies on its performance and cost advantages. After shipping more than 1.5 million units of Journey 2 and Journey 3, it has won over several automakers with its new product, Journey 5.
In addition to the Li Auto L8 and L7 mentioned above, Journey 5 will enter mass production with BYD, Zhiyoujia, SAIC Group, FAW Hongqi, and an automaker in East China between 2022 and 2024. The solutions include single-J5, dual-J5, and multi-J5 configurations.
Looking back now, the reason why Journey 5 can be adopted quickly is that Horizon Robotics focuses on the development of edge computing chips and has grasped the pace of mass production in recent years.
The development of smart cars can be roughly divided into three stages:
- Full supplier solution stage;
- Chip procurement, software self-development;
- High-performance chip self-reliance.
During the first stage, automakers mainly purchased mature, packaged hardware-and-software solutions from established chip companies for quick mass production. In the second stage, automakers needed more self-developed software capability to differentiate their intelligent features. At that time, there were not many choices of automotive-grade autonomous driving chips, and Horizon Robotics seized the opportunity.
Theoretically, the third stage should be automakers developing both software and hardware themselves, but in reality automakers are not yet in a position to develop their own chips. Meanwhile, improvements in perception performance and the deployment of large BEV algorithms have raised the requirements for computing platform performance. Both NVIDIA and Qualcomm have the technical strength, but their drawbacks are high cost and uncertainty in use.
Horizon Robotics is the only domestic company that can provide more than 100 TOPS of computing power on a single chip, so this is the second beat that Horizon Robotics has caught. Judging from the production schedule currently available for Journey 5, we will soon see a group of new cars equipped with it enter production.
In terms of market rhythm, Journey 5’s timing is precise. But autonomous driving is a system that demands high performance and safety, which raises a new question: what technology allows Journey 5 to enter the finals of high computing power chips?
Why Journey 5?
The automotive-grade chip industry has many pain points, such as high chip design complexity, closed ecosystems, high collaborative development costs, and few options for high computing power chips.
Only solutions to these pain points will be accepted by the market.
Journey 5 was able to get into vehicles quickly and enter mass production in more than one model because of sustained effort at every stage: design, layout, testing, and manufacturing, through to supporting vehicle manufacturers in mass-producing the corresponding autonomous driving algorithms, and even back to the product development process for automotive-grade chips.
One could even argue that it is simply because there are too few high computing power chips available on the market.
After all, because of concerns over research and development costs and the timeline for autonomous driving, traditional automotive chip manufacturers have mostly stayed out of the high computing power field or entered it late, with a few exceptions such as NVIDIA.
When vehicle manufacturers look around, NVIDIA, Horizon Robotics, and Qualcomm are the only options available. Just close your eyes and make a choice. However, whether a product impresses its users essentially depends on whether it is easy enough to use.
Chips actually have a set of standards for ease of use.
The first layer, and the most basic, is meeting the computing power requirement. A chip that enters the 100+ TOPS range can basically support intelligent driving functions on highways and in urban areas, and vehicle manufacturers will consider it.
The second layer is algorithm strength, judged by final performance. After vehicle manufacturers receive the chip, they look not only at the computing power figures on the spec sheet but also at the software architecture, algorithms, and so on. The algorithms must match the computing power from the first layer, and together they ultimately determine the chip’s performance.
The third layer is the total cost of adopting the chip, including development fees, which is the final factor that affects vehicle manufacturers’ decisions.
Horizon Robotics has two core advantages: one is the chip with just the right amount of “computing power,” and the other is the ability of core IP and perception algorithms.
The chip with just the right amount of “computing power”
Computational power is a crucial indicator for high-performance chips, but hyping this figure has become a key tactic for some chip companies and OEMs to win in the arena of public opinion.
Dr. Huang Chang, co-founder and CTO of Horizon Robotics, once said:
Computational power is the “electricity, water, and coal” of the entire digital economy era, but it also implies certain costs. Tesla previously shared a comparison between FSD and Nvidia’s computing platform, showing that FSD hardware had a 21-fold speed increase over its predecessor, but its peak computational power was only 80%.
In fact, if computational power is just about the so-called physical peak computational power, it does not necessarily correspond to actual processing capability.
What does this mean?
The higher the physical computational power of a chip, the larger its transistor count and die size, which means paying correspondingly more in power consumption and cost. Under such conditions, what matters for cost-effectiveness is not peak computational power but actual computational efficiency.
“Computational power is useless without speed.” Peak computational power corresponds to cost and is analogous to horsepower, while real AI processing capability is analogous to 0-100 km/h acceleration time. What a driver or passenger can truly feel is how many seconds it takes to accelerate to 100 kilometers per hour, not how much horsepower the car has.
Similarly, for chips, what users can truly feel is not the theoretical peak computational power, but how fast the chip can actually compute and how much image data it can process. “Real computational performance” is frames per second (FPS): how many images can be processed per second, and at what chip cost. This is the true measure of energy efficiency and cost-effectiveness.
And “FPS” is equal to the product of three elements:
- Hardware architecture elements
- Algorithm architecture elements
- Software architecture elements
The hardware architecture element is the ratio between the theoretical peak computational power obtained and the power consumption and cost paid for it. Ultimately, hardware architecture design is constrained by advanced process technology. In other words, high performance also means high cost.
The algorithm architecture element is how much processing speed a given amount of computing power can be exchanged for, that is, FPS/TOPS. According to OpenAI’s statistics over the past decade, across applications ranging from images to language, the computation that deep learning algorithms need to reach the same level of performance has halved every 9 to 14 months. Algorithm optimization can therefore improve computational speed.
The software architecture element is about maximizing the utilization of computing resources by optimizing the compiler and the dynamic runtime library of the underlying system to achieve optimal scheduling of data flow. It involves breaking the algorithm down as far as possible and laying it out efficiently on the target hardware, which tests the level of software optimization and software architecture design.
In simple terms, the real FPS performance is determined by the combined capabilities of “hardware architecture design, software architecture design, and algorithm architecture design”. In recent years, the algorithm has developed the fastest. When Moore’s Law for hardware fails, the performance improvement of the entire computing power lies in the joint optimization of software engineering, algorithms, and hardware architecture.
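The multiplicative relationship described above can be sketched numerically. This is only an illustration under stated assumptions: the decomposition into peak TOPS (hardware), utilization (software), and frames per effective TOPS (algorithm) is one plausible reading of the three elements, and the `ChipProfile` class and every number in it are hypothetical, not figures for any real chip.

```python
from dataclasses import dataclass


@dataclass
class ChipProfile:
    """Hypothetical description of one chip-plus-software stack."""
    peak_tops: float        # hardware element: theoretical peak compute
    utilization: float      # software element: fraction of peak actually used (0..1)
    frames_per_tops: float  # algorithm element: frames per second per effective TOPS


def real_fps(chip: ChipProfile) -> float:
    """Real throughput modeled as the product of the three elements."""
    return chip.peak_tops * chip.utilization * chip.frames_per_tops


# Made-up numbers: a large chip with poor utilization versus a smaller,
# better-tuned one with a more efficient algorithm.
big = ChipProfile(peak_tops=200.0, utilization=0.25, frames_per_tops=10.0)
small = ChipProfile(peak_tops=100.0, utilization=0.75, frames_per_tops=12.0)

print(f"big chip:   {real_fps(big):.0f} FPS")
print(f"small chip: {real_fps(small):.0f} FPS")
```

In this toy setup the chip with half the peak computing power delivers the higher frame rate, which is the point of the horsepower analogy: the product of the three elements matters, not the hardware element alone.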
“Hardware and software integration” means achieving precise optimization by designing software and hardware systems together so that they support paradigm-level intelligent algorithms.
To explain “hardware and software integration,” it is necessary to understand that “hardware and software integration” and “hardware and software decoupling” are not the same concept. The software and hardware architecture of a computing platform should be fully considered in the design phase to efficiently support future algorithm development trends. However, when a computing platform has been developed and its hardware and software are provided to the developers for use, it is necessary to support software and hardware decoupling, or more strictly speaking, the decoupling of algorithm and application development from the computing platform.
For example, from low-level software to middleware, software abstraction is used to make the underlying hardware platform independent of upper-layer application development and algorithm development, thus achieving software and hardware decoupling. Its essence lies in the “decoupling of algorithm and application development from the computing platform.”
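As a minimal sketch of this kind of decoupling, the application code below depends only on an abstract interface, never on a specific chip. All names here (`ComputeBackend`, `CpuReferenceBackend`, `run_model`, `detect_brightest_region`) are hypothetical and invented for illustration; they are not part of any Horizon toolchain.

```python
from typing import List, Protocol, Sequence


class ComputeBackend(Protocol):
    """Hypothetical abstraction layer exposed by the platform middleware."""

    def run_model(self, model_name: str, frame: Sequence[float]) -> List[float]:
        ...


class CpuReferenceBackend:
    """Stand-in backend; a real one would dispatch to the accelerator."""

    def run_model(self, model_name: str, frame: Sequence[float]) -> List[float]:
        # Toy "inference": normalize the values so that they sum to 1.
        total = sum(frame) or 1.0
        return [value / total for value in frame]


def detect_brightest_region(backend: ComputeBackend, frame: Sequence[float]) -> int:
    """Application code: depends only on the interface, not on the hardware."""
    scores = backend.run_model("toy_detector", frame)
    return max(range(len(scores)), key=scores.__getitem__)


# The same application function would run unchanged on any backend
# that implements run_model.
print(detect_brightest_region(CpuReferenceBackend(), [0.5, 2.0, 1.0]))
```

Swapping in a different backend class changes where the computation runs without touching `detect_brightest_region`, which is the sense in which algorithm and application development are decoupled from the computing platform.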
In contrast, “hardware and software integration” refers to the “integration of software and hardware in the design phase of computing architecture.” On the hardware side, on-chip storage arrays, tensor computing units, and middleware instruction sets all require fresh optimization. On the software side, it is necessary to analyze how to break down and reorganize each specific algorithm so as to maximize parallelism and inference efficiency on the chip, including reducing latency and saving bandwidth.
What did Horizon Robotics do to achieve this? From the beginning of its chip development, Horizon Robotics has built on its own BPU (Brain Processing Unit) architecture. Journey 2 was based on the BPU Bernoulli 1.0 architecture, Journey 3 on the BPU Bernoulli 2.0 architecture, and Journey 5 on the BPU Bayesian architecture.
The performance of a chip reflects the system’s ability to achieve a high level of joint optimization in software engineering, algorithms, and hardware architecture.
What does this mean?
Dr. Huang Chang said that when an algorithm and application are developed at the current stage, many people need to manually debug the deployment and operation process. In fact, AI has the ability to complete adaptive tasks.
Therefore, it is necessary to design algorithms on top of algorithms, allowing basic algorithms to better iterate and adapt in application scenarios, such as replacing expert systems with deep learning and reinforcement learning methods, constantly deepening and expanding.
The BPU’s Bayesian architecture is built around neural network algorithms to meet the requirements of autonomous driving scenarios. Its features include small concurrent data, flexibility, strong performance, and low power consumption, with a systolic tensor compute core. In the near future, it may even form a unified neural computing architecture.
Currently, the BPU may cover about half of the chip, and image workloads such as traditional ISP image processing, video encoding and decoding, and image rendering may move from conventional image algorithms to neural network algorithms.
The BPU Bayesian architecture mainly addresses the rational design of heterogeneous computing units. It allocates memory and compute units sensibly and provides flexible access to high-bandwidth storage. As a result, the Horizon BPU can deliver sufficient computing density and energy efficiency while remaining very flexible.
What does this mean?
Processors and computing units will all converge on the BPU as a unified neural computing architecture. Apart from input and output, more than 95% of the chip’s power consumption will serve general computing, and only a small portion of the chip area will be devoted to special instructions. This is the inevitable trend.
In other words, whoever can construct a unified neural computing architecture software and hardware system and support a wide range of ecosystems, especially an ecosystem oriented towards the development of robots, will have unlimited imagination space.
For Horizon, developing a chip is not a matter of a single technology; it requires a balanced technical system.
What did Horizon achieve?
At more than 1,500 frames per second, Journey 5 draws only about 30 watts. Its power consumption is only one-sixth to one-ninth that of Orin, with a peak computing power of 128 TOPS. The peak figure is not the point, however: what Horizon truly cares about is Journey 5’s real computing performance, which reaches 1,531 FPS with latency as low as 60 milliseconds at a total power consumption of around 30 W.
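The efficiency implied by these figures can be worked out directly. The short calculation below uses only the numbers quoted above for Journey 5 (1,531 FPS, about 30 W, 128 TOPS); the variable names are illustrative, and the results are simple ratios rather than benchmark measurements.

```python
# Figures quoted in the article for Journey 5.
fps = 1531        # frames per second
power_w = 30      # approximate power consumption in watts
peak_tops = 128   # peak computing power in TOPS

# Energy efficiency: frames processed per second for each watt drawn.
fps_per_watt = fps / power_w

# Frames per second obtained for each TOPS of peak computing power.
fps_per_tops = fps / peak_tops

print(f"{fps_per_watt:.1f} FPS per watt")
print(f"{fps_per_tops:.2f} FPS per peak TOPS")
```

These two ratios are the kind of “real computing performance” metrics the article argues for: they relate delivered frame rate to power and to peak computing power instead of quoting peak TOPS alone.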
In the end, Horizon won. In front of a room full of automakers, Dr. Yu Kai can say: “Although our price is less than half of theirs, the performance is comparable. Nvidia’s Orin costs more than $400, and ours is less than half of that.”
If we briefly summarize the secret of the rapid entry of Horizon Journey 5 into the car industry, it is that the chip’s own computing power is strong enough, and when working with car manufacturers, it can provide a handy reference solution for software and hardware, as well as software development platforms such as toolchains, and the entire set of solutions is based on mass production experience.
It is precisely because of these preparations that Horizon enjoys the same strong status as Nvidia and Qualcomm in the international community today.
Why Horizon?
In the competition in the chip industry, purely relying on technology is obviously not enough to compete with Nvidia, Qualcomm, Huawei, and other companies. Providing high-performance products is the first step, and having a complete commercial model will make things twice as easy.
Therefore, from the very start of the business, Dr. Yu Kai made clear that Horizon is a Tier 2 company with three types of customers: Tier 1 suppliers, automakers, and other technology companies in the ecosystem. For software customers, it provides development boards and development toolchains; for hardware customers, it provides chips and development toolchains.
In Horizon’s dictionary of cooperation, there is only one word: “open.”
“I admire Dr. Yu Kai very much. He has deep insight into business and a keen sensitivity to it. Unlike many founders with a technical background, he knows whom the technology is for and how to serve them well. So we can see that he is very decisive about offering open solutions and high-performance chips, and his judgment of market demand is very accurate.”
This is what an intelligent driving product manager at a new automaker told me.
Why did Horizon enter the finals of high-performance chip competitions?
As a technology startup, before 2020 Horizon was neither as well known as mainstream foreign companies nor designated by automakers on a comparable scale.
During this period, Horizon did two things:
- Polished chip performance and improved chip reliability;
- Developed a complete technology stack in software areas such as core vision algorithms for autonomous driving.
This created Horizon’s current product ecosystem:
- A set of chip products covering computing power from a few TOPS to hundreds of TOPS, spanning low-end to high-end;
- A complete set of IP and software technology, including operators and some middleware;
- A complete set of intelligent driving hardware and algorithm reference designs, which can be delivered at the white-box level;
- A complete set of development toolchains and AI platform.
In the cooperation with Li Auto, the feedback from Li Auto’s autonomous driving team was that during the use of Journey 3, Horizon not only provided hardware-related feedback but also offered many suggestions on perception algorithms. In addition to its expertise in hardware, Horizon also demonstrated expertise in software.
To summarize, Horizon can open up all of its technologies to customers and help them learn these technologies. Additionally, Horizon’s biggest advantage is that every generation of its products meets the most urgent needs of domestic automakers at a specific point in time.
In conclusion, only with the strong support of chip suppliers, can car companies successfully climb to the peak of autonomous driving.
Starting with the mass production of Journey 5, Horizon, a Chinese company, can stand on the same starting line as Nvidia and Qualcomm, which has never happened before in the history of chips.
Now that Journey 5 has opened the way to mass production of high-level, high computing power chips, Horizon will only get faster. The large cost advantage of Journey 5 means that urban intelligent driving will become more widely available. More importantly, as its foundational technical capabilities improve, China has the opportunity to lead the building of an innovative ecosystem in the global smart car era and to win opportunities for industry reform.
In this sense, Horizon and other Chinese chip companies alike hope to win this competition.
This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.