Author: French fries fish
Beijing, outside the northwest Fifth Ring Road: the Zhongguancun Integrated Circuit Design Park.
This is a place that ordinary “Chaoyang masses” like me rarely visit.
First, there is the physical distance. I have to drive halfway around the Fifth Ring Road and then along a stretch of expressway, more than 40 kilometers in all, so it is not an easy trip.
Second, there is the psychological distance. Keywords such as “Zhongguancun”, “integrated circuit”, and “chip design” first strike me as high-end, academic, and hardcore, and then as dry, tedious, and hard to understand. So I have always regarded them with a kind of distant awe.
This time, though, we made the effort to overcome both the physical and the psychological distance, naturally because there was something interesting to see.
We are going to visit a domestic intelligent chip company, Horizon, which has a “lopsided skill tree”.
Relocation, Teaching Assistance, and Underground Work
Last July, Horizon officially released “Journey 5”, which delivers 128 TOPS of computing power on a single chip, and then set itself the goal of “delivery in Q4 2022”.
Given the uncontrollable disruption caused by the epidemic, I was 99% sure that Horizon’s target would slip. It did not.
Dr. Huang Chang, co-founder and CTO of Horizon, said that the first mass-production model equipped with the Journey 5 chip will launch in the fourth quarter of this year.
Behind the “on-schedule progress” is a bunch of “lopsided skill trees”.
Skill One: A startup team that has not “mastered relocation” is not a good chip company.
This may be one of the most highly educated professional moving teams in China.
To keep the project on schedule, Horizon’s Shanghai team organized an “emergency strategic shift” before the epidemic lockdown in Shanghai, moving all relevant chips, equipment, and other movable development tools to neighboring cities such as Suzhou and Hangzhou so that development resources would not be trapped. Likewise, when the epidemic flared up again in Beijing recently, Horizon heard that its office building might be shut down and started another relocation over the weekend.
Skill Two: A developer who has not “mastered teaching assistance” is not a good engineer.
Who would have thought that the tutoring industry would also produce “the most beautiful heroes going against the tide”? During the Shanghai outbreak, Horizon’s factory also operated under closed-loop management. Some of Horizon’s production work requires engineers to operate on the production line in person, but under closed-loop management they obviously could not get in. So the engineers joined the “sunset education industry” and coached frontline workers through remote video and other means to keep production on schedule.
Skill Three: A programmer without a “deep underground” mindset is not a good programmer.
You read that right: in 2022, a group of “underground workers” appeared at Horizon.
To be precise, they were “underground parking-lot workers”. During the Beijing outbreak, a Horizon development team worked in the company’s underground parking lot for over a month to keep the epidemic from disrupting their work. These “warriors”, working in the musty, damp, sunless parking lot, reminded me of the precious spirit of the revolutionary predecessors.
Of course, these descriptions are self-mockery and jokes everyone makes to lighten the hardship. But behind the jokes lies the employees’ belief in their own product, which made me even more curious about what kind of chip Journey 5 is.
Computing power? Efficiency?
The maximum AI computing power of Journey 5 is 128 TOPS. As one of the few domestic players in high-compute autonomous driving chips, Horizon has its sights set not on domestic rivals but on Nvidia.
Nvidia’s Orin-X has 254 TOPS of computing power. Objectively, on absolute computing power alone, Orin-X is clearly stronger than Journey 5. Interestingly, though, Horizon argues that Journey 5 is more efficient. Its figures show that, at the same accuracy on efficient models, Journey 5 delivers 268% of Orin-X’s performance and 870% of its energy efficiency.
It should be noted, however, that because Horizon still cannot get hold of an Orin-X hardware platform, the company admits these figures were actually calculated indirectly from Nvidia’s previous-generation autonomous driving chip, Xavier.
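The two headline numbers can be combined with a little arithmetic. A minimal sketch, assuming the 268% figure refers to frames per second on the same workload: because Journey 5 reaches that throughput with about half the nominal TOPS of Orin-X, the implied throughput per nominal TOPS is roughly 5.3 times higher. The variable and function names are illustrative, and the result inherits the caveat above about the Xavier-based estimate.

```python
# Figures as quoted in the article. Horizon's comparison was estimated
# indirectly from Nvidia Xavier, so treat the output as illustrative only.
JOURNEY5_TOPS = 128.0   # Journey 5 peak AI compute, TOPS
ORIN_X_TOPS = 254.0     # Orin-X peak AI compute, TOPS
PERF_RATIO = 2.68       # assumed: Journey 5 FPS / Orin-X FPS ("268%")


def fps_per_tops_ratio(perf_ratio: float, tops_a: float, tops_b: float) -> float:
    """Return (FPS_a / TOPS_a) / (FPS_b / TOPS_b), given FPS_a / FPS_b."""
    return perf_ratio * tops_b / tops_a


ratio = fps_per_tops_ratio(PERF_RATIO, JOURNEY5_TOPS, ORIN_X_TOPS)
print(f"Journey 5 frames per nominal TOPS: {ratio:.2f}x Orin-X")
# Journey 5 frames per nominal TOPS: 5.32x Orin-X
```

In the terms of the formula that follows, this ratio bundles together utilization and FPS/TOPS; the quoted numbers alone cannot separate the two.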
For the media and ordinary consumers, an arms race between products is what we most like to see. Real consumers do not care which spec is higher or lower; what ultimately matters is good real-world assisted-driving performance. Rather than an oligopoly, we would much prefer healthy market competition with more players taking part.
As for which one wins, “absolute computing power” or “real efficiency”, that cannot be settled until products are on the road. But Horizon has given us a rigorous product logic here: in the product concept of Journey 5, efficiency is the first performance indicator.
Three or four years ago, Huang Chang proposed the following AI performance formula:
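$$\frac{\text{FPS}}{\text{Watt}}\ \&\ \frac{\text{FPS}}{\$} = \left(\frac{\text{TOPS}}{\text{Watt}}\ \&\ \frac{\text{TOPS}}{\$}\right) \times \text{Utilization} \times \frac{\text{FPS}}{\text{TOPS}}$$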
The left side of the formula represents the real efficiency of the AI chip.
Here, FPS (frames per second) is the number of frames the AI chip processes per second, which can be regarded as its final processing performance; Watt is the chip’s power consumption; and $ is its hardware cost.
The left side therefore measures how much AI processing performance (FPS) we get per watt of power and per dollar of hardware cost, namely the real efficiency.
The right side of the formula consists of three factors: “TOPS/Watt & TOPS/$” reflects hardware architecture design, “Utilization” reflects software architecture design, and “FPS/TOPS” reflects algorithm architecture design.
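The formula is easy to check in code. The sketch below uses made-up numbers and hypothetical names (`ChipProfile`, `fps_per_watt`, and so on). It assumes that FPS/TOPS means frames per second per effectively used TOPS, so that peak TOPS × utilization × FPS/TOPS gives the chip’s actual frame rate; this reading is an interpretation, not Horizon’s definition.

```python
from dataclasses import dataclass


@dataclass
class ChipProfile:
    """Hypothetical inputs to the real-efficiency formula (illustrative numbers only)."""
    peak_tops: float      # nominal peak compute, TOPS (hardware)
    power_watts: float    # power consumption, W
    cost_dollars: float   # hardware cost, $
    utilization: float    # fraction of peak TOPS actually used, 0..1 (software)
    fps_per_tops: float   # frames per second per effectively used TOPS (algorithm)

    def fps(self) -> float:
        # Actual frame rate: effective TOPS times frames per effective TOPS.
        return self.peak_tops * self.utilization * self.fps_per_tops

    def fps_per_watt(self) -> float:
        # FPS/Watt = TOPS/Watt x Utilization x FPS/TOPS
        return (self.peak_tops / self.power_watts) * self.utilization * self.fps_per_tops

    def fps_per_dollar(self) -> float:
        # FPS/$ = TOPS/$ x Utilization x FPS/TOPS
        return (self.peak_tops / self.cost_dollars) * self.utilization * self.fps_per_tops


chip = ChipProfile(peak_tops=100, power_watts=20, cost_dollars=200,
                   utilization=0.5, fps_per_tops=10)
print(chip.fps(), chip.fps_per_watt(), chip.fps_per_dollar())  # 500.0 25.0 2.5
```

Because the three factors multiply, doubling any one of them doubles real efficiency, while a chip with twice the peak TOPS but half the utilization gains nothing per watt.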
Explanation of Real Efficiency Formula
First, let’s talk about hardware architecture design.
TOPS/Watt & TOPS/$ is the computing power delivered per watt and per dollar, which depends on the continuous evolution of hardware technology and manufacturing processes. Each step in semiconductor process, from 28nm to 16nm, 7nm, and 5nm, improves this value, and this is what most hardware companies focus on.
The industry usually talks about “decoupling software and hardware”, which means loosening the link between software application development and hardware design. This lowers the development burden on chip users: they can focus on their own software, while the chip company handles the underlying hardware.
But Huang Chang said that Horizon wants to pursue “software-hardware integration”.
In fact, the two are not contradictory, because they describe different stages of an AI chip’s life. “Decoupling software and hardware” applies to the stage in which the chip is used, while “software-hardware integration” applies to the stage in which the hardware is designed.
“Define chips with software based on actual scenarios.”
This is Horizon’s hardware design philosophy, also known as the “from software, to software” principle. After all, AI chips are designed to better serve software needs; they exist to make software functions possible.
Specifically, when designing chip hardware, Horizon considers on-chip memory arrays, the organization of tensor computation, instruction set design, and other factors together with the software, so as to maximize hardware utilization.
These are also the defining traits of Horizon’s chip hardware.
Next is the algorithmic architecture design.
Many people have heard of Moore’s Law for hardware: the transistor density of semiconductors doubles roughly every 18 months. It should be noted, however, that Moore’s Law has been slowing for years as silicon approaches its physical limits.
But the algorithmic Moore’s Law still holds. It refers to the observation that, as algorithms evolve, the amount of computation an AI task needs to reach a required accuracy keeps falling.
According to an earlier research report by the US-based OpenAI, the effective computation required to complete the same AI task halves roughly every 9 to 14 months.
Algorithms, in other words, are improving much faster than hardware.
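To see how large the gap is, compound both rates over the same three years. This sketch makes a simplifying assumption: that doubling transistor density doubles hardware efficiency, and that halving the required computation doubles algorithmic efficiency. The function name is illustrative.

```python
def improvement_factor(months: float, period_months: float) -> float:
    """Efficiency gain after `months` if it doubles every `period_months`."""
    return 2 ** (months / period_months)


span = 36  # three years
hardware = improvement_factor(span, 18)   # Moore's Law, as stated above
algo_slow = improvement_factor(span, 14)  # slow end of the 9-14 month range
algo_fast = improvement_factor(span, 9)   # fast end of the range
print(f"hardware {hardware:.1f}x, algorithms {algo_slow:.1f}x to {algo_fast:.1f}x")
# hardware 4.0x, algorithms 5.9x to 16.0x
```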
In this regard, Huang Chang said, “The logic of Horizon is to solve the main contradiction.”
In Huang Chang’s view, hardware from different companies will not differ much, but algorithms evolve so quickly that each company’s algorithms will diverge significantly, making algorithms the core source of competitiveness today. This may also be a major reason why Horizon has declared that “the next generation of leaders in intelligent automotive chips must also be world-class AI algorithm companies.”
Finally, there is software architecture design.
For an already finalized AI chip, continuous improvement of software architecture can still improve the real efficiency of the AI chip while keeping the hardware architecture and algorithmic architecture unchanged.
Here, “software” mainly means low-level compiler optimization.
For the same algorithm on the same chip, different ways of compiling, splitting, recombining, and deploying it affect how effectively the hardware is scheduled.
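A toy cost model shows why scheduling matters even when the hardware and the algorithm are fixed. It compares two ways of running the same three layers: sending every intermediate result through external memory, or fusing the layers so intermediates stay on chip. The layer timings are invented, the model assumes compute and transfers do not overlap, and the names are hypothetical; it is a generic illustration of operator fusion, not a description of Horizon’s compiler.

```python
from typing import List, Tuple

# Each layer is (compute_ms, offchip_ms): time to compute at peak throughput,
# and time spent moving that layer's output through external memory.
Layer = Tuple[float, float]


def utilization(layers: List[Layer], fused: bool) -> float:
    """Fraction of total run time spent doing useful computation."""
    compute = sum(c for c, _ in layers)
    if fused:
        # Intermediates stay in on-chip memory; only the last output goes off-chip.
        stall = layers[-1][1]
    else:
        # Every layer's output goes off-chip, and compute waits for the transfer.
        stall = sum(m for _, m in layers)
    return compute / (compute + stall)


net = [(2.0, 1.0), (3.0, 1.0), (1.0, 0.5)]  # hypothetical three-layer network
print(f"unfused {utilization(net, False):.2f}, fused {utilization(net, True):.2f}")
# unfused 0.71, fused 0.92
```

The arithmetic work is identical in both runs; only the schedule changes, and with it the “Utilization” term of the formula.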
This is why software architecture matters, and why it keeps evolving.
To be honest, for laymen like us, terms such as software architecture and compilation sound abstract because we don’t understand them. But Huang Chang offered a concrete figure: when the Journey 5 chip was released, Horizon quoted a peak performance of 1283 FPS, and that number has since risen to 1531 FPS.
With the chip’s hardware and algorithm architecture fixed, optimizing the software architecture alone has raised Journey 5’s peak performance by nearly 20%.
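The arithmetic behind that figure, using the two FPS numbers quoted above. In terms of Huang Chang’s formula, if TOPS/Watt and FPS/TOPS are held fixed, the whole gain has to come from utilization, so utilization rose by the same factor of about 1.19. That attribution assumes nothing else changed between the two measurements.

```python
RELEASE_FPS = 1283  # peak performance quoted at Journey 5's launch
CURRENT_FPS = 1531  # peak performance quoted after software updates

speedup = CURRENT_FPS / RELEASE_FPS
print(f"speedup {speedup:.3f}x, gain {speedup - 1:.1%}")
# speedup 1.193x, gain 19.3%
```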
“Let software do what software can do, let hardware do simple and efficient functions that can be called by software as much as possible.”
This is Huang Chang’s view of software and hardware in Horizon’s autonomous driving chips. It is also the advantage Horizon holds over its domestic peers as an AI chip company that designs hardware architecture, software architecture, and algorithm architecture alike.
This afternoon’s conversation clearly got Huang Chang excited, too. Journey 5 is in full swing preparing for mass production, and the next-generation Journey 6 is already on the way. This time, its new core computing architecture, the BPU, is code-named “Nash”.
The first-generation BPU was code-named “Gauss”, the second-generation “Bernoulli”, and the third-generation “Bayes”…
Horizon likes to name its core computing architectures after mathematicians.
This preference, like their lopsided skill tree, is probably the romance in the hearts of no-nonsense engineers.
This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.