Computing Power Chips: How to Break Through?

Updated 2024-02-28 | From: SLTechnology News&Howtos

As a technology-focused official account, today's article steps back from technology itself to discuss a broader, more macro-level development topic :)

Recently, a friend and I discussed a topic that is not purely technical: how can a latecomer catch up with and surpass the leaders? The conversation also prompted some deeper thinking about how technology develops.

There are already countless articles and videos online about catching up with and surpassing the leaders. As someone who has long worked on computing chips, today I will share some personal views, from the perspective of computing chips, on how domestic computing chips can achieve a breakthrough.

1 On a mature track, it is hard for a latecomer to catch up with the leader

1.1 The CPU rivalries

In the 1970s, Intel invented the microprocessor. Through continuous investment in CPUs, Intel gradually gained a market advantage and built its own x86 ecosystem, including peripheral hardware partners, firmware such as the BIOS, operating systems, toolchains, and application software.

RISC is an example of failure. x86 is a CISC architecture. As CISC instructions grew ever more complex and harder to manage, the RISC architecture rose. RISC processors advocate a simplified instruction set, fixed instruction length, a uniform instruction encoding format, and acceleration of the most common instructions. RISC became the architecture of choice for many processors and a classic CPU design case in many computer architecture textbooks. Even so, RISC still lost to CISC in market competition in the PC and server market.

Itanium is an example of Intel's own failure. Itanium is a 64-bit processor that Intel launched in 2001. It was Intel's own offspring, a powerful 64-bit architecture whose architecture and microarchitecture designs were excellent, yet its failure was inevitable because it was incompatible with x86. In the end, this paved the way for the success of AMD64.

ARM's success came more from its business model. In the early days, the processors ARM developed performed poorly, usually worse than the ARM-architecture CPUs that some of its giant customers developed themselves. But because ARM was a neutral CPU architecture and IP provider, many giants were willing to support it. Ultimately, ARM achieved great success in the smartphone era. With growing financial strength, ARM's later CPUs gradually caught up with, and in some cases surpassed, those of its giant customers.

RISC-V is a rising star, and its possible future success will likewise depend on a better business model. Much like ARM early on, RISC-V today trails x86 and ARM in performance and ecosystem, but it is growing rapidly thanks to a better business model: a fully open, free instruction set architecture with broad industry support.

1.2 NVIDIA: from ten years of sharpening a sword to a trillion-dollar market capitalization

The traditional GPU was a graphics accelerator card, essentially just one of many accelerator cards for various fields and scenarios. Apart from the GPU, almost no other kind of accelerator card has succeeded. The GPU's ultimate success came from NVIDIA's transformation in the 2000s: on the one hand, the GPU evolved from a traditional graphics accelerator into the parallel-computing-oriented GPGPU; on the other hand, to lower the barrier to development, NVIDIA devoted more resources to CUDA and declared itself a software company.

Even with the right strategy, it took almost a decade for that success to be validated. The first version of CUDA was released around 2007; GPUs only began to truly stand out with the rise of deep learning in 2012, and NVIDIA was only pushed to the top by the rise of large models from 2018 onward and the ChatGPT boom of 2023.

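To sum up, some companies like to shout the slogan "we want to be China's xxx", but as the saying goes, "those who learn from me live; those who copy me die." Chips are an international market with global competition, and mere copying is like the fable of "learning to walk in Handan", where the imitator never masters the new gait and forgets his own.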

On a mature track, a latecomer that advances only by imitating the leader cannot succeed. The latecomer needs differentiation, innovation, and advantages of its own to succeed, and even then it is much harder for the latecomer to succeed than for the leader.

2 Technological change is the key window for latecomers to catch up

Domestic new energy vehicles are a classic case of a latecomer catching up with and surpassing the leaders. According to General Administration of Customs data compiled by the China Association of Automobile Manufacturers, 2.341 million vehicles were exported in the first half of 2023, up 76.9% year on year. From January to July, car exports totaled 383.73 billion yuan, up 118.5%. China's automobile exports surpassed Japan's for the first time, ranking first in the world. New energy vehicles are the core growth driver of China's automobile exports: from January to June 2023, 800,000 new energy vehicles were exported, up 105% year on year.

On a mature track, the leader enjoys advantages in technology, market, patents, brand, and more, which makes it very hard to catch up. In a period of technological change, however, the latecomer can position itself early in the new technology field, so that both sides stand at the same starting line. This creates the opportunity for "fair" competition and makes overtaking possible. Domestic automakers seized the wave of new energy and smart vehicles and quickly rose to first place in the world in car exports.

So where is the opportunity for change in chips?

3 The challenge of large AGI models

By early 2023, the parameter counts of large AI models had "coincidentally" all settled at the level of hundreds of billions. Why?

The core reason is that this is the upper limit of computing power that current GPU computing clusters can support:

On the one hand, single-chip computing power has hit a bottleneck and is growing extremely slowly.

On the other hand, constrained by today's CPU-centric server architecture and by the efficiency of network communication, cluster size has also reached its upper limit.

Another very important reason is that construction and operating costs have reached astronomical levels.

At present, CPU performance has long been a bottleneck, GPU performance is approaching its peak at high cost, and AI chips are too specialized to keep up with rapidly changing model algorithms, operators, and business logic.
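To make the "upper limit of computing power" concrete, here is a back-of-the-envelope sketch. It assumes the widely used approximation that training a dense transformer costs about 6 × N × D FLOPs (N parameters, D training tokens). That approximation, the function names, and the cluster figures (1,000 chips at 300 TFLOP/s peak each, 40% utilization) are illustrative assumptions, not numbers from the article.

```python
# Back-of-the-envelope training time, assuming the common approximation
# C ~= 6 * N * D FLOPs for a dense transformer. Hardware figures are hypothetical.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs (N parameters, D tokens)."""
    return 6.0 * params * tokens


def training_days(params: float, tokens: float, n_chips: int,
                  peak_flops_per_chip: float, utilization: float) -> float:
    """Wall-clock days, given cluster size and the fraction of peak actually achieved."""
    effective_flops = n_chips * peak_flops_per_chip * utilization
    seconds = training_flops(params, tokens) / effective_flops
    return seconds / 86400.0


if __name__ == "__main__":
    # Hypothetical cluster: 1,000 chips, 300 TFLOP/s peak each, 40% utilization.
    for n_params in (1e11, 1e12):  # 100B vs 1T parameters, 1T tokens each
        days = training_days(n_params, 1e12, n_chips=1000,
                             peak_flops_per_chip=3e14, utilization=0.4)
        print(f"{n_params:.0e} params: {days:,.0f} days")
```

Under these assumed numbers, a tenfold jump in parameters on the same cluster and token budget stretches training from roughly two months (about 58 days) to over a year and a half (about 579 days), which is the kind of wall the article describes.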

How can this be solved? We can give a simple answer:

On the one hand, keep scaling up: by integrating more processor cores that work together, single-chip performance can be raised by an order of magnitude.

On the other hand, keep scaling out: strengthen communication inside the chip (breaking the existing CPU-centric architecture) and outside it (higher-performance networks), so that the number of servers in a cluster can grow by an order of magnitude.

In addition, large chips need to be general-purpose. Whether they achieve sufficient generality is the most important factor in whether large chips can be deployed at scale.

It is also important to cut the cost of computing by an order of magnitude through appropriate mechanisms.
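The scale-up and scale-out levers multiply rather than add. A toy model, assuming cluster throughput is simply per-chip compute × number of chips × a scaling-efficiency factor that stands in for interconnect and coordination overhead (the function name and all numbers are hypothetical):

```python
def cluster_throughput(chip_flops: float, n_chips: int, efficiency: float) -> float:
    """Toy model: usable cluster compute = per-chip compute x chips x scaling efficiency."""
    return chip_flops * n_chips * efficiency


baseline = cluster_throughput(chip_flops=1.0, n_chips=10_000, efficiency=0.5)
# 10x scale-up (per-chip compute) and 10x scale-out (chips), with the interconnect
# improved enough to hold efficiency at 0.5 ...
scaled = cluster_throughput(chip_flops=10.0, n_chips=100_000, efficiency=0.5)
# ... versus the same hardware where communication overhead halves efficiency.
degraded = cluster_throughput(chip_flops=10.0, n_chips=100_000, efficiency=0.25)

print(scaled / baseline, degraded / baseline)  # → 100.0 50.0
```

Keeping efficiency as an explicit factor reflects the article's point that adding servers only pays off if internal and external communication do not become the new bottleneck.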

4 The rapid progress of chip technology

As process technology keeps advancing, Chiplet-based advanced packaging is becoming increasingly mature. From 2D to 3D packaging and on to 4D Chiplet packaging, the underlying implementation technology of chips is still developing rapidly.

Today's large high-performance chips typically have around 50 billion transistors. Intel plans to reach 1 trillion transistors in a single package by 2030, a computing scale 20 times that of current chips.
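The 20x figure follows directly from the two transistor counts. If we additionally assume that the roughly 50-billion-transistor baseline dates from 2023 (an assumption; the article gives no baseline year), the implied compound growth rate r and doubling time work out as:

```latex
\frac{1\times 10^{12}}{5\times 10^{10}} = 20, \qquad
r = 20^{1/7} \approx 1.53, \qquad
T_{2\times} = \frac{\ln 2}{\ln r} = \frac{7\ln 2}{\ln 20} \approx 1.6\ \text{years}
```

That is roughly 53% more transistors per year, or a doubling about every 1.6 years.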

How can we make better use of transistor resources on such a scale?

5 The historical opportunity of change in computing chips

5.1 System architecture innovation

Pulled by demand on one side and enabled by process technology on the other, we need more innovation at the system-architecture level.

From single-core to multi-core, from homogeneous to heterogeneous, from single-type heterogeneous to multi-heterogeneous, and then from multi-heterogeneous to heterogeneous fusion: this is how computing architecture has been inherited and developed, from simple to complex.

As chip designs grow ever larger, integrating processors of multiple architectures on a single chip has become very common. Intel calls this multi-heterogeneous hybrid computing architecture "super-heterogeneous computing". The "White Paper on Heterogeneous Fusion Computing Technology", issued in September 2023, adopts the more rigorous and accurate term "heterogeneous fusion computing", which captures that the key to multi-heterogeneous hybrid computing lies in cooperation and fusion between heterogeneous processors.

5.2 How can large chips be general-purpose?

As systems grow ever larger and change ever faster, generality matters more than raw performance for large computing chips. Customized acceleration chips, by contrast, serve few scenarios and have short life cycles, so they are hard to deploy at scale.

Moreover, general-purpose capability is more advanced than dedicated capability. General-purpose computing must extract and decompose the common components from many requirements and flexibly combine the functions users need through software programming, while achieving both performance and flexibility.

So how can generality be achieved? What fundamentally makes general-purpose computing possible?

The larger the system, the more pronounced its "80/20 rule" (Pareto principle) characteristics become. This lets us accelerate the deterministic, common part in hardware, while handling the relatively uncertain, individualized part through software programming.
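One way to see why the 80/20 split matters, and why the software-programmable part still has to run efficiently, is Amdahl's law (our choice of model, not the article's): if a fraction p of the work is the common part accelerated in hardware by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch with p = 0.8; the function name is illustrative:

```python
def overall_speedup(common_fraction: float, hw_speedup: float) -> float:
    """Amdahl's law: only `common_fraction` of the work runs on the accelerated hardware path."""
    uncertain_part = 1.0 - common_fraction            # handled by software programming
    accelerated_part = common_fraction / hw_speedup   # deterministic common part in hardware
    return 1.0 / (uncertain_part + accelerated_part)


for s in (10, 100, 1_000_000):
    print(f"hardware speedup {s:>9,}: overall {overall_speedup(0.8, s):.2f}x")
```

Even near-infinite acceleration of the 80% common part caps the overall gain at 5x, which is why general-purpose computing has to deliver both performance and flexibility rather than leave the programmable 20% slow.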

Adding a "general-purpose" constraint to the six-generation computing architecture reduces it to a three-generation general-purpose computing architecture:

The first generation (single-core) and the second generation (multi-core) merge into CPU homogeneous computing.

The dedicated DSA heterogeneous computing stage is dropped; heterogeneous computing keeps only the general-purpose heterogeneity of the GPU.

For multi-heterogeneous computing to succeed, it needs fusion; for heterogeneous fusion to succeed, it needs generality. Therefore, the solution that can ultimately be deployed will be general-purpose heterogeneous fusion computing.

5.3 From individual combat to teamwork

Limited by our access to the most advanced process technology, we cannot build the most powerful chips. But we can achieve stronger swarm intelligence through the collaboration of more resources:

Method 1: heterogeneous fusion. Innovation in heterogeneous computing architecture lets more processor cores coordinate and converge, so that even when our process lags 1 to 2 generations behind, single-chip computing power can still be raised.

Method 2: computing power networks. Through computing power networks and the "East Data, West Computing" initiative, computing power can be scheduled and coordinated across clusters, making efficient use of computing resources.
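Cross-cluster scheduling can take many forms; the sketch below is one hypothetical greedy policy, not the scheduler of any real computing power network. It places the most latency-sensitive jobs first and sends latency-tolerant work (the kind "East Data, West Computing" aims to move west) to whichever eligible cluster has the most spare capacity. `Cluster`, `Job`, `schedule`, and all capacities and latencies are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cluster:
    name: str
    free_units: int        # spare accelerator capacity (hypothetical unit)
    latency_ms: float      # network latency from where the data is produced


@dataclass
class Job:
    name: str
    units: int             # capacity the job needs
    max_latency_ms: float  # latency the job can tolerate


def schedule(jobs: List[Job], clusters: List[Cluster]) -> Dict[str, Optional[str]]:
    """Greedy: strictest latency bound first, then the eligible cluster with most free capacity."""
    placement: Dict[str, Optional[str]] = {}
    for job in sorted(jobs, key=lambda j: j.max_latency_ms):
        eligible = [c for c in clusters
                    if c.latency_ms <= job.max_latency_ms and c.free_units >= job.units]
        if not eligible:
            placement[job.name] = None  # no fit: in practice the job would queue
            continue
        target = max(eligible, key=lambda c: c.free_units)
        target.free_units -= job.units  # reserve capacity on the chosen cluster
        placement[job.name] = target.name
    return placement


clusters = [Cluster("east", 100, 5.0), Cluster("west", 1000, 40.0)]
jobs = [Job("online-inference", 50, 10.0),
        Job("batch-training", 800, 1000.0),
        Job("offline-analytics", 150, 500.0)]
print(schedule(jobs, clusters))
```

Here the online job stays in the low-latency eastern cluster while both latency-tolerant jobs land in the larger western one. A greedy policy is easy to reason about but can strand capacity; it only serves to make "cross-cluster scheduling" concrete.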

Method 3: intelligent connectivity. Intelligently connecting terminals enables terminal-cloud collaboration. The China plan for intelligent connected vehicles proposed by an academician at Tsinghua University emphasizes deep coordination among the vehicle (terminal), the road (MEC access), the edge, and the cloud, which can deliver a more intelligent user experience even when each individual's computing power is limited.
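The benefit of vehicle-road-edge-cloud collaboration can be illustrated with a simple offloading rule (a common textbook model, assumed here rather than taken from the plan cited above): run each task wherever network delay + upload time + compute time is smallest. The `Site` class, the helper functions, and every bandwidth, latency, and GFLOP/s figure are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    gflops: float                 # sustained compute at the site, GFLOP/s
    rtt_ms: float                 # network round trip to reach the site (0 for local)
    uplink_mbps: Optional[float]  # None means no upload is needed (runs on the vehicle)


def completion_ms(site: Site, task_gflop: float, data_mb: float) -> float:
    """Estimated completion time: network delay + upload time + compute time."""
    upload = 0.0 if site.uplink_mbps is None else data_mb * 8 / site.uplink_mbps * 1000
    compute = task_gflop / site.gflops * 1000
    return site.rtt_ms + upload + compute


def best_site(sites: List[Site], task_gflop: float, data_mb: float) -> Site:
    """Pick the site with the lowest estimated completion time."""
    return min(sites, key=lambda s: completion_ms(s, task_gflop, data_mb))


sites = [Site("vehicle", 100, 0.0, None),
         Site("edge (MEC)", 2_000, 10.0, 200),
         Site("cloud", 20_000, 60.0, 50)]

for gflop, mb in [(0.5, 2.0), (50, 2.0), (5_000, 0.1)]:
    print(gflop, mb, best_site(sites, gflop, mb).name)
```

With these assumed figures, the small task stays on the vehicle, the moderate one goes to the nearby MEC node, and the heavy task with little data goes to the cloud: each tier covers what the limited on-board computing power cannot.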

Method 4: cloud-network-edge fusion. Larger computing nodes, higher-performance and lower-latency networks, and more powerful computing infrastructure together build a more powerful macro-level digital system.

5.4 Summary

The shift from heterogeneous computing to heterogeneous fusion computing, a change in computing architecture, gives us the chance to overtake on the curve. Such historical opportunities are fleeting, and we need to move faster and invest more.

Seize the historical opportunity of this change in computing architecture, and let computing chips overtake on the curve!

This article comes from the WeChat official account Software and Hardware Fusion (ID: cash-arch), author: Chaobowx.
