At the SIGGRAPH conference in August this year, Jensen Huang gave a demonstration in which he fed an AI a 2D CAD floor plan of a factory (the plan was just a PDF document), then described the requirements to the generative AI in a few sentences. The AI then output a complete 3D virtual factory, or, in today's popular terms, a factory digital twin, including the warehouse floor materials, the plant layout and so on, as a model based on the OpenUSD 3D format.
This work, which used to require considerable manpower, material resources and time, can now be completed by AI in a very short time. Of course, the implementation details still need refinement, but the demonstration shows that the potential value of generative AI in industry applications already reaches beyond popular products such as ChatGPT and Stable Diffusion. It also shows why Nvidia, which has just joined the trillion-dollar market capitalization club, is so successful in today's AI market.
Many AI chip companies and even IP suppliers we have spoken with this year generally say that the AI training market in data centers is dominated by Nvidia and that this part of the market is hard to shake. That covers, for example, the computing foundation behind cloud services such as ChatGPT and the new Bing, and the training of all kinds of large models, which runs on chips such as the Nvidia A100 and H100.
But for the industry as a whole, generative AI cannot be left entirely to Nvidia: if we can't beat you in the cloud, can't we go after the edge inference market? So this year, companies at every level of the industry have been promoting edge, and even on-device, generative AI.
In the broad sense, "edge" covers not only the edge of data centers and enterprise gateways, but also the on-device market, where Intel has been promoting the AI PC for more than half a year. Before the end of the year, MediaTek released a mobile phone AP SoC that can run generative AI, and even some companies that make embedded chips are talking about generative AI.
But in fact, Nvidia has its own trump card even in the edge market. Leaving aside the various Jetson chips and IGX platforms, which lean toward industrial and enterprise edge applications, isn't Nvidia's GeForce graphics card on the PC already the platform most widely used by individual users for AI research and by AI enthusiasts?
In fact, when the PC market began to decline after the pandemic, the major players in the PC industry began to promote the AI PC. It now looks as though generative AI will probably be the next hot spot that lifts the PC market again. The loudest voices on AI PC at this stage are obviously Intel and the various OEMs. But it is Nvidia that has spent the longest time building the CUDA ecosystem and the rest of its full AI stack, and that is probably the most qualified to talk about AI PC.
In September, Nvidia released the open source TensorRT-LLM, an inference tool dedicated to large language models (LLMs) that accelerates LLM inference, although at the time it focused on speeding up inference on the H100. TensorRT-LLM for Windows followed in October, mainly to enable single-GPU inference on bare-metal Windows, with a special focus on support for GeForce RTX 40 series graphics cards; Nvidia says it makes generative AI on the PC 4 times faster. At the same time, TensorRT acceleration was brought to the popular application Stable Diffusion WebUI, doubling the speed of the generative diffusion model. This obviously points to the AI PC and can be seen as the official start of Nvidia's AI PC campaign, even though Nvidia did not only recently start doing AI on the PC.
Let's take a look at what Nvidia brings to the concept of AI PC, especially for generative AI acceleration. Along the way, let's discuss whether the AI PC is really valuable.
The foundation for generative AI on GeForce graphics cards
Many players still want a share of the on-device AI market. For example, Intel is adding a dedicated NPU acceleration unit to its new generation of PC-oriented Meteor Lake processors, and AMD now promotes a dedicated Ryzen AI brand for its Ryzen processors. The AI unit inside MediaTek's Dimensity 9300 mobile phone chip integrates a so-called "generative AI acceleration engine." More OEM vendors are responding, and Microsoft hopes to be the standard setter for AI / ML in this game.
Obviously, after generative AI, represented by ChatGPT, reignited the AI market at the end of last year and the beginning of this year, the main market participants have had great expectations for on-device AI. The reasons for emphasizing on-device or local AI inference are easy to understand: first, as mentioned at the beginning of the article, such a good technology and hot spot cannot belong to Nvidia alone, and everyone wants a slice of the cake; second, local AI inference has some advantages the cloud cannot match.
These advantages are by now a cliché, nothing more than the usual trade-offs between cloud and edge carried over to AI: data security and privacy, latency requirements, and the fact that a real-time cloud connection cannot always be guaranteed. As far as AI is concerned, however, mass-market cloud AI such as ChatGPT and Midjourney has one huge disadvantage compared with a locally deployed AI model: it cannot be customized to individual needs.
If we look beyond the consumer market, AI moving to the edge is inevitable: enterprises, at the very least, need edge AI to improve productivity, which is also the trend we forecast for generative AI next year. For individual users, whether for research or for specific productivity work, local on-device AI also offers customization and more freedom and flexibility. For example, a request to draw a portrait of a young woman with a merged model, which is routine in Stable Diffusion, may simply be refused by Midjourney.
As the king of AI in the cloud, what reserves does Nvidia have on the device side, especially on the PC? Most readers should know that Nvidia has included Tensor Cores, dedicated hardware units that speed up AI computation, in GeForce RTX GPUs since the Turing architecture. In addition, Jensen Huang repeatedly stressed the value of Transformer and the potential of LLMs in his keynote at GTC in the fall of 2021.
The following year, when Nvidia released the Hopper-architecture H100 accelerator, it decisively introduced the Transformer Engine library, which, coupled with a new generation of Tensor Core hardware, improved Transformer model performance several times over. At the time, it seemed bold to build such model-specific acceleration into a GPU. In October of the same year, the GeForce RTX 40 series GPUs based on the corresponding Ada Lovelace architecture were released, and they naturally supported the Transformer Engine as well, even though they were just graphics cards.
The global explosion of Dall-E and ChatGPT actually came after Hopper and Ada Lovelace had added the Transformer Engine. Of course, the popularity of large models such as LLMs and Stable Diffusion is the result of a broader trend, but Nvidia managed to put the hardware in place before the hot spot ignited. Then, at this year's GTC, Nvidia naturally showed the famous scene in which Jensen Huang handed a DGX to OpenAI. Soon afterwards, Nvidia's market capitalization soared to a trillion dollars. That level of foresight is impressive.
Let's talk briefly about the relationship between Transformer and generative AI. Structurally, Transformer uses a so-called self-attention mechanism to capture global dependencies and the relationships between the different elements of a sequence. Transformer was originally applied mainly to NLP (natural language processing), because its self-attention mechanism can relate each element of the sequence to every other element, and the model can weigh the importance of each element based on its context.
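To make the self-attention idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any production library implements it: it assumes the simplest possible setup in which the queries, keys and values are the input vectors themselves (real Transformers apply learned projection matrices first), and the function names and sample vectors are made up for this example.

```python
import math

def softmax(row):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a sequence of vectors.

    Assumption for illustration: Q = K = V = x, with no learned weights.
    Returns the new vectors and the attention weight matrix.
    """
    dim = len(x[0])
    outputs, weights = [], []
    for query in x:
        # Score this element against every element in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in x
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted mix of all elements in the sequence.
        outputs.append(
            [sum(row[j] * x[j][c] for j in range(len(x))) for c in range(dim)]
        )
    return outputs, weights

# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attn = self_attention(tokens)
```

Each row of `attn` shows how strongly one element attends to every other element, which is the "weighing importance based on context" described above.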
In plain terms, GPT stands for Generative Pre-trained Transformer, and the name itself says that the model is based, wholly or in part, on Transformer. Large language models are generally built on Transformer structures, including ChatGLM and Llama, which have been popular over the past two years.
CNNs (convolutional neural networks), by contrast, are different from Transformer and have long been considered better suited to image classification, object recognition and similar tasks. But Google later published a paper showing that if an image is cut into small patches and each patch is treated as a word, or token, a Transformer can also learn to identify objects with high accuracy while achieving good parallelism and flexibility, which makes Transformer suitable for large-scale image recognition and computer vision work as well. There have also been attempts to build diffusion models on Transformer.
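The "cut the image into patches and treat each patch as a token" step can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the image is a small single-channel grid stored as a list of rows, the patch size divides the image size evenly, and the function name `image_to_patches` is invented for this example. A real vision Transformer would additionally project each flattened patch into an embedding and add position information.

```python
def image_to_patches(image, patch):
    """Split an H x W grid into flattened patch x patch tokens (row-major order)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch into a single vector: this is one "word".
            tokens.append(
                [
                    image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)
                ]
            )
    return tokens

# A hypothetical 4 x 4 "image" whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patches(image, 2)
# The first token holds the top-left 2 x 2 patch: [0, 1, 4, 5]
```

The resulting list of tokens can then be fed to a Transformer in the same way as a sequence of word vectors.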
We won't go further here into the relationship between Transformer and diffusion models or their development potential. It is worth noting that when Nvidia released the L40 and RTX 6000 GPUs, it placed special emphasis on improved Stable Diffusion image generation (inference) performance. Both are also based on the Ada Lovelace architecture, but are positioned for a different market than GeForce.
So overall, Nvidia's hardware preparation for the AI PC is several steps ahead of its competitors, although this seems to rest mainly on its early success in data center AI and HPC. On the ecosystem side, of course, it draws on CUDA, which for more than a decade has let GPUs do all kinds of general-purpose computing, and on the AI work that was later built on top of it.
Tools and ecosystem: accelerating local inference for generative AI
On AI training versus inference, a large body of market research shows that the inference market is bound to be the larger one. According to Schneider Electric's data, measured by electricity consumption, the current worldwide ratio of AI training to inference power consumption is about 2:8, and it will tilt further toward inference in the future. So Nvidia is clearly not going to let the inference market go.
Every time we say something to ChatGPT, ChatGPT performs an AI inference; every time Stable Diffusion generates a portrait, it performs an AI inference locally. The two involve very different amounts of computation. At GTC this spring, Nvidia released the H100 NVL for LLM inference, which targets inference workloads that need a lot of computing power.
On the PC side, the graphics cards share their architectures with the data center cards: all are based on Ampere or Ada Lovelace. Given the software stack Nvidia has built, using a GeForce RTX graphics card for AI inference comes naturally. PC industry media have also been including local Stable Diffusion inference in graphics card reviews over the past two years, mostly based on Stable Diffusion WebUI (A1111, a graphical user interface for running Stable Diffusion). Of course, the foundation for running Stable Diffusion WebUI on a GeForce RTX graphics card is CUDA.
So Nvidia was the first to have the foundation for an "AI PC". After all, its ecosystem and software stack were laid out early, and the community began inventing all kinds of uses for them early as well. Intel began promoting the AI PC concept at the beginning of this year, much later than Nvidia. We have often remarked that Intel's software engineers must have worked a lot of overtime this year: although Intel draws heavily on the open source community, it took a great deal of effort to build a decent full stack and get Stable Diffusion and various LLMs running on its own CPUs and GPUs, both in getting them to run at the beginning of the year and in the optimization work of the second half.
It seems that in October this year Nvidia clearly began to pay more attention to the AI PC. That month, Nvidia itself released a TensorRT extension for Stable Diffusion WebUI. What is TensorRT? The RT stands for runtime, so it is first of all a runtime library for deploying AI applications. It is also an inference optimization tool: it provides APIs and parsers to import an AI model and then generates an optimized runtime engine.
In other words, the TensorRT extension makes AI models run (infer) faster in Stable Diffusion WebUI. Measurements by the Bilibili creator Nenly show about 3 times the AI inference performance; that is, generating images with Stable Diffusion is 3 times faster than without the TensorRT extension. According to Nvidia, with this extension on a GeForce RTX 4090, Stable Diffusion inference is 7 times faster than on Apple's M2 Ultra (itself a heavyweight in the Apple ecosystem), officially bringing AI image generation into the era of seconds.
Stable Diffusion TensorRT benchmark results measured by Nenly
In fact, TensorRT itself has been around for years, dating back to 2019 and earlier, and new versions of this middleware have been a regular highlight of GTC updates over the years.
TensorRT-LLM, released in September, is obviously based on TensorRT, and the LLM suffix stands for large language model. According to the description on GitHub, TensorRT-LLM provides users with an easy-to-use Python API to define LLMs and build highly optimized TensorRT engines that perform efficient inference on Nvidia GPUs (it also includes runtime components for running those TensorRT engines).
Nvidia describes it as the backbone for putting generative AI applications into production. Put simply, it is a tool that accelerates and optimizes LLM inference. Nvidia's promotional material says that TensorRT-LLM v0.6.0 "brings up to five times the inference performance improvement and supports more popular LLMs".
The introduction on Nvidia's website also specifically mentions that TensorRT-LLM makes use of FasterTransformer, an optimization library Nvidia developed for Transformer models. Judging from these components, TensorRT-LLM can be seen as the product of many years of accumulated experience. As Nvidia put it on its Q3 earnings call, "We have been in the installed base for more than 20 years; any time you see an Nvidia GPU, it runs our stack", and that includes GeForce.
In October, TensorRT-LLM for Windows was released with explicit support for single-GPU inference on GeForce RTX cards. It arrived at the same time as the TensorRT extension for Stable Diffusion WebUI; however you look at it, Nvidia clearly intends to get serious about the AI PC. The promotional material says that TensorRT-LLM for Windows makes generative AI on the PC four times faster.
Running generative AI locally on the PC will also become more practical, including popular LLMs such as Llama, which interested readers can deploy locally. Whether to use them for LLM research, chatting, copywriting, coding, looking up information, or in combination with other technologies is something for PC users and developers to think about.
One last point in this part: while the AI software stack on the PC is not yet truly unified, Microsoft's AI APIs, such as DirectML, are also worth watching. As part of DirectX 12, DirectML is the machine learning API that Microsoft provides as the operating system vendor. It now offers AI acceleration on GPUs from most chip vendors, which makes it more general-purpose. For example, Stable Diffusion WebUI also has a DirectML version, though its efficiency may be somewhat lower than with the dedicated APIs provided by the chip vendors.
This time Nvidia also worked with Microsoft to optimize Llama models running on the DirectML API, with Nvidia's part probably being mainly optimization at the GPU driver level. This can be seen as a multi-pronged push in building the AI PC ecosystem.
Embrace generative AI, embrace the AI PC
In fact, edge and on-device AI need not be limited to generative AI. Although the "AI PC" concept was mainly put forward this year, AI technology has arguably been applied on the PC platform ever since Nvidia introduced the Turing architecture (RTX 20 series graphics cards). Otherwise, what would Tensor Cores have been used for all these years?
DLSS, the deep learning super sampling technology in PC games, is a typical application of AI in gaming: many pixels are not rendered by the GPU's graphics units but generated by AI. DLSS 3 began generating entire frames, and DLSS 3.5 adds ray reconstruction, all of which is work done by AI.
There is also the recently updated RTX VSR (video super resolution), which uses AI to upscale low-resolution streaming video to high resolution; the new version is said to also remove artifacts and compression distortion when content is played at its native resolution. Together with Nvidia's other AI technologies and features, such as eye contact for video conferencing and image super-resolution, these should count as part of the AI PC, even if they are not generative AI.
The arrival of generative AI will accelerate the deeper use of AI technology on the AI PC. As for how generative AI will be applied on the PC platform, we expect the answer to emerge over the next year or two as generative AI and large models themselves develop. Corresponding killer applications will most likely appear, depending on how imaginative developers are.
Since ChatGPT ignited the market, one of the most discussed questions among the public seems to be whether generative AI is going to take human jobs. Are we going to lose our jobs? While mechanical, repetitive work may indeed face elimination, many people are willing to embrace generative AI. They are the people who actually use generative AI as a productivity tool to steer their own development and that of the times.
For example, Zhao Enzhe, who recently became the first Chinese artist to win a Hugo Award, created a work on the theme "Ship of Vanity" in Stable Diffusion, based on his own hand-painted designs and with the help of a GeForce RTX graphics card. Zhao Enzhe said: "From the standpoint of products in the game and film industries, using AI and computing power to cut costs, raise efficiency and empower development is commendable. I personally like hand-painting futuristic warships, and I hope to turn these warships and the world in my mind into a blockbuster film and a game, which is impossible to achieve on my own."
"Now that I have AI-assisted creative tools, they give me the possibility of realizing various technologies based on my imagination. Results like these used to take many people months of effort, and now they are presented to me in a few seconds." This example should be quite representative: artists turn AI and computing power into tools that expand their imagination.
Ship of Vanity
Let's return to the topic of the AI PC. From a high-level, system perspective, although the upper layers of the generative AI software stacks of the major competitors in the PC industry currently differ greatly, they all ultimately serve the same well-known large models. It is therefore feasible to judge AI efficiency at the system level: for example, once every layer has reached the best optimization each chip vendor can deploy, generate images with Stable Diffusion and see how long each chip and software stack takes.
We should be able to see such comparisons next year, and they will be a test for the major competitors. Given Nvidia's years of accumulation in this field, even setting chip architecture aside, its software stack and ecosystem should be strong enough for GeForce RTX GPUs to take the lead in this kind of competition. For the large models that generative AI requires, computing power remains an essential resource, while the ecosystem is what guarantees that the resource is used efficiently.
In this new revolution in the PC industry, the race around the AI PC that embraces generative AI is already fiercely competitive.