
Moore Threads' first thousand-card intelligent computing center goes live, accelerating large model development from computing power to ecosystem

2024-09-14 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)12/24 Report--

On December 19, the unveiling ceremony of the Moore Threads KUAE Intelligent Computing Center, the country's first fully domestic thousand-card, hundred-billion-parameter model training platform, was held in Beijing, marking the official launch of China's first large-scale computing cluster built on domestically developed full-function GPUs. At the same event, Moore Threads and a number of domestic partners jointly founded the Moore Threads PES-KUAE Intelligent Computing Alliance and the Moore Threads PES-Large Model Ecosystem Alliance, aiming to consolidate a domestic large-model ecosystem that spans intelligent computing infrastructure, model training and inference, and to keep accelerating the development of China's large model industry.

In his keynote, Moore Threads CEO Zhang Jianzhong announced a set of major releases, including the MTT S4000 large-model intelligent computing accelerator card and the Moore Threads KUAE platform, which together provide strong support for training and inference of models with hundreds of billions of parameters. He said: "The official opening of the Moore Threads KUAE Intelligent Computing Center is an important milestone in the company's development. Moore Threads has built an intelligent computing product line that runs from chips to graphics cards to clusters. Relying on the diverse computing capabilities of the full-function GPU, Moore Threads aims to meet the growing demand for large model training and inference and, with green and secure intelligent computing power, to vigorously promote the adoption of multimodal applications such as AIGC, digital twins, physical simulation and the metaverse, as well as the high-quality development of industries across the board."

New intelligent computing accelerator card MTT S4000: built for large models, supporting both training and inference

The Moore Threads MTT S4000 accelerator card uses the third-generation MUSA core and offers 48GB of memory and 768GB/s of memory bandwidth per card. Based on Moore Threads' self-developed MTLink 1.0 technology, the MTT S4000 supports multi-card interconnection, helping to accelerate distributed training of hundred-billion-parameter models. The MTT S4000 also provides advanced graphics rendering, video encoding/decoding and 8K ultra-HD HDR display capabilities, supporting application scenarios that combine AI computing, graphics rendering and multimedia. Crucially, with Moore Threads' self-developed MUSIFY development tool, the MTT S4000 can take full advantage of the existing CUDA software ecosystem, enabling zero-cost migration of CUDA code to the MUSA platform.
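The article does not describe how MUSIFY performs this migration internally. Purely as a rough illustration of the kind of source-to-source identifier mapping a CUDA-to-MUSA porting tool applies, here is a minimal Python sketch; the cuda*/musa* name pairs and the sample snippet are assumptions for demonstration, not MUSIFY's actual rule set.

    import re

    # Hypothetical illustration of CUDA-to-MUSA source porting.
    # The identifier mapping below is an assumption for demonstration,
    # not the actual rules used by Moore Threads' MUSIFY tool.
    CUDA_TO_MUSA = {
        "cudaMalloc": "musaMalloc",
        "cudaMemcpy": "musaMemcpy",
        "cudaFree": "musaFree",
        "cuda_runtime.h": "musa_runtime.h",
    }

    def port_source(cuda_code: str) -> str:
        """Rewrite CUDA API identifiers to their assumed MUSA counterparts."""
        pattern = re.compile("|".join(re.escape(k) for k in CUDA_TO_MUSA))
        return pattern.sub(lambda m: CUDA_TO_MUSA[m.group(0)], cuda_code)

    if __name__ == "__main__":
        sample = '#include <cuda_runtime.h>\nfloat *buf; cudaMalloc(&buf, 1024); cudaFree(buf);'
        print(port_source(sample))

In practice such a tool would also handle kernel launch syntax, build scripts and library calls; the point here is only the mechanical, near one-to-one nature of the mapping that makes "zero-cost migration" plausible.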

Moore Threads KUAE Intelligent Computing Center solution: integrated software and hardware, ready out of the box

The Moore Threads KUAE Intelligent Computing Center solution is a full-stack, software-and-hardware-integrated solution built on full-function GPUs. It comprises infrastructure centered on the KUAE computing cluster, the KUAE Platform cluster management platform, and KUAE ModelStudio model services, and is designed to solve the construction and operation-management problems of large-scale GPU computing through integrated delivery. The solution works out of the box, greatly reducing the time traditionally spent building computing capacity and developing, operating and maintaining the platform, so that it can be brought to market and put into commercial operation quickly.

Infrastructure: includes the KUAE computing cluster, RDMA networking and distributed storage. The newly released Moore Threads KUAE thousand-card model training platform can be built in just 30 days, supports pre-training, fine-tuning and inference of models with hundreds of billions of parameters, and achieves linear scaling efficiency of up to 91% (see the sketch below). Built on the MTT S4000 and the dual-socket, 8-GPU server MCCX D800, the Moore Threads KUAE cluster scales seamlessly from a single machine with multiple cards to multiple machines with multiple cards, and from a single card to a thousand-card cluster. Larger clusters will follow to meet the needs of larger-scale model training.
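The 91% figure is a linear scaling efficiency: measured cluster throughput divided by the ideal of single-card throughput times card count. A minimal sketch of that calculation, using made-up throughput numbers chosen only to reproduce the reported ratio:

    def scaling_efficiency(single_card_tps: float, cluster_tps: float, num_cards: int) -> float:
        """Linear scaling efficiency: measured cluster throughput over the ideal
        (single-card throughput times card count)."""
        return cluster_tps / (single_card_tps * num_cards)

    # Illustrative numbers only; the article reports ~91% for the KUAE thousand-card cluster.
    single = 1_000.0        # tokens/s on one card (hypothetical)
    cluster = 910_000.0     # tokens/s measured on 1,000 cards (hypothetical)
    print(f"{scaling_efficiency(single, cluster, 1000):.0%}")  # -> 91%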

KUAE Platform cluster management platform: an integrated software-and-hardware platform for AI large model training, distributed graphics rendering, streaming media processing and scientific computing. It deeply integrates full-function GPU computing, networking and storage to deliver highly reliable, high-performance computing services. Through the platform, users can flexibly manage computing resources across multiple data centers and clusters, with integrated multi-dimensional monitoring, alerting and logging that help intelligent computing centers automate operations and maintenance.

KUAE ModelStudio model services: cover the entire workflow of large model pre-training, fine-tuning and inference, and support all mainstream open-source large models. With the Moore Threads MUSIFY development tool, the existing CUDA application ecosystem can be reused easily, and the built-in containerization solution enables one-click deployment of inference APIs (the pattern is sketched below). The platform is intended to provide full life-cycle management of large models; through a simple, easy-to-use interface, users can orchestrate workflows on demand, greatly lowering the barrier to using large models.
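ModelStudio's actual deployment mechanism is not documented here. As a generic illustration of the pattern it automates, namely a containerized model served behind an HTTP inference API, here is a minimal sketch; FastAPI and the stubbed generate() function are assumptions for demonstration, not Moore Threads' service code.

    # Minimal sketch of a model served behind an HTTP API, the pattern that a
    # containerized one-click deployment wraps up. FastAPI and generate() are
    # illustrative assumptions, not ModelStudio internals.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Prompt(BaseModel):
        text: str
        max_new_tokens: int = 64

    def generate(prompt: str, max_new_tokens: int) -> str:
        # Placeholder for a real large-model inference call.
        return prompt + " ...(generated continuation)"

    @app.post("/generate")
    def generate_endpoint(req: Prompt) -> dict:
        return {"output": generate(req.text, req.max_new_tokens)}

    # Run inside the container with, for example:
    #   uvicorn app:app --host 0.0.0.0 --port 8000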

Moore Threads KUAE thousand-card cluster: multiple advantages for efficient large model training

Distributed parallel computing is the key to training large AI models. Moore Threads KUAE supports the industry's mainstream distributed frameworks, including DeepSpeed, Megatron-DeepSpeed, Colossal-AI and FlagScale, and integrates a variety of parallelization strategies, including data parallelism, tensor parallelism, pipeline parallelism and ZeRO, with additional optimizations for overlapping communication with computation and for Flash Attention. A minimal example of wiring up one of these frameworks follows.
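As a rough sketch of how one of the listed frameworks is typically set up, here is a standard DeepSpeed initialization with data parallelism and ZeRO; the configuration values and the stand-in model are illustrative assumptions, not KUAE or Moore Threads defaults.

    # Minimal DeepSpeed initialization sketch with data parallelism and ZeRO.
    # Values are illustrative; they are not KUAE or Moore Threads defaults.
    import deepspeed
    import torch

    model = torch.nn.Linear(4096, 4096)  # stand-in for a transformer model

    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "gradient_accumulation_steps": 8,
        "zero_optimization": {"stage": 2},   # shard optimizer state and gradients
        "bf16": {"enabled": True},
    }

    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )
    # Launch under the deepspeed launcher (e.g. `deepspeed train.py`), then drive
    # the usual loop with engine.backward(loss) and engine.step().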

Moore Threads currently supports training and fine-tuning of mainstream large models, including LLaMA, GLM, Aquila, Baichuan, GPT, Bloom and Yuyan. For large model training at 70B to 130B parameters on the Moore Threads KUAE thousand-card cluster, linear speedup reaches 91%, with compute utilization remaining essentially unchanged as the cluster scales. Taking 200 billion tokens of training data as an example, the Beijing Academy of Artificial Intelligence's 70-billion-parameter Aquila2 can complete training in 33 days, and a model at the 130-billion-parameter scale can complete training in 56 days. In addition, the Moore Threads KUAE thousand-card cluster supports long-term continuous and stable operation, supports resuming training from breakpoints, and completes asynchronous checkpoints in under 2 minutes.
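The 33-day figure for a 70B-parameter model on 200 billion tokens can be sanity-checked with the common back-of-envelope estimate that training compute is roughly 6 x parameters x tokens. The sustained per-card throughput below is an assumption chosen for illustration, not a published MTT S4000 specification.

    # Back-of-envelope training-time estimate using the common approximation
    # train_flops ~= 6 * parameters * tokens. The sustained per-card throughput
    # is an assumption for illustration, not a published MTT S4000 figure.
    def training_days(params: float, tokens: float, cards: int,
                      sustained_tflops_per_card: float) -> float:
        total_flops = 6 * params * tokens
        cluster_flops_per_s = cards * sustained_tflops_per_card * 1e12
        return total_flops / cluster_flops_per_s / 86_400

    # 70B parameters, 200B tokens, 1,000 cards, ~30 TFLOPS sustained per card (assumed)
    print(f"{training_days(70e9, 200e9, 1000, 30):.1f} days")  # roughly 32 days

With these assumed numbers the estimate lands close to the reported 33 days, which is the kind of consistency check operators use when sizing a cluster for a given training run.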

With its combined advantages of high compatibility, high stability, high scalability and high compute utilization, the Moore Threads KUAE thousand-card computing cluster is positioned to become solid, reliable and advanced infrastructure for large model training.

Intelligent computing and large model ecosystem alliances: multi-party cooperation to advance ecosystem integration

In the era of large models, intelligent computing power represented by GPUs is the cornerstone and the center of the generative AI world. Moore Threads, together with enterprises including China Mobile Beijing, China Telecom Beijing Branch, Lenovo, Century Interconnection, Sinnet, Zoomlion Data, Several Zhishu, China Development Zhiyuan, Enterprise Online, Nortel Digital Economy Computing Center, Ziguang Hengyue, Ruihua Industrial Holdings (Shandong), Sail Network, Zhongke Jincai, Zhongyun Zhishu and Jinzhou Holdings (in no particular order), jointly announced the establishment of the "Moore Threads PES-KUAE Intelligent Computing Alliance". The alliance will work to build and promote a fully domestic intelligent computing platform spanning underlying hardware, software, tools and applications, aiming for high cluster utilization and an easy-to-use, full-stack intelligent computing solution that becomes the first choice for large model training.

At the event, Moore Threads signed on-site agreements with Zoomlion Data and Several Intelligence, and jointly unveiled the Moore Threads KUAE Intelligent Computing Center. More than 200 guests on site witnessed the moment.

The ecosystem is the key to breakthroughs in AI applications. To this end, Moore Threads joined with a number of large model ecosystem partners, including 360, PaddlePaddle, JD.com, Zhipu AI, SuperSymmetry, WWW Core Dome, Tsinghua University, Fudan University, Zhejiang University, Beijing Institute of Technology, Ling Yunguang, Ruilai Wisdom and Nanwei Software (in no particular order), to initiate and establish the "Moore Threads PES-Large Model Ecosystem Alliance". Centering its integrated software-and-hardware large model solutions on MUSA, Moore Threads will actively pursue compatibility adaptation and technical tuning with a broad range of ecosystem partners, jointly promoting the overall prosperity of the domestic large model ecosystem.

In the closing round-table discussion, Moore Threads vice president Dong Longfei and other guests, including Qiehu, chairman of Zhongneng Construction Green Digital Technology (Zhongwei) Co., Ltd.; Zhang Peng, CEO of Zhipu AI; Pei Jiquan, chief AI scientist of JD.com; Zhai Win, managing director of CICC Capital; Wu Hengkui, founder of SuperSymmetry; and Zhen Jian, chairman of Several Zhiji, held an in-depth discussion on the computing power requirements of today's large models and the construction and operation of intelligent computing centers. Participants agreed that an intelligent computing center is not merely a stack of hardware; it also tests the ability to integrate GPU software and hardware into a complete intelligent computing system, and that the adaptation of distributed GPU computing systems, the management of compute clusters and the use of efficient inference engines are all important factors in making a computing center usable. The development of domestic intelligent computing centers depends on fully combining the needs and strengths of all parties, so that industrial cohesion can drive coordination across the whole ecosystem and advance the domestic industry.
