
Moore Threads releases MTT S4000 large-model intelligent computing accelerator card with 48GB of video memory

2024-12-08 Update · From: SLTechnology News&Howtos

Shulou (Shulou.com) 12/24 report --

CTOnews.com reported on December 19 that Moore Threads announced today that the unveiling ceremony of the Moore Threads KUAE Intelligent Computing Center, the first domestically produced training platform for hundred-billion-parameter models, was successfully held in Beijing. The event marked the official launch of China's first large-scale computing cluster built on a domestic full-function GPU, alongside the release of the MTT S4000 large-model intelligent computing accelerator card.

CTOnews.com attaches the specifications of the MTT S4000 as follows:

The Moore Threads MTT S4000 accelerator card adopts the third-generation MUSA core and supports 48GB of video memory and 768GB/s of memory bandwidth on a single card. Built on Moore Threads' self-developed MTLink 1.0 technology, the MTT S4000 supports multi-card interconnection, helping accelerate distributed training of hundred-billion-parameter models. The card also provides advanced graphics rendering, video encoding/decoding, and ultra-HD 8K HDR display capabilities, supporting integrated application scenarios across AI computing, graphics rendering, and multimedia.

Most notably, with the help of Moore Threads' self-developed MUSIFY development tool, the MTT S4000 can take full advantage of the existing CUDA software ecosystem, enabling zero-cost migration of CUDA code to the MUSA platform.
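To make that migration claim concrete, below is a minimal sketch of the kind of self-contained CUDA program a source-to-source tool like MUSIFY would take as input. Everything in it is standard CUDA Runtime API; the announcement does not name the MUSA-side equivalents that MUSIFY emits, so this illustrates only the CUDA input side of such a port.

```cuda
// Minimal SAXPY in standard CUDA: the kind of code the article says
// MUSIFY can migrate to the MUSA platform at zero cost. The MUSA-side
// API names are not given in the announcement, so only the CUDA input
// is shown here.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];   // one element per thread
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));   // unified memory for brevity
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);   // grid covers all n
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);   // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The program sticks to core constructs (kernel-launch syntax, runtime memory management, synchronization), which is the common case a source-level translator has to cover.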

Officials said the Moore Threads KUAE Intelligent Computing Center solution, built on the full-function GPU, is an integrated software-and-hardware full-stack solution. It comprises infrastructure centered on the KUAE computing cluster, the KUAE Platform cluster management layer, and KUAE ModelStudio model services, and it aims to solve the construction and operations challenges of large-scale GPU computing through integrated delivery. The solution works out of the box, greatly reducing the time traditionally spent building computing capacity and developing, operating, and maintaining the platform, so it can be brought to market and put into commercial operation quickly.

Moore Threads KUAE supports the industry's mainstream distributed frameworks, including DeepSpeed, Megatron-DeepSpeed, Colossal-AI, and FlagScale, and integrates a variety of parallelization strategies, including data parallelism, tensor parallelism, pipeline parallelism, and ZeRO, with additional optimizations for overlapping communication with computation and for Flash Attention. Moore Threads currently supports training and fine-tuning of mainstream models such as LLaMA, GLM, Aquila, Baichuan, GPT, Bloom, and Yuyan. On a Moore Threads KUAE thousand-card cluster, training large models at the 70B-to-130B parameter scale achieves a linear speedup of up to 91%, with compute utilization remaining essentially unchanged. Taking 200 billion training tokens as an example, the Zhiyuan Research Institute's (BAAI's) 70-billion-parameter Aquila2 can complete training in 33 days, and a model at the 130-billion-parameter scale can complete training in 56 days. In addition, the KUAE thousand-card cluster supports stable long-duration runs and resuming training from breakpoints, with asynchronous checkpointing taking less than 2 minutes.
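As a rough sketch of what the data-parallel strategy above reduces to in code, the following performs the per-step gradient all-reduce across several cards within one process. It uses NVIDIA's NCCL API purely as a familiar stand-in, since the announcement does not name the collective-communication library used over MTLink; the device count and buffer size are arbitrary.

```cuda
// Gradient all-reduce: the core collective behind data parallelism.
// NCCL is a stand-in API here; the library actually used over MTLink
// on MTT S4000 cards is not named in the announcement.
#include <nccl.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int nDev = 0;
    cudaGetDeviceCount(&nDev);
    if (nDev > 8) nDev = 8;                  // cap to this sketch's arrays

    ncclComm_t comms[8];
    cudaStream_t streams[8];
    float *grads[8];
    const size_t count = 1 << 20;            // stand-in gradient size

    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&grads[i], count * sizeof(float));
        cudaMemset(grads[i], 0, count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }
    ncclCommInitAll(comms, nDev, nullptr);   // one communicator per card

    // Each card contributes its local gradients; afterwards every card
    // holds the elementwise sum (divide by nDev to average, omitted).
    ncclGroupStart();
    for (int i = 0; i < nDev; ++i)
        ncclAllReduce(grads[i], grads[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
        cudaFree(grads[i]);
    }
    printf("all-reduce done on %d device(s)\n", nDev);
    return 0;
}
```

Tensor and pipeline parallelism and ZeRO layer further communication patterns (all-gather, reduce-scatter, point-to-point activation transfers) on top of this same primitive, which is why the article's emphasis on overlapping communication with computation matters for scaling efficiency.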


