2024-12-09 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 12/24 Report
As deep learning models are widely applied in natural language processing and other fields, their inference speed and performance have become key issues. Recently, the paper "SAMP: A Post-Training Quantization Model Inference Library Based on Adaptive Mixed Precision", led by Kuaishou, was accepted at EMNLP 2023, a top conference in the field, and was presented in Singapore.
The study proposes SAMP, an inference acceleration tool that uses adaptive mixed-precision techniques to significantly increase inference speed while preserving model accuracy. It consists of an adaptive mixed-precision encoder and a set of advanced fusion strategies. The adaptive mixed-precision encoder searches the many general matrix multiplication (GEMM) operations and Transformer layers for the best combination of floating-point and fixed-point precision, so that inference best matches the user's priority (computational accuracy or inference efficiency). As a result, mixed-precision computation achieves better accuracy than fully fixed-point computation. The fusion strategies improve the fusion of the embedding operator with quantization-related operations, cutting the number of CUDA kernel calls in half. SAMP is an end-to-end toolkit implemented in C++; it delivers excellent inference speed and lowers the barrier to industrial adoption of post-training quantization (PTQ) inference.
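To make "fixed-point GEMM" concrete, the sketch below shows generic symmetric per-tensor INT8 post-training quantization of a single matrix multiplication in plain Python: both operands are scaled to 8-bit integers, multiplied with exact integer accumulation, and dequantized, and the result is compared against the floating-point product. This is a minimal illustration of the general technique under assumed choices (per-tensor symmetric scales, round-to-nearest); it is not SAMP's implementation, and every function name here is hypothetical.

```python
# Generic symmetric per-tensor INT8 post-training quantization of one GEMM.
# Illustrative only: not SAMP's code; all names are hypothetical.

def quant_scale(values, num_bits=8):
    """Symmetric scale that maps max |x| onto the largest integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    amax = max(abs(v) for row in values for v in row)
    return amax / qmax if amax > 0 else 1.0

def quantize(values, scale, num_bits=8):
    """Round each value to the nearest integer level and clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [[max(-qmax, min(qmax, round(v / scale))) for v in row] for row in values]

def matmul(a, b):
    """Plain GEMM: (m x k) @ (k x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def int8_gemm(a, b):
    """Quantize both operands, multiply as integers (exact, like an INT32
    accumulator), then dequantize with the product of the two scales."""
    sa, sb = quant_scale(a), quant_scale(b)
    acc = matmul(quantize(a, sa), quantize(b, sb))  # integer-only accumulation
    return [[v * sa * sb for v in row] for row in acc]

def max_abs_error(x, y):
    """Largest elementwise difference between two matrices."""
    return max(abs(u - v) for ru, rv in zip(x, y) for u, v in zip(ru, rv))

if __name__ == "__main__":
    a = [[0.5, -1.2, 3.0], [2.2, 0.1, -0.7]]
    b = [[1.0, 0.3], [-0.4, 2.5], [0.9, -1.1]]
    print("max abs error:", max_abs_error(matmul(a, b), int8_gemm(a, b)))
```

The small but nonzero error printed here is the kind of accuracy loss that, accumulated across many GEMMs, motivates keeping some operations in floating point.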
Table 1: SAMP's innovations compared with similar systems
SAMP has the following main highlights:
1. Adaptive. SAMP balances computational accuracy and latency in post-training quantization inference. For each task, users can choose a mixed-precision configuration with the trade-off between accuracy and inference latency that suits them, and SAMP can also recommend the best quantization combination through an adaptive allocation method.
2. Inference efficiency. Across a wide range of precision settings (from floating point to fixed point), SAMP achieves better inference acceleration than other inference toolkits. On the classification datasets of the Chinese Language Understanding Evaluation benchmark (CLUE), SAMP achieves a 1.05x to 1.15x speedup over FasterTransformer.
3. Flexibility. SAMP covers many downstream tasks, such as classification, sequence labeling, and text matching, and its target module is extensible and customizable. It is user-friendly and has few platform dependencies: it offers C++ and Python APIs and requires only CUDA 11.0 or later. SAMP also provides model conversion tools for converting models between different formats.
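The "Adaptive" highlight describes choosing which parts of a model run in fixed point under an accuracy-versus-latency preference. The sketch below shows one simple way such an allocation could work, assuming a greedy heuristic that is not taken from the paper: each layer has an estimated accuracy drop when quantized to INT8 (assumed additive across layers), and the least sensitive layers are quantized first until a user-supplied loss budget would be exceeded. The layer names, sensitivity values, and the "fp16"/"int8" labels are hypothetical.

```python
# Hypothetical greedy mixed-precision allocation; an assumed heuristic for
# illustration, not SAMP's published algorithm.
from typing import Dict

def allocate_precision(sensitivity: Dict[str, float], loss_budget: float) -> Dict[str, str]:
    """sensitivity maps a layer name to its (non-negative) estimated accuracy
    drop when that layer alone runs in INT8; drops are assumed to add up."""
    plan = {name: "fp16" for name in sensitivity}  # start fully floating point
    spent = 0.0
    for name in sorted(sensitivity, key=sensitivity.get):  # least sensitive first
        if spent + sensitivity[name] > loss_budget:
            break  # every remaining layer is at least as sensitive
        plan[name] = "int8"
        spent += sensitivity[name]
    return plan

if __name__ == "__main__":
    # Made-up sensitivities for a toy 4-layer encoder (illustrative only).
    toy = {"layer0": 0.05, "layer1": 0.40, "layer2": 0.10, "layer3": 0.20}
    print(allocate_precision(toy, loss_budget=0.0))   # accuracy first: all fp16
    print(allocate_precision(toy, loss_budget=0.30))
    # {'layer0': 'int8', 'layer1': 'fp16', 'layer2': 'int8', 'layer3': 'fp16'}
```

A small budget favors accuracy (more floating point) and a large one favors speed (more INT8), which mirrors the user-facing trade-off described above.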
Figure 1: The paper being presented at EMNLP 2023.
Tian Rong, the lead researcher from Kuaishou, said that the good results in scenarios such as model inference are the product of the whole team's joint efforts. SAMP's contributions lie mainly in three areas: first, it solves the large accuracy loss that existing post-training quantization (PTQ) inference tools suffer in industrial applications; second, it promotes the large-scale use of PTQ in many downstream NLP tasks; third, the inference library is lightweight, flexible, and user-friendly, and supports user-defined task targets.
EMNLP (Empirical Methods in Natural Language Processing) is one of the top international conferences in natural language processing and artificial intelligence. It focuses on research into natural language processing techniques across application scenarios, with particular emphasis on empirical work. The conference has driven core innovations in the field, such as pre-trained language models, text mining, dialogue systems, and machine translation, and is highly influential in both academia and industry. The paper's acceptance also indicates that Kuaishou's research in this area has been recognized by international scholars.