
Joint research by 4Paradigm and Nanyang Technological University accepted at SIGMOD 2024

2024-06-19 Update From: SLTechnology News&Howtos



Recently, a joint research paper by 4Paradigm and Shuhao Zhang, a professor at Nanyang Technological University (NTU) in Singapore, titled "PECJ: Stream Window Join on Disorder Data Streams with Proactive Error Compensation", was accepted as a regular research paper at SIGMOD 2024 (ACM SIGMOD/PODS International Conference on Management of Data 2024), a top international academic conference on databases. SIGMOD is regarded as the premier conference in the database field, sometimes called the "Olympics" of databases, and the papers it accepts represent the highest level of research in the field.

Stream Window Join (SWJ) joins two input streams over finite subsets of each stream, called windows, and is a key operation in data stream analytics. Unlike a traditional relational join, SWJ can produce join results in real time without waiting for the complete input. This makes it important in real-time applications, and it is widely used in stream processing scenarios such as financial markets, fraud detection systems, and sensor networks.
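To make the "results without waiting for the complete input" property concrete, here is a minimal Python sketch of a tumbling-window symmetric hash join: every arriving tuple probes the other stream's table for its window and emits matches immediately. The tuple layout (side, key, event_time_ms), the tumbling-window choice, and the function name are illustrative assumptions, not taken from PECJ or AllianceDB. The sketch also never evicts old windows; a real SWJ must eventually close windows, which is exactly where out-of-order arrivals cause trouble.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Hypothetical tuple layout: (side, key, event_time_ms), where side is "L" or "R".
Event = Tuple[str, str, int]


def symmetric_window_join(events: Iterable[Event], window_ms: int) -> List[Tuple[int, str]]:
    """Emit one (window_start, key) result per matching L/R pair as soon as
    the second tuple of the pair arrives (symmetric hash join per window)."""
    # tables[side][window_start][key] -> number of tuples seen so far
    tables = {"L": defaultdict(lambda: defaultdict(int)),
              "R": defaultdict(lambda: defaultdict(int))}
    results: List[Tuple[int, str]] = []
    for side, key, ts in events:
        start = ts - ts % window_ms          # tumbling window this tuple belongs to
        other = "R" if side == "L" else "L"
        # Probe the opposite stream first, emitting results immediately...
        matches = tables[other][start][key]
        results.extend([(start, key)] * matches)
        # ...then insert the tuple so later arrivals on the other side can match it.
        tables[side][start][key] += 1
    return results
```

For example, two left tuples and one right tuple with key "a" in the window starting at 0 produce two results, each emitted the moment its pair is complete.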

One of the challenges SWJ faces is that data arrives out of order because of factors such as network delay; this phenomenon is known as stream oscillation. Traditional methods handle out-of-order streams either by buffering input data to obtain a more complete view of each window, or by running SWJ directly on the potentially out-of-order streams. However, because this disorder is nonlinear, the extra buffering time usually incurs substantial latency.
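The buffering trade-off can be illustrated with a small simulation. The sketch below closes each tumbling window buffer_ms after the window ends (measured in arrival time) and drops tuples that arrive later. A fixed buffering delay per window is an assumed simplification of the adaptive buffering real systems use, and the record layout and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (side, key, event_time_ms, arrival_time_ms).
Record = Tuple[str, str, int, int]


def buffered_join_count(records: List[Record], window_ms: int,
                        buffer_ms: int) -> Dict[int, int]:
    """Join count per tumbling window when each window is closed at
    window_end + buffer_ms; tuples arriving after that are dropped.
    Each window's result is emitted buffer_ms after the window ends,
    so buffer_ms is the extra latency paid for completeness."""
    kept: Dict[int, Dict[str, Dict[str, int]]] = defaultdict(
        lambda: {"L": defaultdict(int), "R": defaultdict(int)})
    for side, key, event_t, arrival_t in records:
        start = event_t - event_t % window_ms
        deadline = start + window_ms + buffer_ms
        if arrival_t <= deadline:  # arrived before the window was closed
            kept[start][side][key] += 1
    # Join count = sum over keys of (#left tuples) * (#right tuples).
    return {start: sum(n * sides["R"][k] for k, n in sides["L"].items())
            for start, sides in kept.items()}
```

For instance, with two left and two right tuples on the same key in one 100 ms window, where one right tuple arrives 300 ms after the window ends, buffer_ms=0 reports half the true join count, while buffer_ms=500 recovers the full count but delays the result by 500 ms.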

The joint team proposed a novel solution, proactive error compensation (PECJ), which manages out-of-order streams proactively. Whereas existing methods rely only on data that has already arrived (that is, the data observed in the window), PECJ also predicts the data that has not yet arrived and uses these predictions to improve join accuracy. This approach improves accuracy without increasing latency.
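As a much-simplified illustration of compensation (not PECJ's VI-based estimator), suppose each tuple independently arrives before its window closes with probability p, estimated from historical delays. A pair is observed only if both of its tuples arrived, so in expectation the observed join count is p_left * p_right times the true count; dividing by that product gives a compensated estimate without waiting for late data. The independence assumption and both function names are hypothetical.

```python
from typing import Sequence


def on_time_fraction(past_delays_ms: Sequence[float], budget_ms: float) -> float:
    """Estimate P(a tuple arrives within budget_ms) from historical delays."""
    if not past_delays_ms:
        raise ValueError("need at least one historical delay")
    return sum(d <= budget_ms for d in past_delays_ms) / len(past_delays_ms)


def compensated_join_count(observed_matches: int, p_left: float, p_right: float) -> float:
    """Scale the observed join count by the probability that both tuples of a
    pair arrived in time (assumes independent arrivals on each stream)."""
    if not (0.0 < p_left <= 1.0 and 0.0 < p_right <= 1.0):
        raise ValueError("arrival probabilities must be in (0, 1]")
    return observed_matches / (p_left * p_right)
```

For example, if all left tuples arrive on time but only half of the right tuples do, an observed count of 2 is compensated to 4.0. PECJ replaces this fixed scaling with a learned estimate of the unobserved data.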

Figure 1: Architecture of the proactive error compensation (PECJ) algorithm

When applying AI to enterprises' real business problems, 4Paradigm found that in scenarios demanding both high timeliness and high accuracy, such as financial anti-fraud, network delays and inconsistent data sources can prevent the required data from arriving in time, which severely degrades the timeliness and accuracy of risk-control systems. Consider, for example, an online anomaly detection system deployed in a stock exchange's data center that must flag an overseas transaction that could be used for malicious short selling; ideally, detection latency would be as low as 200 milliseconds. However, because of unpredictable stream oscillation, the transaction data could be delayed by 800 milliseconds or more. There are two traditional ways to handle this: sacrifice timeliness to ensure accuracy by waiting for the delayed data, or ensure timeliness by processing incomplete data, which may lower accuracy. In high-risk financial applications, neither option is satisfactory.

In contrast, PECJ responds proactively through predictive analysis. Specifically, PECJ formulates the estimation of unobserved data as a posterior distribution approximation (PDA) problem and solves it with variational inference (VI). It then uses the predicted data to improve the system's accuracy without significantly increasing latency, balancing computational efficiency against accuracy so that the system can operate effectively in highly latency-sensitive financial environments. The joint team also integrated PECJ into AllianceDB, a multithreaded SWJ benchmark platform. On a real-world dataset (Stock), PECJ reduced the error rate from 47% to 1% at the same latency.
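For readers unfamiliar with VI, the standard identity behind it is shown below. This is generic background rather than PECJ's exact formulation: x is assumed to stand for the observed (already arrived) data in a window, z for the unobserved (late) data, and q(z) for an approximating distribution chosen from a tractable family.

```latex
\log p(x) = \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(x, z) - \log q(z)\right]}_{\mathrm{ELBO}(q)}
          + \mathrm{KL}\!\left(q(z) \,\middle\|\, p(z \mid x)\right)
```

Because log p(x) does not depend on q and the KL divergence is non-negative, maximizing the ELBO over q is equivalent to minimizing the KL divergence to the true posterior p(z | x). This turns posterior approximation into an optimization problem that can be solved within a fixed time budget.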

Figure 2: Latency improvement and error-rate reduction of PECJ on the AllianceDB benchmark platform

Going forward, OpenMLDB, 4Paradigm's open-source machine learning database project, will embed the PECJ algorithm and will gradually be applied to more high-concurrency, high-throughput business scenarios across industries, further improving the efficiency and reliability of streaming data processing.
