2025-09-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Handling bugs in a big data computation job:
Resources before the program was modified:
Driver: 1
Worker: 2 machines
Memory requested by the program at submission: 1 GB
Memory allocation:
1. 20% for program running
2. 20% for Shuffle
3. 60% for RDD caching
Size of a single TweetBean: 3 KB
1. Memory overflow
Reason: the program queries all the TweetBean records and unions them, and the whole operation is done in memory. When a campaign has a large amount of data, for example 5 million records, 5,000,000 * 10 KB = 50 GB, which exceeds the memory limit.
Solution: first split the work into tasks according to the amount of data, so that no single task holds enough data to overflow memory, and put all the resulting tasks into a task list. Then loop over the task list: whenever the amount of data fetched from the tasks exceeds 200,000 records, merge all of that data and split it into 16 RDD partitions. Continue looping until the task list is exhausted.
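The splitting and batching loop described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names and the per-task size of 50,000 records are hypothetical (the article gives neither), and tasks are represented by their record counts rather than by real data. In an actual Spark job each emitted batch would then be merged and split into 16 partitions.

```python
from typing import Iterable, Iterator, List

BATCH_THRESHOLD = 200_000  # records per batch, from the text
NUM_PARTITIONS = 16        # partitions per merged batch, from the text


def split_into_tasks(total_records: int, max_per_task: int) -> List[int]:
    """Split a campaign into tasks of at most max_per_task records (hypothetical helper)."""
    full, rest = divmod(total_records, max_per_task)
    return [max_per_task] * full + ([rest] if rest else [])


def batch_tasks(task_sizes: Iterable[int],
                threshold: int = BATCH_THRESHOLD) -> Iterator[List[int]]:
    """Accumulate tasks and emit a batch once its record count exceeds threshold."""
    batch: List[int] = []
    count = 0
    for size in task_sizes:
        batch.append(size)
        count += size
        if count > threshold:
            # Here the real job would merge the fetched data and
            # split it into NUM_PARTITIONS partitions before processing.
            yield batch
            batch, count = [], 0
    if batch:  # tasks left over at the end of the task list
        yield batch


# 5 million records, with an assumed (hypothetical) 50,000 records per task.
tasks = split_into_tasks(5_000_000, 50_000)
batches = list(batch_tasks(tasks))
print(len(tasks), len(batches))
```

Because a batch is emitted only after the count passes the threshold, each batch can be somewhat larger than 200,000 records; a smaller per-task size keeps that overshoot small.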
Why fetch data in batches of 200,000: 200,000 * 3 KB = 600 MB. The memory available on the two machines for running the program = 2 (number of machines) * 2 GB (memory requested by the program) * 0.2 (proportion of memory used for running the program) = 800 MB, which is enough to hold 200,000 records without overflowing memory.
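The batch-size arithmetic can be checked with a few lines of Python. This is a rough sketch: it uses decimal units (1 MB = 1000 KB) to match the article's figures, the helper names are hypothetical, and it ignores object overhead and how Spark actually accounts for memory.

```python
KB = 1_000          # decimal units, matching 200000 * 3k = 600M in the text
MB = 1_000 * KB
GB = 1_000 * MB


def region_bytes(machines: int, requested_bytes: int, percent: int) -> int:
    """Memory of one region (e.g. program running) summed over all machines."""
    return machines * requested_bytes * percent // 100


def max_records(budget_bytes: int, record_bytes: int) -> int:
    """Largest number of records of record_bytes each that fit in budget_bytes."""
    return budget_bytes // record_bytes


execution_budget = region_bytes(machines=2, requested_bytes=2 * GB, percent=20)
batch_bytes = 200_000 * 3 * KB
limit = max_records(execution_budget, 3 * KB)

print(execution_budget // MB)  # 800
print(batch_bytes // MB)       # 600
print(limit)                   # 266666
```

Under these assumptions the 200,000-record batch leaves about 200 MB of headroom below the 800 MB budget.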
2. Running slowly
Reason: with two machines, the total memory available for Shuffle = 2 (number of machines) * 1 GB (memory requested by the program) * 0.2 (proportion of memory used for Shuffle) = 400 MB.
200,000 (records processed in a batch) * 3 KB (size of a single TweetBean) = 600 MB. The data shuffled in one batch is larger than the available Shuffle memory, so it is flushed (spilled) to disk, and reading it back is slow.
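As a back-of-the-envelope check of the reasoning above, the following Python sketch estimates how much of a batch does not fit in Shuffle memory. The function name is hypothetical, units are decimal to match the article, and this is a simplified model rather than a description of Spark's internal spill logic.

```python
KB = 1_000
MB = 1_000 * KB
GB = 1_000 * MB


def shuffle_spill_bytes(batch_bytes: int, shuffle_bytes: int) -> int:
    """Bytes of a batch that exceed Shuffle memory (0 means nothing spills)."""
    return max(0, batch_bytes - shuffle_bytes)


shuffle_before = 2 * (1 * GB) * 20 // 100   # 2 machines * 1 GB * 20%
batch = 200_000 * 3 * KB                    # 200,000 records * 3 KB
spill = shuffle_spill_bytes(batch, shuffle_before)

print(shuffle_before // MB)  # 400
print(spill // MB)           # 200
```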
Solution: increase the memory available to the program for Shuffle, as follows:
Memory requested by the program: 2 GB
Memory allocation:
1. 20% for program running
2. 60% for Shuffle
3. 20% for RDD caching
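The article does not say which settings were changed. As an assumption, on Spark versions that use the legacy memory model the Shuffle and RDD cache shares correspond to `spark.shuffle.memoryFraction` and `spark.storage.memoryFraction`, and the requested memory to `--executor-memory`. The Python sketch below only assembles a `spark-submit` command line under that assumption; the helper name and the jar name `tweet_job.jar` are hypothetical.

```python
from typing import Dict, List


def build_submit_args(executor_memory: str, conf: Dict[str, str], app: str) -> List[str]:
    """Assemble spark-submit arguments (hypothetical helper, does not run Spark)."""
    args = ["spark-submit", "--executor-memory", executor_memory]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Assumed legacy memory settings matching the new allocation:
# 60% for Shuffle, 20% for RDD caching.
legacy_conf = {
    "spark.shuffle.memoryFraction": "0.6",
    "spark.storage.memoryFraction": "0.2",
}

submit_args = build_submit_args("2g", legacy_conf, "tweet_job.jar")
print(" ".join(submit_args))
```

With 2 GB requested and 60% for Shuffle, the same arithmetic as before gives 2 * 2 GB * 0.6 = 2.4 GB of Shuffle memory across the two machines, well above the 600 MB batch.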