2025-09-22 Update, from SLTechnology News&Howtos (Shulou.com)
How do you use Spark to analyze website logs? Many people without much experience are unsure where to start. This article walks through one real case, from the cause of the problem to its solution.
Since yesterday my personal website had been repeatedly raising 504 alarms. After logging in to the machine I found php-fpm reporting errors. Restarting php-fpm cleared them, but the alarm came back within a few hours. The site had run without problems for almost a year, which made this strange.
[28-Sep-2016 11:53:19] NOTICE: ready to handle connections
[28-Sep-2016 11:53:19] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 11:53:26] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:46:35] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:49:32] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
I assumed this value was set too low, so I edited the configuration and raised it (to 20, as the later warnings show).
[28-Sep-2016 15:51:43] NOTICE: fpm is running, pid 28179
[28-Sep-2016 15:51:43] NOTICE: ready to handle connections
[28-Sep-2016 15:51:43] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 15:52:12] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 7 total children
[28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:52:32] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:53:05] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:55:17] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
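To see how often the limit is actually hit, the warnings can be tallied per hour. The following is only a sketch: the sample lines are copied from the two log excerpts above, and the function name `count_max_children_by_hour` is made up for illustration.

```shell
# Sample lines copied from the php-fpm log excerpts above.
fpm_log='[28-Sep-2016 11:53:19] NOTICE: ready to handle connections
[28-Sep-2016 11:53:26] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:46:35] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:49:32] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:52:32] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:53:05] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:55:17] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it'

# Hypothetical helper: read a php-fpm log on stdin and print
# "HH:00 count" for every hour with max_children warnings.
count_max_children_by_hour() {
    grep 'reached pm.max_children' |
        sed -E 's/^\[[^ ]+ ([0-9]{2}):.*/\1/' |   # keep only the hour
        sort | uniq -c |
        awk '{print $2 ":00", $1}'
}

printf '%s\n' "$fpm_log" | count_max_children_by_hour
```

On a real server the function would read the php-fpm log file instead of the inline sample; its path depends on the installation.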
The result was the same: a few hours later the 504 alarm fired again. Looking at the nginx log, I found unusually heavy traffic from a few unfamiliar IPs and suspected malicious access, so it seemed necessary to count the requests per IP in the access log.
root@iZ28bhfjhgkZ:/var/log/nginx# vim access.log
121.42.53.180 - - [25/Sep/2016:06:26:29 +0800] "POST /wp-cron.php?doing_wp_cron=1474755989.0131719112396240234375 HTTP/1.0" 499 0 "-" "WordPress/4.3.1; http://zhwen.org"
182.92.148.207 - - [25/Sep/2016:06:26:29 +0800] "GET / HTTP/1.1" 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
203.208.60.226 - - [25/Sep/2016:06:28:55 +0800] "GET /?p=675 HTTP/1.1" 200 8204 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/themes/sparkling/inc/css/font-awesome.min.css?ver=4.3.1 HTTP/1.1" 200 26711 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/plugins/wp-pagenavi/pagenavi-css.css?ver=2.70 HTTP/1.1" 200 374 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:58 +0800] "GET /wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.3.1 HTTP/1.1" 200 771 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
121.43.107.174 - - [25/Sep/2016:06:29:18 +0800] "GET / HTTP/1.1" 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
115.28.189.208 - - [25/Sep/2016:06:29:33 +0800] "GET / HTTP/1.1" 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
42.156.139.59 - - [25/Sep/2016:06:30:58 +0800] "GET /?paged=14 HTTP/1.1" 11164 "-" "YisouSpider"
182.92.148.207 - - [25/Sep/2016:06:31:29 +0800] "GET / HTTP/1.1" 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /?p=articles/cscope-tags HTTP/1.1" 10681 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1.50 (KHTML, like Gecko)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /apple-touch-icon-precomposed.png HTTP/1.1" 404 151 "-" "Safari/12602.1.50.0.10 CFNetwork/807.0.4 Darwin/16.0.0 (x86_64)"
So I ran a simple count of the client IPs in the access log:
1) First extract the IP column (this cuts down the amount of data; alternatively, compress the whole log and download that instead), then download the result to the local machine.
root@iZ28bhfjhgkZ:/var/log/nginx# cat access.log | awk '{print $1}' > tt
Then run the following code in spark-shell:
// Load the extracted IPs, one per line
val line = sc.textFile("/data1/data/t1")
line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _) // count requests per IP
.map(e => (e._2, e._1))                                   // swap to (count, ip)
.sortByKey(true, 1)                                       // ascending by count, one partition
.saveAsTextFile("/data1/data/t3")
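For a log small enough to fit on one machine, the same per-IP tally can be cross-checked with standard shell tools. This is a minimal sketch, not part of the original session: the sample lines are shortened versions of the access-log entries above, and `count_ips` is a hypothetical helper name.

```shell
# Shortened sample lines based on the access log excerpt above.
access_log='121.42.53.180 - - [25/Sep/2016:06:26:29 +0800] "POST /wp-cron.php HTTP/1.0"
182.92.148.207 - - [25/Sep/2016:06:26:29 +0800] "GET / HTTP/1.1"
203.208.60.226 - - [25/Sep/2016:06:28:55 +0800] "GET /?p=675 HTTP/1.1"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/plugins/wp-pagenavi/pagenavi-css.css?ver=2.70 HTTP/1.1"
203.208.60.226 - - [25/Sep/2016:06:28:58 +0800] "GET /wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.3.1 HTTP/1.1"
182.92.148.207 - - [25/Sep/2016:06:31:29 +0800] "GET / HTTP/1.1"'

# Hypothetical helper: read access-log lines on stdin and print
# "count ip", sorted ascending by count (ties broken by IP text).
count_ips() {
    awk '{print $1}' |              # first field is the client IP
        sort | uniq -c |            # count identical IPs
        awk '{print $1, $2}' |      # drop the padding uniq adds
        LC_ALL=C sort -k1,1n -k2,2
}

printf '%s\n' "$access_log" | count_ips
```

The stages mirror the Spark job: `awk` extracts the key, `sort | uniq -c` plays the role of `reduceByKey`, and the final `sort` corresponds to `sortByKey`.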
2) The final result in t3 is shown below. These IPs generate a very large amount of traffic, especially 191.96.249.53.
(855,182.92.148.207)
(3100,121.8.136.75)
(3889,61.135.169.81)
(53513,191.96.249.53)
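Picking out the heavy hitters from the saved result can also be scripted. The sketch below assumes each output line is a Scala tuple printed as (count,ip); the function name `ips_over` and the threshold of 10000 are illustrative choices, not values from the original session.

```shell
# Result lines from t3, assumed to be in "(count,ip)" form.
spark_output='(855,182.92.148.207)
(3100,121.8.136.75)
(3889,61.135.169.81)
(53513,191.96.249.53)'

# Hypothetical helper: print every IP whose request count is
# strictly greater than the threshold given as the first argument.
ips_over() {
    tr -d '()' |
        awk -F, -v min="$1" '$1 + 0 > min + 0 {print $2}'
}

printf '%s\n' "$spark_output" | ips_over 10000
```

With the threshold at 10000 only 191.96.249.53 is printed; lowering it widens the list, so the cutoff is a judgment call about what counts as abnormal traffic for the site.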
3) Add an iptables rule to block the worst offender, and the job is done. Spark makes this kind of statistical analysis very simple: a single chained expression does the whole analysis.
root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
root@iZ28bhfjhgkZ:/var/log# iptables -A INPUT -s 191.96.249.53 -j DROP
root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  DEDICATED.SERVER     anywhere
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
root@iZ28bhfjhgkZ:/var/log#
That is the whole process of using Spark to analyze a website's access log.
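If more than one address needs blocking, the rules can be generated rather than typed by hand. This is a dry-run sketch: it only prints commands of the same form as the `iptables -A INPUT -s ... -j DROP` rule used above and never runs them. The helper name `drop_rules` is hypothetical.

```shell
# Hypothetical helper: read one IP per line on stdin and print the
# matching DROP rule. Nothing is executed, so the output can be
# reviewed before it is run as root.
drop_rules() {
    while IFS= read -r ip; do
        if [ -n "$ip" ]; then
            printf 'iptables -A INPUT -s %s -j DROP\n' "$ip"
        fi
    done
}

printf '%s\n' 191.96.249.53 | drop_rules
```

Printing the commands first keeps a mistake in the IP list from locking out legitimate visitors, or the administrator's own address.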