2025-09-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains how to set a proxy IP for a Python crawler. It is intended as a practical reference for readers who need to route crawler traffic through proxies.
How a Python crawler sets a proxy IP:
1. Add code that sets up a proxy, and change the proxy at regular intervals.
By default, urllib2 reads the http_proxy environment variable to configure its HTTP proxy. (urllib2 is the Python 2 module; in Python 3 the same functions live in urllib.request.) Some websites count the requests coming from each IP address over a period of time and block an address that makes too many. To work around this, you can route requests through several proxy servers and switch to a different one every so often, so the website cannot tell that the requests come from a single client. The following code illustrates the proxy settings.
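    import urllib2

    enable_proxy = True
    # Route HTTP requests through the given proxy server.
    proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:8080"})
    # An empty mapping disables proxies, including the http_proxy environment variable.
    null_proxy_handler = urllib2.ProxyHandler({})

    if enable_proxy:
        opener = urllib2.build_opener(proxy_handler)
    else:
        opener = urllib2.build_opener(null_proxy_handler)

    # Make this opener the default for every later urllib2.urlopen() call.
    urllib2.install_opener(opener)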
2. Setting a timeout handles websites that respond slowly.
The third parameter of the urlopen method, mentioned earlier, is timeout: the number of seconds to wait before giving up, which limits the impact of websites that respond slowly. In the following code, when the second parameter, data, is omitted, timeout must be passed as a keyword argument; when data is passed in, timeout can be given positionally as the third argument.
    import urllib2

    # No data argument: pass timeout by keyword.
    response = urllib2.urlopen('http://www.baidu.com', timeout=10)

    import urllib
    import urllib2

    # data is the POST body; the field below is only a placeholder.
    data = urllib.urlencode({'key': 'value'})
    # data is passed, so timeout can be the third positional argument.
    response = urllib2.urlopen('http://www.baidu.com', data, 10)
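Setting a timeout only helps if the crawler also handles the error raised when the timeout expires. The following is a minimal sketch of that handling in Python 3, using urllib.request in place of urllib2. The helper name fetch and its return-None-on-failure convention are assumptions made for illustration, not part of the original article.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, data=None, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    data, when given, must be bytes and turns the request into a POST.
    timeout is the number of seconds to wait before giving up.
    """
    try:
        with urllib.request.urlopen(url, data, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout) as exc:
        # A timeout can surface either wrapped in URLError (while connecting)
        # or as a bare socket.timeout (while waiting for the response).
        print("request failed:", exc)
        return None
```

A crawler might call fetch('http://www.baidu.com', timeout=10) and skip or retry the page when the result is None, instead of letting one slow site stall the whole run.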