How to modify the Referer in a Python crawler to bypass login and access frequency restrictions

2025-05-03 Update From: SLTechnology News&Howtos

This article explains how to modify the Referer header in a Python crawler to get around login requirements and access frequency limits. I hope it is helpful to you.

You have probably run into the following two problems when writing crawlers:

1. Your crawler works fine during development, but once you start crawling a large number of pages, the target website keeps returning 403, 500, or similar errors.

2. The website you want to crawl requires a login, and working out the login process takes a lot of time.

When we hit problem 1, our first reaction is that we have reached the site's access frequency limit and our IP has been blocked, so we go looking for more IPs and reduce the request rate.

When we hit problem 2, we either study the site's encryption scheme or log in by hand and save the cookie on the machine, which can take several days.

Besides these head-on approaches, there is a more ingenious way to get around both problems: modify the Referer in the HTTP header. Note that this means modifying the Referer, not the User-Agent.

I introduced HTTP headers, and how to view header information with Chrome, in an earlier article on the Google Chrome browser and the principles of web crawlers. If you are not familiar with them, review that material first. Here I will only give a brief explanation of what the Referer is.

The Referer tells the target server (the website you are visiting) which page you clicked through from to reach the current page.

For example, if you search for a website on Baidu and then click through to it, a packet capture tool will show that the Referer of that request is a Baidu URL.

When you run into either of the two problems above, try changing the Referer to a value like that, so the request looks like a click-through from a search engine. You will find that some websites do not block requests that arrive from a search engine, or apply a more relaxed frequency limit to them. Some content that is supposed to require a login even becomes visible without logging in once the Referer says the request came from Baidu.
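As a minimal sketch of the idea above, the following uses only the Python standard library to send a request whose Referer claims to come from a search engine. The target URL and the exact Referer value are assumptions chosen for illustration; the Referer a real click-through produces may look different, and whether a given site honors it has to be tested case by case.

```python
import urllib.request

# Assumed values for illustration only; neither URL comes from the article.
TARGET_URL = "https://example.com/some/page"
SEARCH_REFERER = "https://www.baidu.com/"


def build_request(url, referer=SEARCH_REFERER):
    """Return a Request whose Referer header claims a search-engine click-through."""
    # Only the Referer is set here; the User-Agent is left at urllib's default.
    return urllib.request.Request(url, headers={"Referer": referer})


def fetch(url, referer=SEARCH_REFERER, timeout=10):
    """Send the request with the Referer set and return the response body as bytes."""
    with urllib.request.urlopen(build_request(url, referer), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch(TARGET_URL)` performs the actual request; `build_request` is kept separate so the headers can be inspected without touching the network.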

In fact, all of this could have been said in one sentence, and I have written a whole article about it.

Why do these websites treat search-engine traffic more favorably?

The reason is that these websites want SEO traffic, so they relax access control for visitors who click through from a search engine. When you run into either of the two problems above, try changing the Referer first; it can save you a lot of research time. This is most useful for temporary or one-off crawlers that need no long-term maintenance, where you just want to write the crawler quickly and grab the data. Not every website behaves this way; those that do include some professional social networking sites, some business registration lookup sites, some entertainment ticketing sites, and so on.
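The "try changing the Referer first" advice could be sketched as a small fallback: request the page normally, and only if the server answers 403 retry once with a search-engine Referer. This is an illustration under assumptions, not a definitive implementation: the function name, the `opener` parameter (there so the network call can be swapped out), the Referer value, and the choice to retry only on 403 are all hypothetical.

```python
import urllib.error
import urllib.request

# Assumed Referer value for illustration; it does not come from the article.
SEARCH_REFERER = "https://www.baidu.com/"


def fetch_with_referer_fallback(url, opener=urllib.request.urlopen,
                                referer=SEARCH_REFERER):
    """Try a plain request; on HTTP 403, retry once with a search-engine Referer."""
    try:
        with opener(urllib.request.Request(url)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code != 403:
            raise  # any other error is left for the caller to handle
    # Second attempt: same URL, but claiming a search-engine click-through.
    retry = urllib.request.Request(url, headers={"Referer": referer})
    with opener(retry) as resp:
        return resp.read()
```

If the retry also fails, the exception propagates, which is the signal to fall back to the slower approaches described earlier (more IPs, lower request rate, or a real login).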

That is all on how to modify the Referer in a Python crawler to bypass login and access frequency restrictions. I hope it is of some help to you.
