2025-09-24 Update. From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 report
This article explains the basic principles and workflow of a Python crawler. The explanation is kept simple and clear; follow along step by step to understand how a crawler works.
1. Basic principles
A crawler is a program that simulates a user's actions in a browser or app and automates them. The basic workflow has four steps.
(1) Initiate a request
Send a request to the target site through an HTTP library, i.e. send a Request. The request can carry additional header information; the client then waits for the server to respond.
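As a minimal sketch of this step, using only Python's standard-library `urllib` (the URL and the User-Agent string below are placeholders, not anything mandated by the article):

```python
import urllib.request

def make_request(url):
    # Attach a browser-like User-Agent header; many sites reject
    # requests that carry no (or a default library) User-Agent.
    headers = {"User-Agent": "Mozilla/5.0 (crawler-demo)"}
    return urllib.request.Request(url, headers=headers)

# To actually send it (requires network access):
# with urllib.request.urlopen(make_request("https://example.com")) as resp:
#     body = resp.read()
```

The popular third-party `requests` library does the same job with `requests.get(url, headers=...)`.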
(2) Obtain response content
If the server responds normally, you get a Response whose body is the page content you want. It may be HTML, a JSON string, or binary data such as an image or video.
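One common way to decide which of those cases applies is to inspect the response's Content-Type header. A small illustrative helper (the category names are my own, not from the article):

```python
def classify_response(content_type):
    # Map a Content-Type header value to how the body should be handled.
    ct = content_type.lower()
    if "html" in ct:
        return "html"    # parse as a page
    if "json" in ct:
        return "json"    # decode into objects
    return "binary"      # save the raw bytes (image, video, ...)
```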
(3) Parse the content
The content may be HTML, which can be parsed with regular expressions or a page-parsing library; it may be JSON, which can be converted directly into a JSON object; or it may be binary data, which can be saved or processed further.
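A sketch of both parsing paths with the standard library (the sample title pattern is for illustration only; for real pages a parser such as BeautifulSoup or lxml is more robust than a regular expression):

```python
import json
import re

def extract_title(html):
    # Regular expressions work for simple, well-formed pages only.
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

def parse_json(text):
    # A JSON response converts directly into Python objects.
    return json.loads(text)
```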
(4) Save the data
The data can be saved in many forms: as plain text, in a database, or as a file in a specific format.
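The two most common forms above can be sketched with the standard library (the table schema here is a made-up example, not one the article specifies):

```python
import sqlite3

def save_text(path, text):
    # Simplest form: write the scraped content to a text file.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

def save_rows(db_path, rows):
    # Or store structured records in a database (SQLite here).
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, title TEXT)")
    conn.executemany("INSERT INTO pages VALUES (?, ?)", rows)
    conn.commit()
    conn.close()
```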
2. Process
So what happens in the background when we type a URL into the browser and press Enter?
In short, this process takes place in four steps:
(1) Find the IP address corresponding to the domain name.
DNS (Domain Name System) is the first thing the browser consults; its job is to translate the domain name into the corresponding IP address.
(2) Send a request to the server at that IP address.
(3) The server responds to the request and sends back the web page content.
(4) The browser displays the content of the web page.
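The first of these four steps can be observed directly from Python, since the standard library exposes the system resolver (the example domain in the comment is a placeholder):

```python
import socket

def resolve(domain):
    # Ask the system resolver, which in turn queries DNS,
    # for an IPv4 address behind the domain name.
    return socket.gethostbyname(domain)

# Example (requires a working resolver):
# resolve("example.com")  # returns a dotted IPv4 address
```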
What a web crawler does, simply put, is implement these browser functions itself: given a URL, it returns the data the user needs directly, without anyone having to operate a browser step by step.
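Putting the pieces together, a minimal end-to-end sketch might look like this, again using only the standard library; extracting absolute links is just one illustrative choice of "data the user needs":

```python
import re
import urllib.request

def fetch(url):
    # Steps 1-3 (DNS lookup, request, response) all happen inside
    # urlopen; the crawler receives the raw page instead of rendering it.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def extract_links(html):
    # Step 4, crawler-style: instead of displaying the page,
    # pull out the data we need (absolute links here).
    return re.findall(r'href="(https?://[^"]+)"', html)

# Usage (requires network access):
# print(extract_links(fetch("https://example.com")))
```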
Thank you for reading. That covers the basic principles and workflow of a Python crawler; the best way to deepen this understanding is to verify it in practice.
Author: Tracy