2025-09-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
In this article, the editor explains in detail how to use Node.js to scrape novel chapters. The content is thorough, the steps are clear, and the details are handled carefully. I hope this article helps you resolve your doubts about the topic.
If you plan to build a novel-reading tool with Electron as a practice project, the first problem to solve is the data, that is, the text of the novels themselves.
Here we will use Node.js to crawl a novel website. Since this is just a trial crawl, the data will not be stored in a database; we will simply save the text as .txt files for now.
For making requests to websites in Node, the built-in http and https modules provide a request method.
Example:
```js
const https = require('https');

// testUrl is the page to fetch
const request = https.request(testUrl, { encoding: 'utf-8' }, (res) => {
  let chunks = '';
  res.on('data', (chunk) => {
    chunks += chunk;
  });
  res.on('end', function () {
    console.log('request end');
  });
});
request.end();
```
But this alone only fetches the raw HTML text of a page; it cannot extract the elements inside it (regular expressions could do it, but they get far too complicated).
I save the fetched data with the fs.writeFile method, but that is just the HTML of the whole web page.
What I actually want is the content of each chapter, so I need to extract the chapter hyperlinks, build a list of them, and crawl each one.
Cheerio library
You can experiment with the examples in its documentation to get familiar with it.
Parsing HTML using cheerio
When cheerio parses HTML, it exposes the DOM nodes through a jQuery-like API.
Based on the HTML of the book's index page, find the DOM nodes that hold the data you want.
```js
const fs = require('fs');
const cheerio = require('cheerio');
// import the read/write helpers
const { getFile, writeFun } = require('./requestNovel');

let hasIndexPromise = getFile('./hasGetfile/index.html');
let bookArray = [];

hasIndexPromise.then((res) => {
  let htmlstr = res;
  let $ = cheerio.load(htmlstr);
  // NOTE: the selector was lost in the source text; 'dd a' is a guess at
  // the chapter-link selector for this index page.
  $('dd a').map((index, item) => {
    let name = $(item).text(),
        href = 'https://www.shuquge.com/txt/147032/' + $(item).attr('href');
    if (index > 11) {
      // skip the first 12 entries
      bookArray.push({ name, href });
    }
  });
  // console.log(bookArray)
  writeFun('./hasGetfile/hrefList.txt', JSON.stringify(bookArray), 'w');
});
```
Print the result; you can store this information at the same time.
Now that you have the number of chapters and their links, you can fetch the chapter contents.
Because batch crawling ultimately requires an IP proxy, I am not ready to write the batch method yet; for now, here is a method that fetches the content of a single chapter of the novel.
Crawling the content of one chapter is actually fairly simple:
```js
// fetch the content of one chapter
function getOneChapter(n) {
  return new Promise((resolve, reject) => {
    if (n >= bookArray.length) {
      reject('not found');
      return;
    }
    let name = bookArray[n].name;
    const request = https.request(bookArray[n].href, { encoding: 'gbk' }, (res) => {
      let html = '';
      res.on('data', (chunk) => {
        html += chunk;
      });
      res.on('end', () => {
        let $ = cheerio.load(html);
        let content = $('#content').text();
        if (content) {
          // save the chapter as a txt file
          writeFun(`./hasGetfile/${name}.txt`, content, 'w');
          resolve(content);
        } else {
          reject('not found');
        }
      });
    });
    request.end();
  });
}

getOneChapter(10);
```
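The article defers batch crawling until an IP proxy is in place, but the pacing logic can be sketched now. `crawlAll` and `fetchChapter` below are illustrative names of my own, not from the original; the fake fetcher stands in for a real `https.request` call (which is also where a proxy agent would plug in):

```js
// Sketch: crawl chapters one at a time with a pause between requests,
// so the target site is not hammered.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function crawlAll(chapters, fetchChapter, pauseMs) {
  const results = [];
  for (const ch of chapters) {
    results.push(await fetchChapter(ch)); // one request at a time
    await delay(pauseMs);                 // polite pause between requests
  }
  return results;
}

// Usage with a fake fetcher (no network involved).
const fakeFetch = (ch) => Promise.resolve(`content of ${ch.name}`);
crawlAll([{ name: 'ch1' }, { name: 'ch2' }], fakeFetch, 10)
  .then((r) => console.log(r)); // [ 'content of ch1', 'content of ch2' ]
```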
In this way, you can build a calling interface around the method above: pass in different chapter parameters and get back the data for that chapter.
```js
const express = require('express');
const IO = express();
const { getAllChapter, getOneChapter } = require('./readIndex');

// build the chapter hyperlink list
getAllChapter();

IO.use('/book', function (req, res) {
  // query parameters
  let query = req.query;
  if (query.n) {
    // fetch the data for one chapter
    let promise = getOneChapter(parseInt(query.n) - 1);
    promise.then(
      (d) => { res.json({ d: d }); },
      (d) => { res.json({ d: d }); }
    );
  } else {
    res.json({ d: 404 });
  }
});

// listen on the server's local port (the port number was lost in the source text)
IO.listen();
```

After reading this far, the article "how to use node to grab novel chapters" has been covered in full. To truly master these points, you still need to practice with them yourself. If you want to read more related articles, welcome to follow the industry information channel.