2025-09-26 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report
This article explains how to use Nutch: running a first crawl, installing Solr, and indexing the crawled data into Solr.
Nutch has now reached version 2.2.2, and the 1.x line has been updated to 1.8. This walkthrough uses 1.8 as the example; some of its command-line tools have changed, so getting started is not entirely straightforward.
# Running Nutch #
Download and install Nutch
Under ${NUTCH_HOME}, run: mkdir urls
cd urls
touch seed.txt
Edit seed.txt and write: http://nutch.apache.org
Edit ${NUTCH_HOME}/conf/regex-urlfilter.txt
Replace
# accept anything else
+.
with
+^http://([a-z0-9]*\.)*nutch.apache.org/
Crawl the web pages: bin/nutch crawl urls -dir crawl -depth 3 -topN 5
Note: this command has changed in version 1.8.
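The preparation steps above (seed list and URL filter) can be sketched as a short script. This is only a sketch under assumptions: it falls back to a scratch directory when NUTCH_HOME is unset, and in that case creates a stand-in filter file containing just the two lines quoted above, so it can be tried without touching a real installation. The variable name FILTER is illustrative.

```shell
# Assumption: NUTCH_HOME points at an unpacked Nutch 1.x directory.
# When it is unset, a scratch directory is used instead.
NUTCH_HOME="${NUTCH_HOME:-$(mktemp -d)}"
mkdir -p "$NUTCH_HOME/urls" "$NUTCH_HOME/conf"

# Seed list: one URL per line.
printf '%s\n' 'http://nutch.apache.org' > "$NUTCH_HOME/urls/seed.txt"

FILTER="$NUTCH_HOME/conf/regex-urlfilter.txt"
# Stand-in filter file (hypothetical minimal content), created only if none exists.
[ -f "$FILTER" ] || printf '%s\n' '# accept anything else' '+.' > "$FILTER"

# Replace the catch-all "+." rule with one restricted to nutch.apache.org.
# A backup of the original file is kept as regex-urlfilter.txt.bak.
sed -i.bak 's|^+\.$|+^http://([a-z0-9]*\\.)*nutch.apache.org/|' "$FILTER"
```

After this, the crawl command is run from ${NUTCH_HOME} as described above.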
# Installing Solr #
Download and install Solr; the latest version was 4.8 when I used it.
cd ${SOLR_HOME}/example
java -jar start.jar
Verify installation: http://localhost:8983/solr/
# Nutch and Solr integration #
Note: the documentation lists only two steps:
Replace ${SOLR_HOME}/example/solr/collection1/conf/schema.xml with ${NUTCH_HOME}/conf/schema-solr4.xml, renaming schema-solr4.xml to schema.xml
Add after line 351 in schema.xml (in fact, just add it in the types tag):
The integration is now complete. All that remains is to restart Solr and index the data crawled by Nutch into Solr with the following command.
Under ${NUTCH_HOME}, run:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
Then visit: http://localhost:8983/solr/
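The indexing step can be wrapped in a small script that prints the command and runs it only when it makes sense to do so. This is a sketch, not an official tool: SOLR_URL, CRAWL_DIR and INDEX_CMD are illustrative variable names, and the defaults are simply the values used in this article.

```shell
# Assumptions: run from ${NUTCH_HOME}; Solr is already running with the Nutch schema.
SOLR_URL="${SOLR_URL:-http://127.0.0.1:8983/solr/}"
CRAWL_DIR="${CRAWL_DIR:-crawl}"

# Build the command as text first, so it can be inspected before anything runs.
INDEX_CMD="bin/nutch solrindex $SOLR_URL $CRAWL_DIR/crawldb -linkdb $CRAWL_DIR/linkdb $CRAWL_DIR/segments/*"
echo "$INDEX_CMD"

# Run the indexer only if the Nutch launcher and crawled segments are present.
if [ -x bin/nutch ] && [ -d "$CRAWL_DIR/segments" ]; then
  bin/nutch solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" \
    -linkdb "$CRAWL_DIR/linkdb" "$CRAWL_DIR"/segments/*
else
  echo "bin/nutch or $CRAWL_DIR/segments not found; nothing was indexed" >&2
fi
```

Printing the command first makes it easy to confirm the Solr URL and crawl directory before any documents are sent to the index.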
Thank you for reading. That concludes "how to use Nutch". After working through this article you should have a better understanding of how to use Nutch, though the specifics still need to be verified in practice.