2025-11-08 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article covers two errors commonly reported by Nutch and how to resolve each one. It is intended as a practical reference for anyone who runs into the same messages.
Indexer: java.io.IOException: Job failed!
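Environment: Nutch 2.8 as reported (the crawldb/segments/linkdb layout in the log below belongs to the Nutch 1.x line, so this is probably 1.8), running in local mode; the Solr service is up and working normally.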
ParseSegment: finished at 2014-07-14 21:21:19, elapsed: 00:00:35
CrawlDB update
CrawlDb update: starting at 2014-07-14 21:21:21
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20140714190910]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: false
CrawlDb update: URL filtering: false
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-07-14 21:21:30, elapsed: 00:00:09
Link inversion
LinkDb: starting at 2014-07-14 21:21:33
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: internal links will be ignored.
LinkDb: adding segment: crawl/segments/20140714190910
LinkDb: merging with existing linkdb: crawl/linkdb
LinkDb: finished at 2014-07-14 21:21:42, elapsed: 00:00:09
Dedup on crawldb
Indexing 20140714190910 on SOLR index -> http://192.168.122.104:8080/solr
Indexer: starting at 2014-07-14 21:21:55
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters:
SOLRIndexWriter
    solr.server.url: URL of the SOLR instance (mandatory)
    solr.commit.size: buffer size when sending to SOLR (default 1000)
    solr.mapping.file: name of the mapping file for fields (default solrindex-mapping.xml)
    solr.auth: use authentication (default false)
    solr.auth.username: use authentication (default false)
    solr.auth: username for authentication
    solr.auth.password: password for authentication
Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
Solution:
1. Check the Nutch log. It shows that the job failed because Solr answered Nutch's request with Bad Request.
2. Check the Solr log (the Logging page of the Solr web UI). It contains org.apache.solr.common.SolrException: ERROR: [doc=http://18.ifeng.com/] unknown field 'anchor'
3. Step 2 shows that Solr's schema has no field named 'anchor', so add an 'anchor' field to Solr's solr/collection1/conf/schema.xml file.
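The check-and-add in step 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name ensure_field and the attribute values in ANCHOR_ATTRS are not from the original article. The attributes are a guess at a reasonable definition for a multi-valued string field, so compare them with the schema.xml shipped in your Nutch conf directory before using them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Assumed attributes for the missing field; verify against the schema.xml
# bundled with your Nutch release before applying them.
ANCHOR_ATTRS = {
    "type": "string",
    "stored": "true",
    "indexed": "true",
    "multiValued": "true",
}


def ensure_field(schema_xml: str, name: str, attrs: Dict[str, str]) -> Tuple[str, bool]:
    """Return (schema text, changed). Adds <field name=...> only if it is absent."""
    root = ET.fromstring(schema_xml)
    # Look for an existing declaration anywhere in the schema.
    if any(f.get("name") == name for f in root.iter("field")):
        return schema_xml, False
    # Older Solr schemas wrap fields in <fields>; fall back to the root otherwise.
    fields = root.find("fields")
    parent = fields if fields is not None else root
    ET.SubElement(parent, "field", {"name": name, **attrs})
    return ET.tostring(root, encoding="unicode"), True


if __name__ == "__main__":
    sample = (
        '<schema name="nutch" version="1.5"><fields>'
        '<field name="id" type="string" stored="true" indexed="true"/>'
        "</fields></schema>"
    )
    updated, changed = ensure_field(sample, "anchor", ANCHOR_ATTRS)
    print("field added" if changed else "field already present")
    print(updated)
```

Note that ElementTree drops XML comments when it parses, so for a real schema.xml it may be safer to use this only as a check and to add the line by hand. Solr has to reload the core (or be restarted) before a schema change takes effect.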
No agents listed in 'http.agent.name' property
Reason: the http.agent.name property is not set in $NUTCH_HOME/conf/nutch-site.xml. Newer versions of Nutch report this error when the value is empty (TODO: exact version to be verified).
Solution: set http.agent.name in nutch-site.xml. The value is used as the crawler's User-Agent, so you can enter a browser's UA string if you want requests to look like browser traffic. Note that after editing the file you need to rebuild with ant for the change to take effect.
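Before starting a crawl, it can help to confirm that the property really is set. The following Python sketch reads a Hadoop-style configuration file (the configuration/property/name/value layout that nutch-site.xml uses) and reports whether http.agent.name has a non-empty value. The helper names get_property and agent_name_is_set, and the sample value MyCrawler, are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_property(site_xml: str, name: str) -> Optional[str]:
    """Return the stripped <value> of the named <property>, or None if absent."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def agent_name_is_set(site_xml: str) -> bool:
    """True only when http.agent.name exists and is not empty."""
    return bool(get_property(site_xml, "http.agent.name"))


if __name__ == "__main__":
    # Hypothetical file content; in practice read $NUTCH_HOME/conf/nutch-site.xml.
    sample = """
    <configuration>
      <property>
        <name>http.agent.name</name>
        <value>MyCrawler</value>
      </property>
    </configuration>
    """
    if agent_name_is_set(sample):
        print("http.agent.name =", get_property(sample, "http.agent.name"))
    else:
        print("http.agent.name is missing or empty")
```

To check a real installation, read the file contents with open(path, encoding="utf-8").read() and pass the text to agent_name_is_set.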