nutch-user mailing list archives

From magix <175154...@qq.com>
Subject ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/rss+xml
Date Fri, 02 Dec 2011 03:27:17 GMT
Dear all,
I want to crawl an XML file, but something goes wrong. Please help me find
out what the reason is.

Nutch version: 1.4
JDK: 1.6
IDE: Eclipse 3.2


2011-12-02 11:12:24,473 WARN  parse.ParserFactory - ParserFactory:Plugin: 
org.apache.nutch.parse.feed.FeedParser mapped to contentType
application/rss+xml via parse-plugins.xml, but its plugin.xml file does not
claim to support contentType: application/rss+xml
2011-12-02 11:12:24,645 ERROR tika.TikaParser - Can't retrieve Tika parser
for mime-type application/rss+xml
2011-12-02 11:12:24,662 INFO  parse.ParseSegment - Parsing:
http://news.163.com/special/00011K6L/rss_war.xml
2011-12-02 11:12:24,664 WARN  parse.ParseSegment - Error parsing:
http://news.163.com/special/00011K6L/rss_war.xml: failed(2,0): Can't
retrieve Tika parser for mime-type application/rss+xml


I tried changing the 'plugin.includes' property in 'nutch-default.xml' to:
<value>protocol-http|urlfilter-regex|parse-(html|tika)|feed|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
(to add feed support)

and ran it as a Java Application in Eclipse, but nothing changed.
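
One quick sanity check on the value above is whether the pattern actually selects the plugin ID "feed". The sketch below is only an illustration in Python, not Nutch code. It assumes that 'plugin.includes' is treated as a regular expression that must match a whole plugin ID, and the IDs in the list (other than those implied by the value itself) are examples chosen for the demonstration.

```python
import re

# The plugin.includes value quoted in the message above.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika)|feed|"
    "index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def included(plugin_id: str, pattern: str = PLUGIN_INCLUDES) -> bool:
    """Return True if the whole plugin ID matches the pattern.

    Assumption: the pattern is applied as a full match against each
    plugin ID, not as a substring search.
    """
    return re.fullmatch(pattern, plugin_id) is not None


if __name__ == "__main__":
    # Illustrative plugin IDs; the last two are expected to be excluded.
    for pid in ["feed", "parse-tika", "parse-html", "parse-rss", "index-more"]:
        print(f"{pid}: {'included' if included(pid) else 'excluded'}")
```

If "feed" is reported as included under these assumptions, the pattern itself is probably not the problem, and it may be worth checking whether the edited configuration file is the one actually loaded at run time.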



  

