Dear all,
I want to crawl an XML file, but something goes wrong. Please help me find
out the reason.
Nutch version: 1.4
JDK: 1.6
IDE: Eclipse 3.2
2011-12-02 11:12:24,473 WARN parse.ParserFactory - ParserFactory:Plugin:
org.apache.nutch.parse.feed.FeedParser mapped to contentType
application/rss+xml via parse-plugins.xml, but its plugin.xml file does not
claim to support contentType: application/rss+xml
2011-12-02 11:12:24,645 ERROR tika.TikaParser - Can't retrieve Tika parser
for mime-type application/rss+xml
2011-12-02 11:12:24,662 INFO parse.ParseSegment - Parsing:
http://news.163.com/special/00011K6L/rss_war.xml
2011-12-02 11:12:24,664 WARN parse.ParseSegment - Error parsing:
http://news.163.com/special/00011K6L/rss_war.xml: failed(2,0): Can't
retrieve Tika parser for mime-type application/rss+xml
I tried changing the 'plugin.includes' property in 'nutch-default.xml' to:
<value>protocol-http|urlfilter-regex|parse-(html|tika)|feed|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
(adding feed support)
and ran the crawl as a Java Application in Eclipse, but nothing changed.
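
To rule out a mistake in the value itself, here is a small stand-alone check of the expression above. It is only a sketch: it assumes that Nutch treats 'plugin.includes' as a regular expression that must match a whole plugin id, and that the feed plugin's id is "feed". The class and method names are made up for this example and are not part of Nutch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: checks which plugin ids the
// plugin.includes value from this post would select, ASSUMING the value
// is used as a regex that has to match the complete plugin id.
public class PluginIncludesCheck {

    // Value copied verbatim from the <value> element above.
    static final String PLUGIN_INCLUDES =
        "protocol-http|urlfilter-regex|parse-(html|tika)|feed|"
        + "index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)";

    // True if the whole plugin id matches the expression.
    static boolean isIncluded(String pluginId) {
        return Pattern.compile(PLUGIN_INCLUDES).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // "feed" is the ASSUMED id of the feed plugin; "parse-feed" is
        // listed only to show that partial matches do not count.
        String[] ids = {"feed", "parse-tika", "parse-html", "parse-feed"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```

Under those assumptions "feed" is accepted by the expression, so the value itself looks fine, and the question is rather whether the edited configuration is the one actually loaded when running from Eclipse.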
--
View this message in context: http://lucene.472066.n3.nabble.com/ERROR-tika-TikaParser-Can-t-retrieve-Tika-parser-for-mime-type-application-rss-xml-tp3553672p3553672.html
Sent from the Nutch - User mailing list archive at Nabble.com.