nutch-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/rss+xml
Date Fri, 02 Dec 2011 10:48:57 GMT
Hi Magix,

There is a Jira issue open for this [1]. It would be great if you could add
your feedback there so that we can work towards a fix.

I have not been working with feeds recently, so I am sorry that I can't
offer much more help.

[1] https://issues.apache.org/jira/browse/NUTCH-1053

On Fri, Dec 2, 2011 at 3:27 AM, magix <175154910@qq.com> wrote:

> Dear all,
> I want to crawl an XML file, but something is going wrong. Please help me
> find out the reason.
>
> Nutch version: 1.4
> JDK: 1.6
> IDE: Eclipse 3.2
>
>
> 2011-12-02 11:12:24,473 WARN  parse.ParserFactory - ParserFactory:Plugin:
> org.apache.nutch.parse.feed.FeedParser mapped to contentType
> application/rss+xml via parse-plugins.xml, but its plugin.xml file does not
> claim to support contentType: application/rss+xml
> 2011-12-02 11:12:24,645 ERROR tika.TikaParser - Can't retrieve Tika parser
> for mime-type application/rss+xml
> 2011-12-02 11:12:24,662 INFO  parse.ParseSegment - Parsing:
> http://news.163.com/special/00011K6L/rss_war.xml
> 2011-12-02 11:12:24,664 WARN  parse.ParseSegment - Error parsing:
> http://news.163.com/special/00011K6L/rss_war.xml: failed(2,0): Can't
> retrieve Tika parser for mime-type application/rss+xml
>
>
> I tried changing the property 'plugin.includes' in 'nutch-default.xml' to:
>
> <value>protocol-http|urlfilter-regex|parse-(html|tika)|feed|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> (to add feed support)
>
> and ran it as a Java Application in Eclipse, but nothing changed.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/ERROR-tika-TikaParser-Can-t-retrieve-Tika-parser-for-mime-type-application-rss-xml-tp3553672p3553672.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
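
For what it's worth, the plugin.includes value quoted above does look like
it selects the feed plugin. Below is a rough Python sketch of that check,
not Nutch's own code. It assumes that Nutch treats plugin.includes as a
regular expression that has to match a whole plugin id, and that the feed
plugin's id is simply "feed". The helper name is_included is made up for
illustration.

```python
import re

# The plugin.includes value from the quoted message, copied verbatim.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika)|feed|"
    "index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_id, includes=PLUGIN_INCLUDES):
    """Return True if the whole plugin id matches the includes regex.

    Assumption: a plugin is activated only when the pattern matches the
    entire id, so re.fullmatch is used rather than re.search.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    # "feed" is the assumed id of the feed plugin; "parse-feed" is a
    # hypothetical id included only to show a non-matching case.
    for plugin_id in ("feed", "parse-tika", "parse-feed"):
        print(plugin_id, is_included(plugin_id))
```

If that assumption holds, the value itself is probably not the problem. That
would fit the WARN line in the log, which points at a mismatch between
parse-plugins.xml and the content types declared in the feed plugin's
plugin.xml.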



-- 
*Lewis*
