nutch-user mailing list archives

From Julien Nioche <lists.digitalpeb...@gmail.com>
Subject Re: Parse MS Office etc. in Nutch 1.2
Date Fri, 08 Oct 2010 12:40:19 GMT
>
> Sorry, I'm able to parse doc, docx, sxw, odt and rtf as well. After I
> removed the plugins.folder I changed in order to run Nutch inside Eclipse,
> everything works.
>

Good


>
> BTW, I see the following in my log file:
> 2010-10-08 13:56:32,555 WARN  more.MoreIndexingFilter -
> http://ridder.uio.no/test1.xlsx: can't parse erroneous date:
> 2010-10-08T13:55:54Z
> 2010-10-08 13:56:32,558 WARN  more.MoreIndexingFilter -
> http://ridder.uio.no/wtest1.docx: can't parse erroneous date:
> 2010-10-08T13:55:49Z
>
> Should I report this as an IndexingFilter bug? It seems that I need to
> rewrite it in order to parse the date correctly, but not a big issue right
> now.
>
>
Yes, please.
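
For the bug report, it may help to show that the values in the log are ordinary ISO 8601 UTC timestamps. The sketch below is only an illustration and is not taken from MoreIndexingFilter: the class name IsoDateSketch and the helper parseIsoUtc are hypothetical, and it assumes the fix amounts to accepting the pattern yyyy-MM-dd'T'HH:mm:ss'Z' (which patterns the filter currently tries is not something this thread confirms).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class IsoDateSketch {

    // Hypothetical helper, not part of Nutch: parses timestamps shaped like
    // the ones in the log (e.g. 2010-10-08T13:55:54Z) into epoch milliseconds.
    static long parseIsoUtc(String value) {
        // The trailing 'Z' is matched as a literal, so the time zone is set to UTC explicitly.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false); // reject out-of-range fields instead of rolling them over
        try {
            return fmt.parse(value).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an ISO 8601 UTC timestamp: " + value, e);
        }
    }

    public static void main(String[] args) {
        // The two values reported in the log excerpt above.
        System.out.println(parseIsoUtc("2010-10-08T13:55:54Z"));
        System.out.println(parseIsoUtc("2010-10-08T13:55:49Z"));
    }
}
```

A new SimpleDateFormat is created per call because the class is not thread-safe; a real patch would more likely add the pattern to whatever list of formats the filter already uses.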

Thanks

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
