lucene-java-user mailing list archives

From "Alexander Aristov" <alexander.aris...@gmail.com>
Subject Re: Parsing MSWord
Date Wed, 12 Nov 2008 08:40:22 GMT
Antiword would be hard to integrate into Nutch as it is not Java-based. It would
require native calls.
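
Short of JNI, one option is to launch antiword as a separate process and read
its standard output. A minimal sketch, assuming the antiword binary is on the
PATH and that its output is UTF-8 (neither is guaranteed on a given system);
the class and method names are hypothetical, and the code needs Java 9 or later:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class AntiwordExtractor {

    // Runs an external command and returns everything it wrote to stdout.
    static String runAndCapture(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Discard stderr so a chatty tool cannot block on a full pipe.
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();

        String output;
        try (InputStream in = process.getInputStream()) {
            // Assumption: the tool emits UTF-8 text.
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with status " + exitCode);
        }
        return output;
    }

    // Assumption: "antiword <file>" prints the document text to stdout.
    static String extractWordText(String docPath)
            throws IOException, InterruptedException {
        return runAndCapture(List.of("antiword", docPath));
    }
}
```

The returned string can then be indexed like any other plain text. The cost is
one process per document and a dependency on a binary installed on every node.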

Alexander

2008/11/12 Sertic Mirko, Bedag <Mirko.Sertic@bedag.ch>

> Hi
>
> You can also use a tool called "antiword" to extract the text from a .doc
> file and then pass the text to Lucene.
>
> See here: http://en.wikipedia.org/wiki/Antiword
>
> Regards
> Mirko
>
> -----Original Message-----
> From: dipesh [mailto:dipshrestha@gmail.com]
> Sent: Wednesday, 12 November 2008 04:38
> To: java-user@lucene.apache.org
> Subject: Parsing MSWord
>
> Hello,
> I wanted to know if there are classes in Lucene that support parsing MSWord
> documents.
> Many thanks,
> Dipesh
>
> ----------------------------------------
> "Help Ever Hurt Never"- Baba
>



-- 
Best Regards
Alexander Aristov
