lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: AW: Parsing MSWord
Date Wed, 12 Nov 2008 15:09:10 GMT
Or Tika, Lucene's cousin: http://incubator.apache.org/tika/
(which uses POI under the hood, but goes beyond MS Word parsing)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




________________________________
From: Donna L Gresh <gresh@us.ibm.com>
To: java-user@lucene.apache.org
Sent: Wednesday, November 12, 2008 8:25:43 AM
Subject: Re: AW: Parsing MSWord

Check out POI; that's what I use

http://poi.apache.org/


"Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch> wrote on 11/12/2008 03:25:47 
AM:

> Hi
> 
> You can also use a tool called "antiword" to extract the text from a
> .doc file, and then pass that text to Lucene.
> 
> See here : http://en.wikipedia.org/wiki/Antiword
> 
> Regards
> Mirko
> 
> -----Original Message-----
> From: dipesh [mailto:dipshrestha@gmail.com] 
> Sent: Wednesday, November 12, 2008 04:38
> To: java-user@lucene.apache.org
> Subject: Parsing MSWord
> 
> Hello,
> I wanted to know if there are classes in Lucene that support parsing MSWord
> documents.
> Many thanks,
> Dipesh
> 
> ----------------------------------------
> "Help Ever Hurt Never"- Baba
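
For anyone trying the antiword route suggested above, here is a rough sketch of the "extract the text, then give it to Lucene" step in Java. It only covers the extraction half: it runs the antiword command-line tool on a .doc file and captures what the tool prints. The class and method names (AntiwordExtractor, antiwordCommand, runAndCapture) are made up for illustration, and the sketch assumes antiword is installed and on the PATH, that it is called as "antiword <file>", and that its output can be read as UTF-8 (this may depend on your locale). The returned string is what you would then put into a Lucene field when building the document to index.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Lucene, POI or Tika.
final class AntiwordExtractor {

    // Builds the command line. Assumption: an "antiword" binary is on the PATH
    // and takes the .doc file as its only argument.
    static List<String> antiwordCommand(Path docFile) {
        return List.of("antiword", docFile.toString());
    }

    // Runs a command and returns everything it wrote to standard output.
    // Error output is discarded; a non-zero exit code is reported as an IOException.
    static String runAndCapture(List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        // Assumption: the tool's output is UTF-8 text.
        String output = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command + " exited with code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AntiwordExtractor <file.doc>");
            return;
        }
        // The extracted text is what would be handed to Lucene for indexing.
        String text = runAndCapture(antiwordCommand(Paths.get(args[0])));
        System.out.println(text);
    }
}
```

Shelling out keeps the Java side free of extra libraries, at the cost of needing the external tool on every machine that indexes documents. POI and Tika, mentioned in the replies above, do the extraction in-process instead.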
