lucene-java-user mailing list archives

From Antony Bowesman
Subject Re: index word files ( doc )
Date Sun, 25 Mar 2007 22:59:58 GMT
I've been using Ryan's textmining in preference to POI because, internally, TM
uses both POI and the Word6 extractor, so it handles a greater variety of files.

Ryan, thanks for fixing your site.  Do you have any plans or ideas for parsing
'fast-saved' files, or for handling Word files older than the Word 6 format?


Ryan Ackley wrote:
> As the author of both Word POI and textmining, I recommend using
> textmining. POI is for general purpose manipulation of Word
> documents. textmining's only purpose is extracting text.
> Also, people recommend using POI for text extraction but the only
> place I've seen an actual how-to on this is in the "Lucene in Action"
> book.
