lucene-java-user mailing list archives

From Antony Bowesman <...@teamware.com>
Subject Re: index word files ( doc )
Date Sun, 25 Mar 2007 22:59:58 GMT
I've been using Ryan's textmining in preference to POI because internally TM uses
both POI and the Word6 extractor, so it handles a greater variety of files.
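For anyone curious why layering extractors helps, here is a minimal sketch of the fallback idea: try a Word 97+ parser first and fall back to a Word 6 parser if it fails. The `TextExtractor` interface and `extractWithFallback` method are hypothetical and are not the textmining or POI API. The two lambdas in `main` are stand-ins for real parsers. The OLE2 signature check is an extra pre-filter I added; it is the standard 8-byte header of the compound-file container that binary .doc files use.

```java
import java.io.IOException;
import java.util.Arrays;

public class FallbackExtraction {

    /** Hypothetical extractor abstraction; NOT the textmining or POI API. */
    @FunctionalInterface
    public interface TextExtractor {
        /** Returns the document text, or throws if this extractor cannot parse the file. */
        String extract(byte[] data) throws IOException;
    }

    /** First 8 bytes of every OLE2 compound file, the container used by binary .doc files. */
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** Cheap pre-check so obviously non-Word input is rejected before any parsing. */
    public static boolean looksLikeOle2(byte[] data) {
        return data.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Tries each extractor in order and returns the first successful result. */
    public static String extractWithFallback(byte[] data, TextExtractor... chain) throws IOException {
        if (!looksLikeOle2(data)) {
            throw new IOException("not an OLE2 compound file");
        }
        IOException last = null;
        for (TextExtractor extractor : chain) {
            try {
                return extractor.extract(data);
            } catch (IOException e) {
                last = e; // remember the failure and try the next (older-format) extractor
            }
        }
        throw new IOException("no extractor could handle the file", last);
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = Arrays.copyOf(OLE2_MAGIC, 512); // OLE2 header only, zero-padded
        String text = extractWithFallback(doc,
            // hypothetical stand-in for a Word 97+ parser that rejects this file
            data -> { throw new IOException("Word 97 parser: unsupported format"); },
            // hypothetical stand-in for a Word 6 parser
            data -> "text from the Word 6 parser");
        System.out.println(text); // prints "text from the Word 6 parser"
    }
}
```

The ordering matters: the newer-format parser goes first because it covers most files, and the older parser only runs when it fails. A real setup would wrap actual library extractors in the lambdas, and the resulting string would then be added to a Lucene `Document` as a tokenized field.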

Ryan, thanks for fixing your site.  Do you have any plans or ideas for parsing
'fast-saved' files, and for handling Word files older than the Word 6 format?

Regards
Antony


Ryan Ackley wrote:
> As the author of both Word POI and textmining.org, I recommend using
> textmining.org. POI is for general purpose manipulation of Word
> documents. textmining's only purpose is extracting text.
> 
> Also, people recommend using POI for text extraction but the only
> place I've seen an actual how-to on this is in the "Lucene in Action"
> book.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

