lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Text extraction tool for Microsoft Office 2007
Date Sun, 22 Feb 2009 07:59:26 GMT

Hi,

POI - http://poi.apache.org/
or
Tika (it uses POI) - http://lucene.apache.org/tika

Once the text is extracted, you can index it with Lucene using the code from Lucene in Action (http://manning.com/hatcher2), which is free to download.
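
To give a rough idea of what these libraries handle for Word 2007: a .docx file is a ZIP package, and the body text sits in the word/document.xml entry. The sketch below reads that entry using only the JDK. It is an illustration, not a POI or Tika API: DocxText, extract and sample are made-up names, it assumes Java 9+ (for readAllBytes), and it ignores headers, footnotes, and the Excel and PowerPoint formats. For real indexing, POI or Tika are still the practical choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of POI, Tika, or Lucene.
final class DocxText {
    // Namespace of WordprocessingML elements such as w:p (paragraph) and w:t (text run).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one line per paragraph. */
    static String extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (e.getName().equals("word/document.xml")) {
                    // Reads to the end of the current ZIP entry only.
                    return textOf(zip.readAllBytes());
                }
            }
        }
        throw new IOException("no word/document.xml entry; not a .docx?");
    }

    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so untrusted files cannot pull in external entities.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            NodeList runs =
                ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Builds a tiny in-memory stand-in for a .docx so the sketch runs without a file.
     *  The paragraph strings are not XML-escaped, so keep them plain. */
    static byte[] sample(String... paragraphs) throws IOException {
        StringBuilder xml =
            new StringBuilder("<w:document xmlns:w=\"" + W_NS + "\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = sample("Hello", "Lucene");
        System.out.print(extract(new ByteArrayInputStream(docx)));
    }
}
```

The string returned by extract is what you would put into a Lucene field for indexing. With a real file, pass a FileInputStream instead of the in-memory sample.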


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: "Zhang, Lisheng" <Lisheng.Zhang@BroadVision.com>
> To: java-user@lucene.apache.org
> Sent: Sunday, February 22, 2009 2:27:06 PM
> Subject: Text extraction tool for Microsoft Office 2007
> 
> Hi,
> 
> What is the best free tool for extracting text from
> Microsoft Office 2007 documents:
> 
> Word 2007, Excel 2007, PowerPoint 2007
> 
> so that we can index them with Lucene?
> 
> Thanks very much for your help, Lisheng
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


