lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Text extraction tool for Microsoft Office 2007
Date Sun, 22 Feb 2009 10:21:04 GMT

Also, that chapter (7) has been rewritten in the revised Lucene in  
Action (available through Manning's early access now); it's now based  
entirely on Tika.

But note that Tika has only recently become able to extract text from
Office 2007 files (I think):

     https://issues.apache.org/jira/browse/TIKA-152

You'll have to build Tika from trunk or use the SNAPSHOT from Maven.
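
To give a rough idea of what extracting text from Office 2007 involves: a
Word 2007 .docx file is a ZIP archive whose main body is stored in
word/document.xml, with the visible text held in w:t elements. Below is a
minimal sketch using only the JDK. It is not Tika's or POI's code; the
class name DocxTextSketch and its methods are hypothetical, and it ignores
headers, footnotes, and nested structures such as text boxes, all of which
a real extractor has to handle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Tika or POI.
public class DocxTextSketch {
    // WordprocessingML main namespace used by Word 2007 documents.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one line per paragraph. */
    public static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // Buffer the entry so the XML parser cannot close the ZIP stream.
                    return textOf(readAll(zip));
                }
            }
        }
        throw new IOException("word/document.xml not found; not a .docx?");
    }

    // Reads the current ZIP entry to its end.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Concatenates the w:t runs of each w:p paragraph, newline-terminated.
    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                              .parse(new ByteArrayInputStream(xml));
        StringBuilder sb = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            NodeList runs = ((Element) paragraphs.item(i))
                                .getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                sb.append(runs.item(j).getTextContent());
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

The returned string could then be added to a Lucene document as an indexed
field. Excel 2007 and PowerPoint 2007 files are ZIP archives too, but they
keep their text in different parts, so in practice Tika or POI is still the
better route.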

Mike

Otis Gospodnetic wrote:

>
> Hi,
>
> POI - http://poi.apache.org/
> or
> Tika (it uses POI) - http://lucene.apache.org/tika
>
> And you can use code from Lucene in Action to index the text with  
> Lucene - http://manning.com/hatcher2 .  The code is free to download.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: "Zhang, Lisheng" <Lisheng.Zhang@BroadVision.com>
>> To: java-user@lucene.apache.org
>> Sent: Sunday, February 22, 2009 2:27:06 PM
>> Subject: Text extraction tool for Microsoft Office 2007
>>
>> Hi,
>>
>> What is the best tool (free software) to extract text from
>> Microsoft Office 2007:
>>
>> Word 2007, Excel 2007, PowerPoint 2007
>>
>> so that we can index them with Lucene?
>>
>> Thanks very much for your help, Lisheng
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org