lucene-java-user mailing list archives

From Christiaan Fluit <christiaan.fl...@aduna-software.com>
Subject Re: which way to index pdf,word,excel
Date Wed, 06 Sep 2006 08:01:43 GMT
Have a look at Aperture: http://aperture.sourceforge.net/
It provides components for crawling and text and metadata extraction. 
It's still in alpha stage though. The development code in CVS has 
already improved a lot over the last official alpha release.

Chris
--

James liu wrote:
> I want to find a single framework that can index XML, Word, Excel, PDF, and so on, rather than just one format.
> 
> 
> 2006/9/6, Doron Cohen <DORONC@il.ibm.com>:
>>
>> Lucene FAQ - http://wiki.apache.org/jakarta-lucene/LuceneFAQ - has a few
>> entries just for this:
>>
>>   How can I index HTML documents?
>>   How can I index XML documents?
>>   How can I index OpenOffice.org files?
>>   How can I index MS-Word documents?
>>   How can I index MS-Excel documents?
>>   How can I index MS-Powerpoint documents?
>>   How can I index Email (from MS-Exchange or another IMAP server) ?
>>   How can I index RTF documents?
>>   How can I index PDF documents?
>>   How can I index JSP files?
>>
>>
>> "James liu" <liuping.james@gmail.com> wrote on 05/09/2006 19:14:24:
>>
>> > I have run into many problems with LIUS, so I want to give it up and find something new.
>> >
>> > Can anyone recommend one?
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 


Kind regards,

Christiaan Fluit
-- 
Aduna - Guided Exploration
www.aduna-software.com

Prinses Julianaplein 14-b
3817 CS Amersfoort
The Netherlands
+31-33-4659987 (office)


