lucene-java-user mailing list archives

From Stefan Groschupf ...@media-style.com>
Subject Re: Index entire filesystem
Date Wed, 05 Nov 2003 11:01:01 GMT
There is some ongoing work at nutch.org.
Maybe we can bundle all the work together?! <open source>
Nutch already has Java *.doc and *.pdf parsers as well.

Stefan
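
For readers of the archive: the approach Marcel describes in the quoted message (walk the filesystem, pick an indexer from the file ending) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not anyone's actual implementation. The class name ExtensionDispatch and the handler labels are hypothetical; only the PDF -> PDFBox pairing comes from the thread, and no Lucene or PDFBox calls are made here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper: decides which handler should receive each file,
// based only on the file ending.
public class ExtensionDispatch {

    /** Lower-cased text after the last dot of the file name, or "" if there is none. */
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        if (dot <= 0 || dot == name.length() - 1) {
            return ""; // no dot, a leading dot only, or a trailing dot
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /**
     * Walks the tree under root and maps each regular file to the handler
     * registered for its extension. Files without a registered handler are skipped.
     * Note: Files.walk may throw UncheckedIOException for unreadable directories.
     */
    static Map<Path, String> plan(Path root, Map<String, String> handlers) throws IOException {
        Map<Path, String> result = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                String handler = handlers.get(extensionOf(p));
                if (handler != null) {
                    result.put(p, handler);
                }
            });
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // "doc" handler label is a placeholder, not a real library name.
        Map<String, String> handlers = Map.of(
                "pdf", "PDFBox",
                "doc", "hypothetical-word-parser");
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        plan(root, handlers).forEach((file, handler) ->
                System.out.println(handler + " <- " + file));
    }
}
```

In a real tool each handler label would be replaced by code that extracts the text and hands it to Lucene. File endings can be wrong or missing, so a fallback for unknown files is probably worth having.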

Pete Lewis wrote:

>Hi Stefan
>
>Using OpenOffice will enable you to parse 182 file formats, but it's not a
>pure Java solution and you still need an alternative solution for PDFs.
>
>I'd be interested in knowing whether anyone is working on a pure Java
>solution that would give us a single method for handling MS Office
>documents / PDFs / etc.
>
>Cheers
>
>Pete
>
>----- Original Message ----- 
>From: "Stefan Groschupf" <sg@media-style.com>
>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>Sent: Wednesday, November 05, 2003 10:26 AM
>Subject: Re: Index entire filesystem
>
>
>>I wrote to this list some days ago to announce a way to
>>parse 182 file formats.
>>There was a small bug report some days ago; I hope I can fix it.
>>
>>Browse the archive to figure out more.
>>
>>Cheers
>>Stefan
>>
>>Marcel Stor wrote:
>>
>>>Hi all,
>>>
>>>I'm thinkin' about writing a search tool for my filesystem. I know such
>>>things exist already but programming it myself is much more fun ;-)
>>>So, I would have Lucene crawl through my filesystem and pass each file
>>>to an appropriate indexer (PDF -> PDFBox, etc.). Yes, I run a Windows
>>>system and would depend on the file ending to distinguish the file type.
>>>Is this a good idea in general? Is there a list of available indexers for
>>>the different file types? Any other comments are also welcome.
>>>
>>>Regards,
>>>Marcel
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org