lucene-java-user mailing list archives

From "Pete Lewis" <p...@uptima.co.uk>
Subject Re: Index entire filesystem
Date Wed, 05 Nov 2003 10:50:32 GMT
Hi Stefan

Using OpenOffice will enable you to parse 182 file formats, but it's not a
pure Java solution and you still need an alternative solution for PDFs.

I'd be interested in knowing whether anyone is working on a pure Java
solution that would give us a single method for handling MS Office
documents / PDFs / etc.

Cheers

Pete

----- Original Message ----- 
From: "Stefan Groschupf" <sg@media-style.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, November 05, 2003 10:26 AM
Subject: Re: Index entire filesystem


>
> I wrote to this list a few days ago to announce a way to
> parse 182 file formats.
> There was a tiny bug report a few days ago; I hope I can fix it.
>
> Browse the archive to figure out more.
>
> Cheers
> Stefan
>
> Marcel Stor wrote:
>
> >Hi all,
> >
> >I'm thinkin' about writing a search tool for my filesystem. I know such
> >things exist already but programming it myself is much more fun ;-)
> >So, I would have Lucene crawl through my filesystem and pass each file
> >to an appropriate indexer (PDF -> PDFbox, etc.). Yes, I run a Windows
> >system and would depend on the file extension to distinguish the file type.
> >Is this a good idea in general? Is there a list of available indexers for
> >the different file types? Any other comments are also welcome.
> >
> >Regards,
> >Marcel
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
>
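
A minimal sketch of the extension-based dispatch Marcel describes above, in
plain Java with no Lucene or PDFBox calls. The names ExtensionDispatch,
TextExtractor, register, and crawl are hypothetical and invented for
illustration only. In a real setup a PDFBox-backed extractor would presumably
be registered for "pdf", and the returned text would be fed into a Lucene
Document rather than collected in a map.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ExtensionDispatch {

    /** Hypothetical hook: turns one file into plain text for indexing. */
    public interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    // Maps a lower-cased extension such as "pdf" to its extractor.
    private final Map<String, TextExtractor> extractors = new HashMap<>();

    public void register(String extension, TextExtractor extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Lower-cased text after the last dot, or "" if there is no usable extension. */
    public static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        if (dot <= 0 || dot == name.length() - 1) {
            return ""; // no dot, a leading dot only (".bashrc"), or a trailing dot
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Walks the tree under root and extracts text from every file with a registered extension. */
    public Map<Path, String> crawl(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<Path, String> textByFile = new HashMap<>();
        for (Path file : files) {
            TextExtractor extractor = extractors.get(extensionOf(file));
            if (extractor != null) { // unknown file types are skipped
                textByFile.put(file, extractor.extract(file));
            }
        }
        return textByFile;
    }

    public static void main(String[] args) throws IOException {
        ExtensionDispatch dispatch = new ExtensionDispatch();
        // Plain text needs no parser; other formats would get their own extractor here.
        dispatch.register("txt",
                file -> new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<Path, String> result = dispatch.crawl(root);
        System.out.println("Extracted text from " + result.size() + " file(s)");
    }
}
```

Relying on the extension alone is simple but will misclassify renamed or
extension-less files, so an extractor should be prepared to fail on input it
cannot parse.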

