lucene-java-user mailing list archives

From "Pete Lewis" <p...@uptima.co.uk>
Subject Re: Index entire filesystem
Date Wed, 05 Nov 2003 23:08:45 GMT
Hi Stefan

Wouldn't mind joining in a joint approach, only problem is timing - it would
probably be late December before we could start putting the hours in.

If anyone could come up with work packages, we wouldn't mind doing our share
of the work - otherwise I wouldn't mind leading an effort in the New Year.

Has anyone done a full survey of what's out there?  I'd like to be able to
cover the list that Stellent's OutsideIn filters cover (see attached) but
obviously starting from the most popular formats.

Cheers

Pete

----- Original Message ----- 
From: "Stefan Groschupf" <sg@media-style.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, November 05, 2003 11:01 AM
Subject: Re: Index entire filesystem


> There is some ongoing work for nutch.org.
> Maybe we can bundle all the work together?! <open source>
> Nutch already has a Java *.doc and *.pdf parser as well.
>
> Stefan
>
> Pete Lewis wrote:
>
> >Hi Stefan
> >
> >Using OpenOffice will enable you to parse 182 file formats, but it's not a
> >pure Java solution and you still need an alternate solution for PDFs.
> >
> >I'd be interested in knowing whether anyone is working on a pure Java
> >solution that would give us a single method for handling MS Office
> >documents / PDFs / etc.
> >
> >Cheers
> >
> >Pete
> >
> >----- Original Message ----- 
> >From: "Stefan Groschupf" <sg@media-style.com>
> >To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >Sent: Wednesday, November 05, 2003 10:26 AM
> >Subject: Re: Index entire filesystem
> >
> >
> >
> >
> >>I wrote to this list some days ago to announce a way to
> >>parse 182 file formats.
> >>There was a tiny bug report some days ago; I hope I can fix it.
> >>
> >>Browse the archive to figure out more.
> >>
> >>Cheers
> >>Stefan
> >>
> >>Marcel Stor wrote:
> >>
> >>
> >>
> >>>Hi all,
> >>>
> >>>I'm thinkin' about writing a search tool for my filesystem. I know such
> >>>things exist already but programming it myself is much more fun ;-)
> >>>So, I would have Lucene crawl through my filesystem and pass each file
> >>>to an appropriate indexer (PDF -> PDFbox, etc.). Yes, I run a Windows
> >>>system and would depend on the file ending to distinguish the file type.
> >>>Is this a good idea in general? Is there a list of available indexers for
> >>>the different file types? Any other comments are also welcome.
> >>>
> >>>Regards,
> >>>Marcel
> >>>
> >>>
> >>>---------------------------------------------------------------------
> >>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
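
For anyone reading this thread in the archive: the approach Marcel describes (walk the filesystem, pick a handler from the file ending) can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not code from anyone in the thread. The class name `ExtensionDispatcher`, the `register` and `crawl` methods, and the placeholder extractors are all hypothetical; a real extractor would call an actual parser (PDFBox for PDFs, as Marcel suggests) and the returned text would then be handed to Lucene for indexing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

public class ExtensionDispatcher {

    // Hypothetical registry: lower-case file ending -> text extractor.
    private final Map<String, Function<Path, String>> extractors = new HashMap<>();

    public void register(String extension, Function<Path, String> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Returns the lower-cased file ending without the dot, or "" if there is
    // none (no dot, a leading dot only, or a trailing dot).
    public static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Walks the tree under root and returns path -> extracted text for every
    // regular file whose ending has a registered extractor. Other files are
    // skipped. Note that Files.walk can throw UncheckedIOException for
    // unreadable directories; a real crawler would need to handle that.
    public Map<Path, String> crawl(Path root) throws IOException {
        Map<Path, String> result = new HashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                String ext = extensionOf(p.getFileName().toString());
                Function<Path, String> extractor = extractors.get(ext);
                if (extractor != null) {
                    result.put(p, extractor.apply(p));
                }
            });
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        ExtensionDispatcher dispatcher = new ExtensionDispatcher();
        // Placeholder extractors (assumptions): real ones would invoke a
        // parser and return the document text.
        dispatcher.register("txt", p -> "plain text: " + p.getFileName());
        dispatcher.register("pdf", p -> "pdf placeholder: " + p.getFileName());

        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        dispatcher.crawl(root)
                  .forEach((path, text) -> System.out.println(path + " -> " + text));
    }
}
```

Relying on the file ending alone is cheap but can misclassify renamed files, so a real tool might fall back to content sniffing when no extractor matches.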
