lucene-dev mailing list archives

From Andrew Libby <>
Subject [ Re: Re: Proposal for Lucene]
Date Thu, 07 Feb 2002 15:31:11 GMT

    My name is Andrew Libby, and at Andrew C. Oliver's suggestion
I'm forwarding this.  Initially I hesitated
to post to the dev list because I'm not (at least not yet) developing.
I've been working with Lucene for a few days now and am having a blast.

Thanks for the time.


----- Forwarded message from acoliver <> -----

Date: Thu, 7 Feb 2002 06:07:55 -0800 (PST)
From: acoliver <>
Subject: Re: Re: Proposal for Lucene
X-Mailer: E-mailanywhere V2.0 (Windows)

>On Thu, 7 Feb 2002 08:57:30 -0500 Andrew Libby  <> wrote:
>o Indexer vs Crawlers.  I'm thinking that calling them AbstractCrawler, etc.
>  will provide less ambiguity.  Of course, it seems like you're
>  consolidating the crawling and indexing behind an abstraction, so there
>  may be good reason.  I think it'd be nice to avoid ambiguity with
>  other index-related classes.  I suppose there are also namespace
>  solutions to this.

Agreed.  I can't remember why I decided to call that Indexer versus Crawler.

>o Filters.  It would be nice if Lucene had filters to deal with
>  many document types.  I have been giving this some thought, and
>  since Lucene (as it stands today) is so flexible, there is no
>  standard interface to write filters to (so far as I can tell).
>  It seems like you are suggesting a layer on top of Lucene which 
>  might be able to build up to such an interface.  If so, having 
>  a collaborative effort to develop document filters would be 
>  of great value.
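
As a rough illustration of the kind of standard filter interface discussed
above, here is a minimal sketch.  Everything in it is an assumption made for
illustration: DocumentFilter and PlainTextFilter are hypothetical names, not
types defined by Lucene or POI, and the "contents" field name is arbitrary.
A real XLS or DOC filter would replace the plain-text stand-in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: neither Lucene nor POI defines this type.
interface DocumentFilter {
    // Returns true if this filter understands the given MIME type.
    boolean accepts(String mimeType);

    // Extracts named text fields (e.g. "contents") from the raw document.
    Map<String, String> extract(InputStream in) throws IOException;
}

// Trivial plain-text filter, standing in for a format-specific one.
class PlainTextFilter implements DocumentFilter {
    public boolean accepts(String mimeType) {
        return "text/plain".equals(mimeType);
    }

    public Map<String, String> extract(InputStream in) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        // Read the whole stream and expose it as a single text field.
        fields.put("contents", new String(in.readAllBytes(), StandardCharsets.UTF_8));
        return fields;
    }
}
```

An indexing layer could then pick a filter by MIME type and turn the returned
fields into whatever the index expects, without knowing the file format.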

What got me interested in this is that the Jakarta project I founded, POI (which
will appear on Jakarta as soon as Sam gets around to copying the website
files), provides a pure Java XLS (Excel) abstraction and will soon provide
a DOC abstraction.  I need these features to connect these filters to Lucene
efficiently and to have some idea

>I work for a company that is considering using Lucene to index a
>document repository.  We're going to need filters, and I'm making a case
>that we contribute the filters we develop to Lucene.

That sounds like a great idea.  If these filters happen to be based on the
OLE 2 Compound Document Format (Excel, PowerPoint, XLS), you might suggest
they take a look at POI as well.  Old site is,
new site will be up on jakarta soon....  You can find the new site in the
sources at:

module = jakarta-poi

I look forward to our continued collaborative open source development,


>On Thu, Feb 07, 2002 at 07:35:01AM -0500, Andrew C. Oliver wrote:

----- End forwarded message -----

Andrew Libby
CommNav, Inc
