From Doug Cutting
Subject Re: Development plans for Lucene?
Date Mon, 04 Nov 2002 17:55:27 GMT
Brian Goetz wrote:
> I think the "Doug is busy" argument basically boils down to "Doug
> thinks this project is mostly done."  And I would tend to agree.
> Could it be improved?  For sure.  Is it missing major pieces?  I don't
> think so.

Thanks, Brian!  You put this well.

I place requests for changes to Lucene into three categories:

1. Requests for additions outside the core.  Things in this category 
include: support for more languages; query parsers; database storage; 
crawlers; etc.  Whether these belong in the base distribution is a 
matter of debate (sometimes hot).  My rule of thumb for including them 
is their generality: if they are likely to be useful to a large 
proportion of Lucene users then they should probably go in the base 
distribution.  Language support in particular is tricky.  Perhaps we 
should migrate to a model where the base distribution includes no 
analyzers, and supply separate language packages.

2. Requests for features that require a different core.  Lucene's 
architecture makes some things hard to do.  For example, one cannot 
easily update a single field of a document: one must instead delete and 
re-index the entire document.  Unless I've missed a clever trick, to 
change this would require a substantial re-write of Lucene's index 
internals, and it's probably not worth it.  When I have time, I try to 
respond to these requests explaining why they are hard.

3. Requests for reasonable changes to the core.  Examples include: file 
locking to make things multi-process safe; adding an API for boosting 
individual documents and field values; making the scoring API 
extensible and public; etc.  Such changes are not always made promptly.  
I am indeed busy (both with work and with two babies at home) and I am 
not paid to work on Lucene directly.  However, when I have time, I do 
make such changes.  All but the last of the above examples have been 
made recently, and I intend to address the last soon: I plan to add 
support for an extensible scoring API in the next few months.
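
To make the constraint in (2) concrete, here is a minimal sketch in Java of 
the delete-and-re-index pattern.  It uses a toy in-memory inverted index, 
not Lucene itself: ToyIndex and all of its methods are hypothetical names 
invented for illustration, and modelling deletion as "mark the id as no 
longer live" is an assumption of this sketch, not a description of Lucene's 
index internals.

```java
import java.util.*;

// Toy in-memory index for illustration only. This is NOT Lucene's API;
// every name here is hypothetical.
class ToyIndex {
    // "field:word" -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // ids of documents that have not been deleted
    private final Set<Integer> live = new HashSet<>();
    private int nextId = 0;

    // Index every field of a document and return its new id.
    int add(Map<String, String> fields) {
        int id = nextId++;
        for (Map.Entry<String, String> f : fields.entrySet()) {
            for (String word : f.getValue().toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(f.getKey() + ":" + word,
                                         k -> new TreeSet<>()).add(id);
            }
        }
        live.add(id);
        return id;
    }

    // Deleting only marks the id as dead; postings are left untouched.
    void delete(int id) {
        live.remove(id);
    }

    // Return the ids of live documents whose field contains the word.
    List<Integer> search(String field, String word) {
        List<Integer> hits = new ArrayList<>();
        Set<Integer> ids = postings.getOrDefault(
                field + ":" + word.toLowerCase(), Collections.emptySet());
        for (int id : ids) {
            if (live.contains(id)) hits.add(id);
        }
        return hits;
    }

    // "Updating" one field means deleting the old document and
    // re-indexing the whole thing, unchanged fields included.
    int updateField(int id, Map<String, String> oldFields,
                    String field, String value) {
        Map<String, String> copy = new HashMap<>(oldFields);
        copy.put(field, value);
        delete(id);
        return add(copy);
    }
}
```

Note that the caller has to supply the full original document in order to 
change one field, and that the document comes back under a new id.  In this 
sketch that is the price of never rewriting existing postings in place.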

Most requests are of type (1), fewer are of type (2), and fewer yet of 
type (3).  I think this validates Brian's point that the core is stable, 
fairly complete, and self-consistent.

For Lucene to be used more broadly, more work is required developing 
solutions for type (1) requests.  We need a better out-of-box 
experience: for common applications folks should be able to unpack the 
distribution, perhaps use a simple configuration interface, and then 
start indexing and searching.  Today things are not always that simple, 
but I don't think this is the fault of the core.


