lucene-dev mailing list archives

From Doug Cutting
Subject RE: Patches and samples
Date Sat, 19 Jan 2002 21:39:59 GMT
> From: Andrew C. Oliver
> I'm assuming the contribution process works the same way as other
> Jakarta projects (although I realize this project has not been on
> Jakarta for long).  So, I'm posting [PATCH]es to the dev list.  Please
> let me know if this community has a different process so that I can
> conform to it accordingly.

The contribution process for Lucene is not very mature.  I am the lead
developer, but I have not had the time recently to work on Lucene.  I have
also not been involved with other Apache projects, and hence am not that
familiar with the processes.

I think your patches look good and that they should be integrated--I'm "+1"
in Apache-speak.  However, I may not get to applying them right away.

> One thing I was about to work on was a few slightly more universal web
> examples as well as an ant build target for them.  I'm also working on a
> short tutorial and walk-through for the demos.  However, I noticed that
> there have already been submissions for some of these such as a JSP
> version of the demos on the user list.  Is there some reason these were
> not included/added?  I'd not like to duplicate someone else's mistake.

The only reason that these have not been added is that I have not had a
chance to inspect and test them before integrating them.  However, I think
improving this sort of stuff for Lucene is of vital importance--it improves
the initial experience.

> The reason I ask is, we've got a short development cycle for this
> release of POI and while I'd like to contribute to this effort in part
> to make my efforts (POI-Lucene synergies, some side projects I'm working
> on) more successful, I want to make sure my efforts are well applied and
> fruitful.
>
> Here are the things I have planned for the moment.  This list will
> become more concise and yet comprehensive as I complete my analysis of
> the sources.  Please give me feedback on any that are unnecessary or
>
> 1. create a build target for the command line demos
>    - already submitted as patch
> 2. create a "getting started" document in xdocs for how one builds and
> installs Lucene and the demos.

That would be great.

> 3. create a template web app and ant target including war file
> deployable in Tomcat.

That would be marvelous.

> 4. investigate the issue of file handles as mentioned earlier.

The file handle issue is basically this: all active index files that are not
entirely read into memory must be kept open in case another thread or
process removes them while updating the index.  A single handle is kept for
each file per IndexReader.  The number of files is proportional to the
number of segments, which, worst-case, is b*log-base-b(doc-count), where b
is IndexWriter.mergeFactor, 10 by default, and doc-count is the number of
documents in the index.  There are a few files per segment which must be
kept open.  So with a million documents, the maximum number of files is
around 200.  If you decrease IndexWriter.mergeFactor this will drop, since
b*log-base-b(doc-count) grows with b for b of 3 or more.  An optimized
index contains only a single segment and thus requires only a few files
open.
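
As a rough check of the arithmetic above, here is a small Python sketch.  It
is not Lucene code: the function names are made up for illustration, and the
figure of three open files per segment is an assumption, since the estimate
above only says "a few".  The logarithm is rounded up to a whole number of
merge levels, which is also an assumption of this sketch.

```python
def merge_levels(doc_count: int, merge_factor: int) -> int:
    """Smallest k with merge_factor**k >= doc_count (an integer log, rounded up)."""
    if doc_count < 1 or merge_factor < 2:
        raise ValueError("doc_count must be >= 1 and merge_factor >= 2")
    levels, capacity = 0, 1
    while capacity < doc_count:
        capacity *= merge_factor
        levels += 1
    return levels


def worst_case_segments(doc_count: int, merge_factor: int = 10) -> int:
    """Worst-case segment count, b * log_b(doc_count); never less than one."""
    return max(1, merge_factor * merge_levels(doc_count, merge_factor))


def worst_case_open_files(doc_count: int, merge_factor: int = 10,
                          files_per_segment: int = 3) -> int:
    """Open files per IndexReader; files_per_segment is an assumed value."""
    return worst_case_segments(doc_count, merge_factor) * files_per_segment


if __name__ == "__main__":
    print(worst_case_segments(1_000_000))    # 60 segments
    print(worst_case_open_files(1_000_000))  # 180 files, i.e. "around 200"
```

Passing a different merge_factor or files_per_segment shows how the bound
moves; an optimized index corresponds to a single segment, so only
files_per_segment handles would be needed under the same assumption.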

> 5. create a set of interfaces a/o classes for attaching other document
> filters.  

Sounds cool.

