lucene-dev mailing list archives

From "Peter Keegan"
Subject Re: Post mortem kudos for (LUCENE-843) :)
Date Tue, 17 Jul 2007 19:17:51 GMT
I did some performance comparison testing of Lucene 2.0 vs. trunk (with
LUCENE-843). I'm seeing at least a 4X increase in indexing rate with the new
DocumentsWriter in LUCENE-843 (still doing single-threaded indexing). Better
yet, the total time to build the index is much shorter because I can now
build the entire 3GB index (900K docs) in one segment in RAM (using
FSDirectory) and flush it to disk at the end. Before, I had to build smaller
segments (20K docs), merge after 20 segments and then optimize at the end.
The memory usage with LUCENE-843 is much lower, presumably because stored
fields and term vectors no longer sit in RAM.

I also observed a 20-25% gain by reusing the Field objects. Implementing my
own Fieldable class was too complicated, so I simply extended the Field
class (after removing final) and added 2 setter methods:

      // Overwrite the stored value so one Field instance can be reused.
      public void setValue(String value) {
        this.fieldsData = value;
      }

      public void setValue(byte[] value) {
        this.fieldsData = value;
      }

Since this improved performance significantly, I would vote to either add
setters to Field or make it extendable.
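
To illustrate the reuse pattern, here is a minimal, self-contained sketch. It does not use Lucene: MutableField and ToyWriter are hypothetical stand-ins I made up for a Field with setters and for an index writer, so treat this as the shape of the loop rather than actual Lucene code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldReuseSketch {

    /** Hypothetical stand-in for a field with setters; NOT Lucene's Field class. */
    static class MutableField {
        private final String name;
        private Object fieldsData;

        MutableField(String name) { this.name = name; }

        void setValue(String value) { this.fieldsData = value; }
        void setValue(byte[] value) { this.fieldsData = value; }

        String name() { return name; }
        Object value() { return fieldsData; }
    }

    /** Hypothetical stand-in for a writer that reads the field value immediately. */
    static class ToyWriter {
        final List<String> indexed = new ArrayList<>();

        void addDocument(MutableField field) {
            // The value is copied out here, before the caller overwrites it.
            indexed.add(field.name() + "=" + field.value());
        }
    }

    /** Indexes every body while allocating the field object only once. */
    static List<String> indexAll(List<String> bodies) {
        MutableField body = new MutableField("body"); // allocated once, outside the loop
        ToyWriter writer = new ToyWriter();
        for (String text : bodies) {
            body.setValue(text);      // overwrite instead of creating a new field
            writer.addDocument(body); // consumed before the next setValue call
        }
        return writer.indexed;
    }

    public static void main(String[] args) {
        // prints [body=first doc, body=second doc]
        System.out.println(indexAll(Arrays.asList("first doc", "second doc")));
    }
}
```

The sketch only works because the writer finishes reading the value before the next setValue call. Since the same instance is mutated for every document, it should stay confined to a single indexing thread.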

Kudos to Mike for this huge improvement!


On 7/13/07, Michael McCandless wrote:
> "Grant Ingersoll" wrote:
> > This is good stuff...  Might be good to put an organized version of
> > this up on the Wiki under Best Practices.
> I agree!  I will update the ImproveIndexingSpeed page
> with these suggestions.
> > On Jul 13, 2007, at 8:13 AM, Michael McCandless wrote:
> >
> > > Yeah it's not so easy now: does not have setters.
> > >
> > > You have to make your own class that implements Fieldable (or
> > > subclasses AbstractField) and adds your own setters. is
> > > also [currently] final so you can't subclass it.
> > >
> >
> > Should we consider putting in these changes?  I think it might be a
> > little weird on the Search side to have setters for Field and it
> > sounds like it could cause trouble for people esp. in a threaded
> > indexing situation, but maybe I am mistaken?
> I think adding setters would be reasonable, if we document clearly
> that they are advanced, be careful about threads, use at your own risk
> sort of methods?  Are there any concerns with that approach?  If not
> I'll open an issue and do it... this just makes it easier for people
> to maximize indexing performance "out of the box".
> Mike
