lucene-dev mailing list archives

From "Michael McCandless"
Subject Re: Post mortem kudos for (LUCENE-843) :)
Date Tue, 17 Jul 2007 22:52:23 GMT

"Peter Keegan" wrote:
> I did some performance comparison testing of Lucene 2.0 vs. trunk (with
> LUCENE-843). I'm seeing at least a 4X increase in indexing rate with the new
> DocumentsWriter in LUCENE-843 (still doing single-threaded indexing). Better
> yet, the total time to build the index is much shorter because I can now
> build the entire 3GB index (900K docs) in one segment in RAM (using
> FSDirectory) and flush it to disk at the end. Before, I had to build smaller
> segments (20K docs), merge after 20 segments and then optimize at the end.

Awesome :)

> The memory usage with LUCENE-843 is much lower, presumably because stored
> fields and term vectors no longer sit in RAM.

Right, not buffering the stored fields & term vectors in RAM is a big
win.  In addition, storing the postings in RAM as a single shared
hash table backed by a pool of large byte[] arrays, instead of separate
1 KB buffers for the files of each single-document segment, also
improves RAM efficiency.

In my tests, using Europarl content with small docs (~100 terms = ~550
bytes per doc) with stored fields & term vectors enabled, the RAM
efficiency is 44X better than before.
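
To illustrate the pooling idea, here is a minimal sketch of handing out
slices from a pool of large byte[] blocks instead of allocating one small
buffer per document. This is not the DocumentsWriter code: the class name
ByteBlockPoolSketch, the 32 KB block size, and all method names are
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a shared byte[] pool; not Lucene's implementation. */
public class ByteBlockPoolSketch {
    /** Assumed block size, chosen only for illustration. */
    public static final int BLOCK_SIZE = 32 * 1024;

    private final List<byte[]> blocks = new ArrayList<>();
    // Start "full" so the first allocate() call creates the first block.
    private int upto = BLOCK_SIZE;

    /** Reserves len contiguous bytes and returns their global offset. */
    public int allocate(int len) {
        if (len < 0 || len > BLOCK_SIZE) {
            throw new IllegalArgumentException("len must be in [0, " + BLOCK_SIZE + "]");
        }
        if (upto + len > BLOCK_SIZE) {
            // Current block cannot hold the slice; move on to a fresh block.
            blocks.add(new byte[BLOCK_SIZE]);
            upto = 0;
        }
        int start = (blocks.size() - 1) * BLOCK_SIZE + upto;
        upto += len;
        return start;
    }

    /** Writes one byte at a global offset returned by allocate(). */
    public void write(int offset, byte value) {
        blocks.get(offset / BLOCK_SIZE)[offset % BLOCK_SIZE] = value;
    }

    /** Reads one byte at a global offset. */
    public byte read(int offset) {
        return blocks.get(offset / BLOCK_SIZE)[offset % BLOCK_SIZE];
    }

    /** Number of large blocks currently held by the pool. */
    public int blockCount() {
        return blocks.size();
    }

    /** Total bytes held by the pool, including unused tails of blocks. */
    public long bytesReserved() {
        return (long) blocks.size() * BLOCK_SIZE;
    }
}
```

With these assumed sizes, 100 slices of 550 bytes fit in two blocks, so
the pool holds two large arrays rather than 100 small ones. The sketch
only shows the allocation pattern and does not attempt to reproduce the
44X measurement.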

> I also observed a 20-25% gain by reusing the Field objects. Implementing my
> own Fieldable class was too complicated, so I simply extended the Field
> class (after removing final) and added 2 setter methods:
>       public void setValue(String value) {
>         this.fieldsData = value;
>       }
>       public void setValue(byte[] value) {
>         this.fieldsData = value;
>       }
> Since this improved performance significantly, I would vote to either add
> setters to Field or make it extendable.

OK I've opened LUCENE-963 for this & attached a patch.
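
For readers who want to see the reuse pattern Peter describes, here is a
self-contained sketch. ReusableField and DocSink are hypothetical
stand-ins, not Lucene classes and not the contents of the LUCENE-963
patch. The two setters mirror the ones quoted above, and indexAll shows
one field instance being refilled for every document instead of a new
one being allocated each time.

```java
import java.util.List;

/** Hypothetical sketch of field reuse; not Lucene's Field or the LUCENE-963 patch. */
public class FieldReuseSketch {

    /** Stand-in for a field whose value can be replaced between documents. */
    public static class ReusableField {
        private final String name;
        private Object fieldsData; // holds a String or a byte[], as in the quoted setters

        public ReusableField(String name, String value) {
            this.name = name;
            this.fieldsData = value;
        }

        public void setValue(String value) {
            this.fieldsData = value;
        }

        public void setValue(byte[] value) {
            this.fieldsData = value;
        }

        public String name() {
            return name;
        }

        /** Returns the value if it is a String, otherwise null. */
        public String stringValue() {
            return fieldsData instanceof String ? (String) fieldsData : null;
        }

        /** Returns the value if it is a byte[], otherwise null. */
        public byte[] binaryValue() {
            return fieldsData instanceof byte[] ? (byte[]) fieldsData : null;
        }
    }

    /** Stand-in for whatever consumes the field, e.g. an indexer's add call. */
    public interface DocSink {
        void add(ReusableField field);
    }

    /** Indexes every body text through one shared field instance. */
    public static void indexAll(List<String> bodies, DocSink sink) {
        ReusableField body = new ReusableField("body", "");
        for (String text : bodies) {
            body.setValue(text); // overwrite the previous document's value
            sink.add(body);      // sink must consume the value before the next iteration
        }
    }
}
```

The pattern is only safe when the consumer is done with the field's value
by the time add returns, because the next iteration overwrites it.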

> Kudos to Mike for this huge improvement!


