lucene-java-user mailing list archives

From Nigel <nigelspl...@gmail.com>
Subject Re: Optimizing unordered queries
Date Wed, 08 Jul 2009 19:57:19 GMT
I created a benchmark test using real queries from our logs.  I kept the LRU
cache the same for now and varied the index divisor:

index divisor = 1: 768 sec.
index divisor = 4: 788 sec. (+ 3%)
index divisor = 8: 855 sec. (+ 11%)
index divisor = 16: 997 sec. (+ 30%)
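
As a rough check of the arithmetic behind these figures, here is a minimal
sketch. The class and method names are illustrative and not part of Lucene.
The 2000 MB baseline is taken from the "2gb" figure in the quoted message
below, and the assumption that term index RAM scales as 1/divisor is a
simplification, not a measured result.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for comparing benchmark runs; not a Lucene class.
class DivisorTradeoff {

    /** Percent slowdown of a run relative to the baseline, rounded to the nearest integer. */
    static long slowdownPercent(double baselineSec, double runSec) {
        return Math.round((runSec - baselineSec) / baselineSec * 100.0);
    }

    /** Assumed model: RAM used by the in-memory term index shrinks as 1/divisor. */
    static double estimatedIndexMb(double baselineMb, int divisor) {
        return baselineMb / divisor;
    }

    public static void main(String[] args) {
        // Benchmark timings from this message: index divisor -> seconds.
        Map<Integer, Double> timings = new LinkedHashMap<>();
        timings.put(1, 768.0);
        timings.put(4, 788.0);
        timings.put(8, 855.0);
        timings.put(16, 997.0);

        double baselineSec = timings.get(1);
        double baselineMb = 2000.0; // assumption: roughly the 2 GB quoted below

        for (Map.Entry<Integer, Double> e : timings.entrySet()) {
            System.out.printf("divisor=%d: %.0f sec (+%d%%), est. index RAM ~%.0f MB%n",
                    e.getKey(), e.getValue(),
                    slowdownPercent(baselineSec, e.getValue()),
                    estimatedIndexMb(baselineMb, e.getKey()));
        }
    }
}
```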

This is exciting news for me, as it means we can cut our memory usage to
about a quarter of what it is now (index divisor = 4) for a performance
penalty of only about 3%.  And I'm hoping that in real use the performance
actually improves, as more RAM becomes available for OS caching, or if I can
reclaim some of the saved RAM for a larger LRU cache, filter caching, etc.
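
To illustrate what a bounded LRU cache of this kind does, here is a minimal
sketch built on java.util.LinkedHashMap.  It is not Lucene's own TermInfo
cache; the class name LruCache and the String/Long key and value types in
the demo are assumptions made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache; a sketch, not Lucene's internal cache.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict the least recently used entry when over capacity.
        return size() > maxEntries;
    }
}

class LruCacheDemo {
    public static void main(String[] args) {
        // Tiny capacity for demonstration; the thread discusses sizes like 100,000.
        LruCache<String, Long> cache = new LruCache<>(2);
        cache.put("lucene", 1L);
        cache.put("index", 2L);
        cache.get("lucene");      // touch "lucene" so "index" becomes the eldest
        cache.put("divisor", 3L); // exceeds capacity, evicts "index"
        System.out.println(cache.keySet());
    }
}
```

Repeated lookups of popular terms stay in the map, while a term that has not
been used recently is the first to be dropped when a new one arrives.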

I'll report further results when I get them.

Thanks,
Chris

On Tue, Jul 7, 2009 at 5:43 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> OK good to hear you have a sane number of TermInfos now...
>
> I think many apps don't have nearly as many unique terms as you do;
> your approach (increase index divisor & LRU cache) sounds reasonable.
> It'll make warming more important.  Please report back how it goes!
>
> > My next thought, which I'll try as soon as I can set up some reproducible
> > benchmarks, is using a larger index divisor, perhaps combined with a larger
> > LRU TermInfo cache.  But this seems like such an easy win that I wonder why
> > it isn't mentioned more often (at least, I haven't seen much discussion of
> > it in the java-user archives).  For example, if I simply increase the index
> > divisor from 1 to 4, I can cut my Lucene usage from 2 GB to 500 MB (meaning
> > less GC and more OS cache).  That means much more seeking to find non-cached
> > terms, but increasing the LRU cache to 100,000 (for example) would allow all
> > (I think) of our searched terms to be cached, at a fraction of the RAM cost
> > of the 8 million terms cached now.  (The first-time use of any term would of
> > course be slower, but most search terms are used repeatedly, and it seems
> > like a small price to pay for such a RAM win.)  Anyway, I'm curious if there
> > are any obvious flaws in this plan.
>
