lucene-java-user mailing list archives

From: Michael McCandless
Subject: Re: Optimizing unordered queries
Date: Mon, 06 Jul 2009 16:37:07 GMT

On Mon, Jun 29, 2009 at 9:33 AM, Nigel wrote:

> Ah, I was confused by the index divisor being 1 by default: I thought it
> meant that all terms were being loaded.  I see now in SegmentTermEnum that
> the every-128th behavior is implemented at a lower level.
> But I'm even more confused about why we have so many terms in memory.  A
> heap dump shows over 270 million TermInfos, so if that's only 1/128th of the
> total then we REALLY have a lot of terms.  (-:  We do have a lot of docs
> (about 250 million), and we do have a couple of unique per-document values, but
> even so I can't see how we could get to 270 million x 128 terms.  (The heap
> dump numbers are stable across the index close-and-reopen cycle, so I don't
> think we're leaking.)

You could use CheckIndex to see how many terms are in your index.

If you take the heap dump right after opening a fresh reader, before
running any searches, do you still see 270 million TermInfos?
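
As a rough sanity check on the numbers in this thread, the sketch below
works through the arithmetic: if the in-memory terms index holds only every
128th term (scaled further by the index divisor), then 270 million loaded
entries would imply about 34.56 billion terms in total. The class and method
names are hypothetical and not part of Lucene; the interval of 128 and the
divisor of 1 are the values quoted above.

```java
public class TermIndexEstimate {
    // Interval quoted in the thread: every 128th term is kept in the terms index.
    public static final int TERM_INDEX_INTERVAL = 128;

    /** Total term count implied by a number of in-memory index entries. */
    public static long impliedTotalTerms(long loadedEntries, int indexInterval, int indexDivisor) {
        long step = (long) indexInterval * indexDivisor;
        // multiplyExact throws instead of silently overflowing on huge inputs.
        return Math.multiplyExact(loadedEntries, step);
    }

    /** Approximate in-memory entries expected for a given total term count. */
    public static long expectedLoadedEntries(long totalTerms, int indexInterval, int indexDivisor) {
        long step = (long) indexInterval * indexDivisor;
        // Ceiling division: a partial final block still contributes one entry.
        return (totalTerms + step - 1) / step;
    }

    public static void main(String[] args) {
        long loadedEntries = 270_000_000L; // TermInfos seen in the heap dump
        long docs = 250_000_000L;          // approximate document count

        long total = impliedTotalTerms(loadedEntries, TERM_INDEX_INTERVAL, 1);
        System.out.println("Implied total terms: " + total);
        System.out.println("Implied terms per document: " + ((double) total / docs));
    }
}
```

With about 250 million documents, that would work out to roughly 138
distinct terms per document on average, which is why the heap-dump count
looks suspicious and is worth checking against what CheckIndex reports.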

