lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: Lucene memory usage
Date Thu, 11 Jun 2009 00:19:09 GMT
On Wed, Jun 10, 2009 at 7:23 PM, Jason Rutherglen wrote:
> Cool! Sounds like with LUCENE-1458 we can experiment with some
> of these things. Does CSF become just another codec?

I believe LUCENE-1458 currently only makes terms dict & postings

>> I'm leery of having terms dict live entirely on disk, though
>> we should certainly explore it.
> Yeah, it should theoretically help with reloading; it could use
> a skiplist (we already have a disk version of that implemented)
> instead of binary search. It seems like with things like
> TrieRange (which potentially adds many fields and terms) it
> could be useful to let the IO cache calculate what we need in
> RAM and what we don't, otherwise we're constantly at risk of
> exceeding heap usage. There'll be other potential RAM issues
> (such as page faults), but it seems like users will constantly
> be up against the inability to precalculate Java heap usage of
> data structures (whereas file based data usage can be measured).
> Norms are another example, and with flexible indexing (and
> scoring?) there may be additional fields the user may want to
> change dynamically, that if completely loaded into heap cause
> OOM problems.
> I guess I personally think it would be great to not worry about
> exceeding heap with Lucene apps (as it's a guessing game), and
> then one can simply analyze the OS level IO cache/swap space to
> see if the app could slow down due to the machine not having
> enough RAM. I think this would remove one of the major
> differences between a Java based search engine and a C++ based
> one.

Marvin and I discussed this quite a bit already in LUCENE-1458... we
should make it pluggable and then try both -- let the machine tell
us ;)
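
To make "pluggable, then try both" a bit more concrete, here is a minimal sketch in plain Java. Every name in it (TermOrdLookup, HeapTermLookup, MappedTermLookup, writeTerms) is hypothetical and is not the LUCENE-1458 API. Terms are simplified to sorted longs so the example stays short; a real terms dict holds variable-length terms and would need an index on top. One variant keeps everything on the Java heap; the other memory-maps a file so the OS IO cache decides what stays in RAM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical plug point; not an actual Lucene interface.
interface TermOrdLookup {
    /** Returns the position of the term in the sorted dictionary, or -1 if absent. */
    int lookup(long term);
}

// Variant 1: the whole dictionary is copied onto the Java heap.
final class HeapTermLookup implements TermOrdLookup {
    private final long[] sortedTerms;

    HeapTermLookup(long[] sortedTerms) {
        this.sortedTerms = sortedTerms.clone();
    }

    @Override
    public int lookup(long term) {
        int i = Arrays.binarySearch(sortedTerms, term);
        return i >= 0 ? i : -1;
    }
}

// Variant 2: the dictionary stays in a file and is memory-mapped, so
// residency is decided by the OS page cache rather than the heap size.
final class MappedTermLookup implements TermOrdLookup {
    private final ByteBuffer buf;
    private final int count;

    private MappedTermLookup(ByteBuffer buf, int count) {
        this.buf = buf;
        this.count = count;
    }

    /** Maps a file of big-endian longs (must be smaller than 2 GB). */
    static MappedTermLookup open(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            // The mapping remains valid after the channel is closed.
            ByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            return new MappedTermLookup(mapped, (int) (size / Long.BYTES));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: writes sorted terms to a temp file as big-endian longs. */
    static Path writeTerms(long[] sortedTerms) {
        ByteBuffer bb = ByteBuffer.allocate(sortedTerms.length * Long.BYTES);
        for (long t : sortedTerms) {
            bb.putLong(t);
        }
        try {
            Path file = Files.createTempFile("terms", ".bin");
            Files.write(file, bb.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public int lookup(long term) {
        // Binary search with absolute reads straight from the mapped file.
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long v = buf.getLong(mid * Long.BYTES);
            if (v < term) {
                lo = mid + 1;
            } else if (v > term) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }
}
```

Callers only see TermOrdLookup, so the same benchmark can run against new HeapTermLookup(terms) and MappedTermLookup.open(MappedTermLookup.writeTerms(terms)) and compare both lookup time and heap footprint.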

