incubator-lucy-dev mailing list archives

From Marvin Humphrey
Subject Index-time RAM consumption settings (was Invalid UTF-8)
Date Mon, 01 Feb 2010 19:52:45 GMT
On Wed, Jan 27, 2010 at 10:43:22PM -0600, Peter Karman wrote:

> Is there a way, or any plan, to make DEFAULT_MEM_THRESH alterable at runtime?

I've made it settable privately so that we could go back to simulating large
indexes within the test suite. But as a public API?  

Well, here's the problem.  It's an implementation detail, specific to
PostingListWriter.  I'm just about to add another, separate SortExternal pool
in SortWriter, which will have its own threshold at which it flushes runs to
disk.  More generally, arbitrary index components added using custom
Architectures might have their own pools and their own thresholds.  How would
setting a default memory threshold for one affect the others?

I don't think it makes sense to expose any of those thresholds specifically.
Lucene has historically exposed all kinds of extra optimization settings via
IndexWriter, which go stale as the underlying implementation changes, bloating
IndexWriter's API and causing confusion:

And so on.  I think that's sub-optimal design for a number of reasons, and I
think it's important that Lucy *not* go down the same road.

> I'm assuming that in situations where available ram is low, it would be
> helpful to trade-off speed for memory by setting the threshold lower and
> flushing to disk more often. Is that a realistic assumption?

If we were to do something like that, it would be one dial, and instead of
Indexer it would go into IndexManager, where we hide all expert per-session
settings.  Rather than an absolute number, it would be a float multiplier
defaulting to 1.0 which all index components would have the option of
consulting.  PostingListWriter would use it to scale its memory threshold.
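
To make the "one dial" idea concrete, here is a minimal sketch in Python. It
is not Lucy's actual API: the mem_scale attribute, the add/flush methods, and
the 16 MiB value given to DEFAULT_MEM_THRESH are all assumptions made for
illustration.  Only the shape of the idea comes from the text above: a float
in IndexManager, defaulting to 1.0, which a component may consult to scale
its own threshold.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is not Lucy's actual API.
DEFAULT_MEM_THRESH = 0x1000000  # assumed 16 MiB default, for illustration only


@dataclass
class IndexManager:
    """Holds expert per-session settings, including the single memory dial."""
    mem_scale: float = 1.0  # a multiplier, not an absolute byte count


class PostingListWriter:
    """One component that chooses to consult the dial."""

    def __init__(self, manager: IndexManager):
        # Scale this component's own threshold by the shared multiplier.
        self.mem_thresh = int(DEFAULT_MEM_THRESH * manager.mem_scale)
        self.buffered = 0
        self.flushes = 0

    def add(self, nbytes: int) -> None:
        # Buffer the data, then flush a run once the threshold is reached.
        self.buffered += nbytes
        if self.buffered >= self.mem_thresh:
            self.flush()

    def flush(self) -> None:
        # Stand-in for writing a sorted run to disk.
        self.flushes += 1
        self.buffered = 0
```

With mem_scale set to 0.5, this writer flushes twice as often for the same
input, which is the speed-for-memory trade-off Peter describes.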

However, it would not cap memory usage.  It wouldn't be like specifying a JVM
heap size.  And performance will still depend to a large extent on the size of
the index and the RAM installed in the machine, since speed will dive if our
temp files get ejected from the IO cache.
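
A rough way to see why a multiplier is not a cap: every component that owns a
pool scales its own threshold, so the amount buffered before flushing grows
with the number of components.  The sketch below is only arithmetic under
assumed numbers.  The base thresholds and the "CustomWriter" entry are
hypothetical, and memory used outside the pools is not counted at all.

```python
def buffered_ceiling(base_thresholds: dict[str, int], mem_scale: float = 1.0) -> int:
    """Sum the scaled flush thresholds of every component that has a pool.

    This approximates how much data may sit in buffers before flushing.
    It says nothing about other allocations made by the process.
    """
    return sum(int(base * mem_scale) for base in base_thresholds.values())


# Hypothetical per-component base thresholds, in bytes.
pools = {
    "PostingListWriter": 16 * 1024 * 1024,
    "SortWriter": 16 * 1024 * 1024,
    "CustomWriter": 4 * 1024 * 1024,  # a pool added by a custom Architecture
}

half = buffered_ceiling(pools, mem_scale=0.5)
full = buffered_ceiling(pools)
```

Halving the dial halves each threshold, but adding another component raises
the total again, which is why the setting cannot behave like a JVM heap size.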

FWIW, once we fix SortWriter's RAM consumption problem, we'll go back to being
relatively parsimonious with process RAM.

Marvin Humphrey
