lucene-java-user mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject Re: A model for predicting indexing memory costs?
Date Wed, 11 Mar 2009 13:51:43 GMT

OK, it's early days and I'm holding my breath, but I'm currently progressing further through
my content without an OOM, just by using a different GC setting.

Thanks to advice here and from colleagues at work, I've gone with a GC setting of -XX:+UseSerialGC
for this indexing task.
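
For concreteness, the indexing job is now launched along these lines (the heap size,
classpath and main class are illustrative placeholders rather than my exact setup):

  java -Xmx2g -XX:+UseSerialGC -cp lucene-core.jar:myapp.jar MyBatchIndexer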

The rationale that is emerging is that there are potentially two different GC strategies that
should be applied to Lucene work: one for indexing and one for searching.

1) Search GC model
Search typically generates much less state than an indexing task does, and each search often has
an SLA associated with it (e.g. results returned in <0.5 seconds).
In this environment a non-invasive background GC is useful (e.g. the parallel collector) to
avoid long pauses on individual searches. The parallel collector will, however, throw an OOM if
it decides GC is making no headway - by default, when more than 98% of time is spent collecting
while recovering less than 2% of the heap ( http://java.sun.com/j2se/1.5.0/docs/guide/vm/gc-ergonomics.html
). I think this is the problem I was hitting originally.
2) Index GC model
Indexing generates large volumes of objects which are kept in RAM while indexing and then flushed.
Individual document adds are typically not subject to an SLA, but overall indexing time for
a batch is. In this scenario a "stop-the-world" garbage collect mid-indexing is a welcome
cleansing and has no business impact if it takes a while. This is what I believe I'm now getting
from the -XX:+UseSerialGC setting (a sketch of capping that in-RAM indexing state on the Lucene
side follows this list).
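
As a related knob, the volume of indexing state Lucene itself buffers before flushing can be
capped with IndexWriter.setRAMBufferSizeMB(). A minimal sketch against the Lucene 2.4-era API -
the path, analyzer choice and the 64MB figure are illustrative, not recommendations:

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;

  public class IndexerSketch {
    public static void main(String[] args) throws Exception {
      Directory dir = FSDirectory.getDirectory("/path/to/index"); // illustrative path
      IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
          IndexWriter.MaxFieldLength.UNLIMITED);

      // Flush buffered documents to disk once roughly 64MB of indexing state has
      // accumulated in RAM, bounding how much garbage the collector has to chew through.
      writer.setRAMBufferSizeMB(64.0);

      Document doc = new Document();
      doc.add(new Field("body", "some content", Field.Store.NO, Field.Index.ANALYZED));
      writer.addDocument(doc);

      writer.close(); // commits outstanding changes and releases the index lock
    }
  }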


This does suggest that it might be a good idea to use a different JVM for indexing from the one
used for searching, so that you can impose these different GC models on each.
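
For example, alongside the indexing command shown earlier, the search JVM could be brought up
with the parallel collector (the class name is again just a placeholder):

  java -Xmx2g -XX:+UseParallelGC -cp myapp.jar MySearchServer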

That's the theory at least, and I'm hoping it works out for me in practice...

----- Original Message ----
From: Michael McCandless <lucene@mikemccandless.com>
To: java-user@lucene.apache.org
Sent: Wednesday, 11 March, 2009 13:04:56
Subject: Re: A model for predicting indexing memory costs?


Mark Miller wrote:

> Michael McCandless wrote:
>> 
>> Ie, it's still not clear if you are running out of memory vs hitting some weird "it's
>> too hard for GC to deal" kind of massive heap fragmentation situation or something.  It
>> reminds me of the special ("I cannot be played on record player X") record (your
>> application) that cannot be played on a given record player X (your JRE) in Gödel,
>> Escher, Bach ;)
>> 
> Perhaps it's been too long since I've seen that book, but if I remember right, he only
> has to get himself a JRE Omega version and he should be all set...


That's right ;)  But JRE Omega will still have a [different] record that causes its GC to
grind to a halt... (or maybe I don't remember the book very well!).

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

