lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Lucene RAM buffer size limit
Date Tue, 27 Apr 2010 05:28:08 GMT
Hi Tom

I don't know of an easy way to understand the relationship between the max
RAM buffer size and the heap size. I ran the test w/ an 8GB heap and a 2048 MB
RAM buffer. Indexing 16M documents (roughly 288GB of data) took 7400 seconds
(using 8 threads). I will post the full benchmark output when I finish
indexing 25M documents w/ different RAM buffer sizes.
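
As a quick sanity check on the figures above, here is a minimal sketch of the throughput arithmetic. The class and method names are hypothetical and only illustrate the calculation; they are not part of Lucene or the benchmark.

```java
// Hypothetical helper, for illustration only: converts a data volume and an
// elapsed time into an hourly indexing rate.
final class IndexingThroughput {

    /** Returns throughput in GB per hour for the given volume and wall-clock time. */
    static double gbPerHour(double gigabytes, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return gigabytes / seconds * 3600.0;
    }

    public static void main(String[] args) {
        // Figures reported in this message: roughly 288GB indexed in 7400 seconds.
        System.out.printf("%.1f GB/hour%n", gbPerHour(288, 7400));
    }
}
```

With the numbers from this run, the result comes out at roughly 140 GB per hour.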

My gut feeling (after reading
http://www.ibm.com/developerworks/java/library/j-jtp09275.html) is
that if I need N MB of RAM, I should allocate at least 2*N MB on the
heap. But that only takes the RAM buffer into consideration. Since other
memory is allocated too, GC might wake up, so in order to avoid that
(as much as possible), I allocate at least 3*N, if N is large enough.
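
The rule of thumb above can be written down as a small sketch. This is only a gut-feeling heuristic, not a Lucene API; the class and method names are hypothetical, and the estimate for the rest of the application is an assumption the caller has to supply.

```java
// Hypothetical helper, for illustration only: encodes the "2*N for the buffer,
// plus whatever else the app allocates" heuristic described in this message.
final class HeapSizing {

    /** At least twice the RAM buffer size should be available on the heap. */
    static long minHeapForBufferMb(long ramBufferMb) {
        if (ramBufferMb < 0) {
            throw new IllegalArgumentException("ramBufferMb must be non-negative");
        }
        return 2 * ramBufferMb;
    }

    /** Adds the caller's estimate of all other allocations made by the application. */
    static long suggestedHeapMb(long ramBufferMb, long otherAllocationsMb) {
        return minHeapForBufferMb(ramBufferMb) + otherAllocationsMb;
    }

    public static void main(String[] args) {
        long ramBufferMb = 2048;                 // N, the RAM buffer used in this run
        long otherAllocationsMb = ramBufferMb;   // assumption: the rest of the app stays below N
        // With other allocations estimated at N, the total is 3*N.
        System.out.println(suggestedHeapMb(ramBufferMb, otherAllocationsMb) + " MB");
    }
}
```

If the rest of the application is assumed to allocate no more than N, the suggestion is the 3*N figure mentioned above.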

In the current example, I need 2GB for the RAM buffer, so I'll allocate at
least 4GB on the heap. Then, if I assume that the rest of the app won't
allocate a total of more than 2GB, I'll set the heap size to 6GB. Since I have
lots of RAM and cannot use it w/ Lucene, I set the heap size to 8GB. I haven't,
though, turned on any flags to determine if and when GC ran, so I don't know
if I've hit any nasty GC issues. But, given the total indexing throughput
(~140GB / hour), I think these are good settings.
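
One way to find out whether GC ran, besides starting the JVM with the standard -verbose:gc option, is to read the counters the JVM exposes through java.lang.management. The following is a minimal sketch under that assumption; the class name and the placeholder workload are hypothetical and were not part of the benchmark.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only: sums GC counters across all
// collectors reported by the running JVM.
final class GcActivity {

    /** Total collections so far; collectors that report -1 (undefined) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds, skipping undefined values. */
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long timeBefore = totalCollectionTimeMs();

        // Placeholder workload standing in for the indexing run: allocate short-lived garbage.
        for (int i = 0; i < 100_000; i++) {
            byte[] garbage = new byte[1024];
            garbage[0] = (byte) i;
        }

        System.out.println("collections during run: " + (totalCollections() - countBefore));
        System.out.println("GC time during run (ms): " + (totalCollectionTimeMs() - timeBefore));
    }
}
```

Taking the counters before and after an indexing run gives a rough idea of how often the collector woke up and how much time it took.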

BTW, I think that w/ parallel arrays (
https://issues.apache.org/jira/browse/LUCENE-2329), the performance should
be better if you use a lower heap size. You can also read there that Michael
B. ran the test w/ a 200MB RAM buffer and a 2GB heap (and also a 256MB heap),
which might give you another indication of the RAM buffer / heap size ratio.

Hope this helps,
Shai

On Mon, Apr 26, 2010 at 8:26 PM, Tom Burton-West <tburtonwest@gmail.com> wrote:

>
> I'm looking forward to your results, Shai.
>
>
> Once we get our new test server we will be running tests with different RAM
> buffer sizes.  We have 10 300GB indexes to re-index, so we need to minimize
> any merging/disk I/O.
>
> See also this related thread on the Solr list:
>
> http://lucene.472066.n3.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB-tc505964.html#a505964
>
> Is there any easy way to understand the relationship between the max RAM
> buffer size and the total amount of memory you need to give the JVM?
>
>
> Tom Burton-West
> www.hathitrust.org
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Lucene-RAM-buffer-size-limit-tp756752p757354.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
