lucene-dev mailing list archives

From Earwin Burrfoot <>
Subject Re: Possible IndexInput optimization
Date Sun, 29 Mar 2009 19:46:19 GMT
>> In my case I have to switch to MMap/Buffers, Java behaves ugly with
>> 8Gb heaps.
> Do you mean that, because garbage collection does not perform well
> on these larger heaps, one should avoid creating arrays that push the
> heap to that size, and rather use (direct) MMap/Buffers?
Yes, exactly. Keeping big Directories in heap is painful in many ways:
1. Old-gen GC is slow on big heaps. Our 3Gb heaps took 6-8 seconds to
collect with the parallel collector on four-way machines. The concurrent
collector consistently core dumps, whatever the settings :) Then we
tried increasing heaps (up to 8Gb) in pursuit of fewer machines in the
cluster, and it just collected for eternity.
2. The Eden-survivor-old chain showers sparks around when you feed it
large numbers of huge arrays. Your New-gen GCs are still swift
(100-200ms), but they happen too often. As a consequence, some
short-lived objects start leaking into Old-gen.
3. You have to reserve room for merges. Fully optimizing an index is
very taxing. I cheat by refusing outside requests, switching off the
memory cache, optimizing, then putting everything back in place.
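
One way to observe the effects described in points 1 and 2 (how often collections run and how much time they take) is the standard java.lang.management API. This is only a minimal sketch: the class name GcStats and its methods are made up for illustration, and collector names and their mapping to New-gen/Old-gen depend on the JVM and the GC settings in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: sums up what the JVM reports about its collectors. */
public final class GcStats {

    /** Total number of collections so far, across all collectors. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the value is undefined.
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds. */
    public static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when the value is undefined.
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** One line per collector: name, collection count, accumulated time. */
    public static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
        // Churn through many large, short-lived arrays, then look again.
        for (int i = 0; i < 200; i++) {
            byte[] garbage = new byte[1 << 20];
            garbage[0] = 1;
        }
        System.out.print(report());
    }
}
```

Sampling report() periodically in a running searcher would show whether young collections are frequent and whether old-generation time is growing, without having to parse GC logs.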

I'm currently testing the mmap approach, and despite Sun's braindead API,
it works like a charm.

While I'm at it, I have two more questions about MMapDirectory.
How often is openInput() called for a file? Is it worthwhile to call
getChannel().map() once, when the file is written and closed, and then
clone the buffer for each openInput()?
Why don't you load() a newly-mapped Buffer? It would save the first few
searches that hit a new segment from page faults and from waiting for
that segment to be loaded.
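
The idea behind both questions can be sketched with plain java.nio. This is not how Lucene's MMapDirectory is implemented; SharedMapping and openView() are hypothetical names used only for illustration. The sketch maps a file once, optionally asks the OS to page it in, and hands each reader an independent view via duplicate(). Note that in java.nio it is MappedByteBuffer.load() that pulls content into physical memory (best effort); force() only flushes changes back to the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: one mapping per file, one cheap view per reader. */
public final class SharedMapping {
    private final MappedByteBuffer master;

    public SharedMapping(Path file, boolean preload) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes, so a
            // larger file would need several mappings (not handled here).
            // The mapping stays valid after the channel is closed.
            master = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        if (preload) {
            // Best-effort request to page the content into physical memory.
            master.load();
        }
    }

    /** Returns a view with its own position and limit over the same memory. */
    public ByteBuffer openView() {
        // The master buffer's position is never moved, so every view starts at 0.
        return master.duplicate();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".bin");
        Files.write(file, new byte[] {10, 20, 30, 40});

        SharedMapping mapping = new SharedMapping(file, true);
        ByteBuffer first = mapping.openView();
        ByteBuffer second = mapping.openView();

        first.get(); // advances only the first view
        System.out.println(first.position() + " " + second.position());
    }
}
```

Because duplicate() shares the underlying mapping, each additional view costs a small heap object and no extra map() call. Whether preloading pays off depends on segment size and available page cache, since load() can take a while on large files.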

Kirill Zakharenko/Кирилл Захаренко (
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
