lucene-java-user mailing list archives

From Denis Bazhenov <>
Subject Re: WeakIdentityMap high memory usage
Date Thu, 08 Aug 2013 01:34:09 GMT
Uwe, Michael, thank you very much for your help. We have deployed one of the nodes in our system
and tomorrow I'll have more information on that, but it seems that the setUseUnmap(false) trick
did the job. Response time (RT) dropped significantly compared to the 3.6.0 version. We have about 100 rps per
search-node and commit interval about 1 minute, so switching off the unmap seems like a good

There is one more question. As far as I understand, this map acts like a fuse for situations where
clients continue to use an IndexReader after it has already been closed. So if the code closes
IndexReaders correctly (only after all clients have finished using them), there is no need for
this sort of weak-map tracking. Did I get it right?
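
For illustration, the "fuse" idea can be sketched with plain JDK classes. This is only a rough model, not Lucene's actual WeakIdentityMap or MMapIndexInput code: TrackedInput, cloneInput() and trackedClones() are hypothetical names. A master input remembers its clones through weak references compared by identity, and closing the master marks every clone that is still alive as closed, so a late read fails with an exception instead of touching unmapped memory.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a memory-mapped input; not Lucene's MMapIndexInput. */
class TrackedInput {

    /** Weak reference that hashes and compares its referent by identity. */
    private static final class IdentityWeakRef extends WeakReference<TrackedInput> {
        private final int hash;

        IdentityWeakRef(TrackedInput referent, ReferenceQueue<TrackedInput> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IdentityWeakRef)) return false;
            TrackedInput mine = get();
            return mine != null && mine == ((IdentityWeakRef) o).get();
        }
    }

    // Weakly tracked clones: entries do not keep the clones alive.
    private final Map<IdentityWeakRef, Boolean> clones = new ConcurrentHashMap<>();
    private final ReferenceQueue<TrackedInput> queue = new ReferenceQueue<>();
    private volatile boolean closed;

    /** Creates a clone and registers it so that close() can invalidate it later. */
    TrackedInput cloneInput() {
        ensureOpen();
        reap();
        TrackedInput c = new TrackedInput();
        clones.put(new IdentityWeakRef(c, queue), Boolean.TRUE);
        return c;
    }

    /** Pretend read; a real input would access the mapped buffer here. */
    int read() {
        ensureOpen();
        return 42;
    }

    /** Closes this input and every tracked clone that is still reachable. */
    void close() {
        closed = true;
        for (IdentityWeakRef ref : clones.keySet()) {
            TrackedInput c = ref.get();
            if (c != null) c.close();
        }
        clones.clear();
    }

    /** Number of map entries left after dropping those the GC already cleared. */
    int trackedClones() {
        reap();
        return clones.size();
    }

    // Removes entries whose clone has been garbage collected.
    private void reap() {
        Reference<? extends TrackedInput> r;
        while ((r = queue.poll()) != null) clones.remove(r);
    }

    private void ensureOpen() {
        if (closed) throw new IllegalStateException("already closed");
    }
}
```

In this model the map grows by one entry per clone and only shrinks once the GC has cleared the referents, which matches the shape of the heap dump discussed in this thread. If no caller ever touches a clone after the master is closed, the tracking never fires.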

On Aug 8, 2013, at 4:31 AM, "Uwe Schindler" <> wrote:

> Hi Denis,
> I assume you are using Lucene 3.6.0, because in Lucene 3.6.1 the tracking of buffers
> using weak references is also done (although you cannot switch it off, unfortunately).
> I can confirm what Mike says: it's all weak references, and the overhead may be large,
> but it gets freed when memory gets low. In general it is better not to allocate
> too much heap space for Lucene, as this makes those maps larger and stresses the GC. Only
> use as much heap as needed to avoid OOM, and leave all remaining memory to the file system cache (so
> there is less paging). In that case, GC will clean up the concurrent maps faster.
> In general: if you have a large index that seldom changes, but your query rate is very
> high (like 200 queries per second), switch unmapping off (works since Lucene 4.2, see the changelog
> for LUCENE-4740; unfortunately the issue itself was closed for 4.4, but 4.2 would be correct).
> In that case there is no need to take care of unmapping, and as the index reopen rate is low, this
> does not waste resources.
> But if your index changes often, there is no way around unmapping. Alternatively, use NIOFSDirectory with
> NRTCachingDirectory to optimize near-real-time search on frequently changing indexes!
> Finally: the only way to fix this would be to make all codec structures like TermsEnum
> or DocsEnum, but also Scorer/DocIdSet/..., implement Closeable. When you are done with a Scorer
> you would have to close it, and the underlying cloned IndexInput would be closed, too. In that case,
> the cloned IndexInput would be refcounted and unmapped when the last clone is closed. This
> is a larger change and might be an idea for Lucene 5.0 as an "optimization". It would be a backwards-compatibility
> break because all codecs and all queries would need to close correctly, but with our test
> framework and MockDirWrapper (and other MockFooBarWrappers) we could track this so that all resources
> are closed.
> We had TermEnum.close() up to Lucene 3.x, but it was dropped in 4.0 because it never
> worked in 3.x (nobody ever called close() on TermEnum or TermDocs instances... :( ). With
> our new test framework this could be tracked now... So maybe worth a try?
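
The refcounting idea in the paragraphs above could look roughly like the following JDK-only sketch. It is not an existing Lucene API: SharedMapping, Handle, cloneHandle() and isUnmapped() are hypothetical names, and the "unmap" step is only a flag. Every handle (standing in for a cloned IndexInput) is Closeable, and the shared mapping is released when the last handle is closed.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical shared resource standing in for one memory-mapped file. */
class SharedMapping {
    private final AtomicInteger refCount = new AtomicInteger(0);
    private volatile boolean unmapped;

    /** Hands out a new handle and counts it as one reference. */
    Handle open() {
        refCount.incrementAndGet();
        return new Handle(this);
    }

    /** Drops one reference; the last one triggers the (simulated) unmap. */
    private void release() {
        if (refCount.decrementAndGet() == 0) {
            unmapped = true; // a real implementation would unmap the buffer here
        }
    }

    boolean isUnmapped() {
        return unmapped;
    }

    /** One clone of the input; it must be closed by whoever obtained it. */
    static final class Handle implements Closeable {
        private final SharedMapping owner;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        Handle(SharedMapping owner) {
            this.owner = owner;
        }

        /** A clone shares the mapping and holds its own reference. */
        Handle cloneHandle() {
            if (closed.get()) throw new IllegalStateException("handle already closed");
            return owner.open();
        }

        /** Idempotent: only the first call releases this handle's reference. */
        @Override
        public void close() {
            if (closed.compareAndSet(false, true)) {
                owner.release();
            }
        }
    }
}
```

The sketch ignores the race between opening a new handle and the final release, which a real implementation would have to handle. It also shows why the change would be a compatibility break: a handle that is never closed keeps the mapping alive forever.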
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> eMail:
>> -----Original Message-----
>> From: Michael McCandless []
>> Sent: Wednesday, August 07, 2013 3:45 PM
>> To: Lucene Users
>> Subject: Re: WeakIdentityMap high memory usage
>> This map is used to track all cloned open files, which can be a very large
>> number over time (each search will create maybe 3 of them).
>> This is done as a "best effort" to prevent SEGV (JVM dies) if you accidentally
>> try to use an IndexReader after it was closed, while using MMapDirectory.
>> However, it's a weak map, which means when HEAP is tight GC should drop
>> it.
>> So, this should not cause a real problem in "real life", even though it looks
>> scary when you look at its RAM usage under a profiler.
>> If somehow it's causing "real life" problems, please report back!  But a simple
>> workaround is to call MMapDirectory.setUseUnmap(false) to turn off this
>> tracking; this means you rely on GC to (eventually) unmap.
>> Mike McCandless
>> On Wed, Aug 7, 2013 at 2:45 AM, Denis Bazhenov <>
>> wrote:
>>> We have upgraded from Lucene 3.6 to 4.4. In production we saw high
>>> minor GC times. A heap dump showed that one of the biggest objects by size is
>>> org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference: about 11
>>> million instances with about 377 megabytes of memory in total (and this is not
>>> even retained size). Here is a screenshot of the JProfiler output:
>>> 08-07%20at%205.35.22%20PM.png.
>>> The keys of the map are MMapIndexInput instances. What is this map for, and how
>>> can I reduce its memory usage?
>>> ---
>>> Denis Bazhenov <>
>>> FarPost.
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:

Denis Bazhenov <>
