lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4740) Weak references cause extreme GC churn
Date Thu, 31 Jan 2013 18:47:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567924#comment-13567924
] 

Uwe Schindler commented on LUCENE-4740:
---------------------------------------

bq. Yes, that change is probably not a good general solution, but it worked well for our usecase.
It might be nice to have support for unloadable classes optional.

As I said, a change in AttributeSource or VirtualMethod is not needed; the total number of
per-JVM references there is in the tens. This is perfectly fine code and nobody
needs to change anything. No need for "optional" class unloading. *Not* using weak references
here would be a major design issue and a large leak.

bq. In any case, if the useUnmap is false, then it seems unnecessary to even add references
to the clones to the map.

Robert and I have already discussed that. We can do it, and the patch is easy. We can
offer it as an option (the no-unmap option), with the downside that, e.g., Windows can no longer
delete index files until they are garbage collected and, in particular, disk usage is higher while
indexing.

I did some testing with various JDKs on 64-bit Windows, using a loop that clones one IndexInput
over and over. This loop runs successfully for hours without OOM, so there is no cleanup problem;
the ReferenceQueues are working correctly. With a heap size of 512 MB and this simple loop, the
number of weak references is between 5000 and 600,000. But indeed, there are some GC pauses
(in JDK 6 and 7). The reason is that weakly referenced objects are a little bit more "reachable"
than unreachable objects, so the GC lets them survive longer than unreachable ones.
There is nothing we can do about that. The main problem in your case may be the really large
heap size: why do you need it?
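
To illustrate the cleanup mechanism mentioned above, here is a minimal sketch of a map that tracks objects through weak references and reaps cleared entries via a ReferenceQueue. It is an assumption-laden illustration, not Lucene's WeakIdentityMap: the class name WeakTrackerSketch and all of its methods are hypothetical. How many entries are alive at any moment depends on when the GC runs, so the printed numbers will vary.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch (not Lucene's WeakIdentityMap): tracks objects via weak references. */
final class WeakTrackerSketch<T> {

  /** Weak key compared by referent identity; the hash is cached so it survives clearing. */
  private static final class IdentityRef<R> extends WeakReference<R> {
    private final int hash;

    IdentityRef(R referent, ReferenceQueue<R> queue) {
      super(referent, queue);
      this.hash = System.identityHashCode(referent);
    }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object other) {
      if (this == other) return true;
      if (!(other instanceof IdentityRef)) return false;
      Object mine = get();
      return mine != null && mine == ((IdentityRef<?>) other).get();
    }
  }

  private final ReferenceQueue<T> queue = new ReferenceQueue<>();
  private final ConcurrentHashMap<IdentityRef<T>, Boolean> refs = new ConcurrentHashMap<>();

  /** Registers obj without keeping it strongly reachable. */
  void track(T obj) {
    reap();
    refs.put(new IdentityRef<>(obj, queue), Boolean.TRUE);
  }

  /** Number of entries still present after removing the ones the GC has cleared. */
  int size() {
    reap();
    return refs.size();
  }

  /** Removes entries whose referents the GC has already cleared and enqueued. */
  private void reap() {
    Reference<? extends T> cleared;
    while ((cleared = queue.poll()) != null) {
      refs.remove(cleared);
    }
  }

  public static void main(String[] args) {
    WeakTrackerSketch<Object> tracker = new WeakTrackerSketch<>();
    for (int i = 1; i <= 2_000_000; i++) {
      tracker.track(new Object()); // nothing else references the object
      if (i % 500_000 == 0) {
        System.out.println("Live weak references: " + tracker.size());
      }
    }
  }
}
```

The map never grows without bound because every track() and size() call drains the queue first, but each tracked object still costs a WeakReference that the GC must process, which is where the pauses come from.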

My second test was to close every cloned index input (trunk/4.x only, where the commit you
mentioned was added by me one week ago). In that case the number of references was of course
a static "1" :-) In this test, no GC pauses occurred and the test ran faster.
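
The effect of closing each clone can be pictured with the following sketch, in which a clone removes itself from its master's registry on close(). All names (CloneInputSketch, cloneInput, liveClones) are hypothetical, and the registry here holds strong references for simplicity, so it assumes every clone really is closed. Lucene's actual map is weak precisely because that assumption does not hold there.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: a clone deregisters itself from the master's registry on close(). */
final class CloneInputSketch implements AutoCloseable {
  // Shared between a master and all of its clones. Strong references are an
  // assumption of this sketch; they only work if every clone is closed.
  private final Set<CloneInputSketch> registry;
  private final boolean isClone;

  private CloneInputSketch(Set<CloneInputSketch> registry, boolean isClone) {
    this.registry = registry;
    this.isClone = isClone;
  }

  /** Opens a master input with an empty clone registry. */
  static CloneInputSketch open() {
    return new CloneInputSketch(ConcurrentHashMap.<CloneInputSketch>newKeySet(), false);
  }

  /** Creates a clone and registers it with the master's registry. */
  CloneInputSketch cloneInput() {
    CloneInputSketch clone = new CloneInputSketch(registry, true);
    registry.add(clone);
    return clone;
  }

  /** Number of clones that have been created but not yet closed. */
  int liveClones() {
    return registry.size();
  }

  @Override
  public void close() {
    if (isClone) {
      registry.remove(this); // closing twice is harmless
    } else {
      registry.clear();      // closing the master drops all remaining clones
    }
  }

  public static void main(String[] args) {
    CloneInputSketch master = open();
    int peak = 0;
    for (int i = 0; i < 1_000_000; i++) {
      try (CloneInputSketch clone = master.cloneInput()) {
        peak = Math.max(peak, master.liveClones());
      }
    }
    System.out.println("Peak live clones: " + peak + ", after loop: " + master.liveClones());
    master.close();
  }
}
```

In this single-threaded loop the registry never holds more than one clone at a time, which matches the constant count observed in the second test.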

In my final test I disabled the put() to the WeakIdentityMap completely. That was
faster again, but mostly because there was no longer any locking or maintenance
of the ConcurrentHashMap.

The times for 300 million clones:
- With default Lucene 4.x/trunk, no close of clones _(Lucene never closes clones and that's
almost impossible to add)_: 200 secs, GC pauses
- With closing clones: 65 secs
- Without any map: 40 secs

(JDK 6u32, Windows, 64-bit, server VM, default garbage collector)

{code:java}
  // For this test, make the clones map in ByteBufferIndexInput public/package-private/...
  public void testGC() throws Exception {
    MMapDirectory mmapDir = new MMapDirectory(_TestUtil.getTempDir("testGC"));
    // Write a tiny file so there is something to map.
    IndexOutput io = mmapDir.createOutput("bytes", newIOContext(random()));
    io.writeVInt(5);
    io.close();
    IndexInput ii = mmapDir.openInput("bytes", IOContext.DEFAULT);
    int hash = 0; // keeps a use of each clone inside the loop
    for (int i = 0; i < 300*1024*1024; i++) {
      final IndexInput clone = ii.clone();
      hash += System.identityHashCode(clone);
      if (i % (10*1024) == 0) {
        System.out.println("Number of clones: " + ((ByteBufferIndexInput) ii).clones.size());
      }
      // Uncomment for the second test (close every clone):
      //clone.close();
    }
    ii.close();
    mmapDir.close();
  }
{code}
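
For a rough sense of scale, the timings above can be converted into a per-clone cost and a relative speedup. This is only back-of-the-envelope arithmetic: it assumes the iteration count is the loop bound 300*1024*1024 from the test rather than exactly 300 million, and the class and method names are hypothetical.

```java
/** Hypothetical helper: converts the reported wall-clock times into per-clone costs. */
final class CloneTimingSketch {
  // Loop bound of the test above (assumption: this is the "300 million" in the text).
  static final long CLONES = 300L * 1024 * 1024;

  /** Average nanoseconds spent per clone for a run that took the given seconds. */
  static double nanosPerClone(double seconds) {
    return seconds * 1e9 / CLONES;
  }

  /** How many times faster the other run is compared to the baseline run. */
  static double speedup(double baselineSeconds, double otherSeconds) {
    return baselineSeconds / otherSeconds;
  }

  public static void main(String[] args) {
    double[] seconds = {200, 65, 40};
    String[] labels = {"default (no close)", "closing clones", "without any map"};
    for (int i = 0; i < seconds.length; i++) {
      System.out.printf("%-20s %7.1f ns/clone, %.1fx vs default%n",
          labels[i], nanosPerClone(seconds[i]), speedup(seconds[0], seconds[i]));
    }
  }
}
```

Even the slowest variant stays well under a microsecond per clone on average; the difference between the runs is dominated by the GC pauses and the map maintenance rather than by the clone itself.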

In any case, we can allow users to disable unmapping, but we still have to keep the weak references
to the clones when unmapping is enabled, unless we add close() of clones to Lucene everywhere...

Some other ideas: reuse the ByteBufferIndexInput instances, so we don't need to recreate
them all the time. I have no idea how to do that, because we have no close() to release them,
which brings us back to the same problem.
                
> Weak references cause extreme GC churn
> --------------------------------------
>
>                 Key: LUCENE-4740
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4740
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>    Affects Versions: 3.6.1
>         Environment: Linux debian squeeze 64 bit, Oracle JDK 6, 32 GB RAM, 16 cores
>            Reporter: Kristofer Karlsson
>            Priority: Critical
>
> We are running a set of independent search machines, running our custom software using
> lucene as a search library. We recently upgraded from lucene 3.0.3 to 3.6.1 and noticed a
> severe degradation of performance.
> After doing some heap dump digging, it turns out the process is stalling because it's
> spending so much time in GC. We noticed about 212 million WeakReference, originating from
> WeakIdentityMap, originating from MMapIndexInput.
> Our problem completely went away after removing the clones weakhashmap from MMapIndexInput,
> and as a side-effect, disabling support for explicitly unmapping the mmapped data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

