hadoop-common-issues mailing list archives

From "Misha Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-14523) OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
Date Tue, 13 Jun 2017 21:11:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misha Dmitriev updated HADOOP-14523:
------------------------------------
    Attachment: HADOOP-14523.02.patch

Fixed checkstyle. The test that failed on the previous patch looks unrelated.

> OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-14523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14523
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: HADOOP-14523.01.patch, HADOOP-14523.02.patch
>
>
> I recently analyzed JVM heap dumps from Hive running a large workload. Two excerpts from
> the analysis, done with jxray (www.jxray.com), are given below. Nearly half of live memory
> is taken by objects awaiting finalization, and the biggest offender among them is the
> class OpensslAesCtrCryptoCodec:
> {code}
>   401,189K (39.7%) (1 of sun.misc.Cleaner)
>      <-- Java Static: sun.misc.Cleaner.first
>   400,572K (39.6%) (14001 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager, java.util.jar.JarFile etc.)
>      <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
>   270,673K (26.8%) (2138 of org.apache.hadoop.mapred.JobConf)
>      <-- org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.conf <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
> ---------------------
>   102,232K (10.1%) (1 of j.l.r.Finalizer)
>      <-- Java Static: java.lang.ref.Finalizer.unfinalized
>   101,676K (10.1%) (8613 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, java.util.zip.ZipFile$ZipFileInflaterInputStream, org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager etc.)
>      <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- Java Static: java.lang.ref.Finalizer.unfinalized
> {code}
> This heap dump was taken with 'jmap -dump:live', which forces the JVM to run a full GC
> before dumping the heap. So we are looking at the heap right after GC, and yet all these
> unfinalized objects are still there. I think this happens because the JVM always runs only
> one finalization thread, so the queue of objects that need finalization may be processed
> too slowly. My understanding is that finalization works as follows:
> 1. When GC runs, it discovers that an object x, whose class overrides finalize(), is unreachable.
> 2. x is added to the finalization queue. So technically x is still reachable, it occupies
> memory, and _all the objects that it references stay in memory as well_.
> 3. The finalization thread processes objects from the finalization queue serially, so
> x may stay in memory for a long time.
> 4. x.finalize() is invoked, then x is made unreachable. If x stayed in memory for a long
> time, it is now in the Old Gen of the heap, so only a full GC can clean it up.
> 5. When a full GC finally occurs, x gets cleaned up.
> So finalization is formally reliable, but in practice it is quite possible for a lot of
> unreachable but unfinalized objects to flood the memory. I guess we are seeing all these
> OpensslAesCtrCryptoCodec objects while they are in phase 3 above. And the really bad thing
> is that these objects in turn keep a whole lot of other stuff in memory, in particular JobConf
> objects. Such a JobConf has nothing to do with finalization, yet the GC cannot release it
> until the corresponding OpensslAesCtrCryptoCodec is gone.
> Here is the OpensslAesCtrCryptoCodec.finalize() method with my comments:
> {code}
> protected void finalize() throws Throwable {
>   try {
>     Closeable r = (Closeable) this.random;
>     r.close();  // Relevant only when (random instanceof OsSecureRandom == true)
>   } catch (ClassCastException e) {
>   }
>   super.finalize();  // Not needed, no finalize() in superclasses
> }
> {code}
> So finalize() in this class, which may keep a whole tree of objects in memory, is relevant
> only when this codec is configured to use the OsSecureRandom class. The latter reads random
> bytes from the configured file, and needs finalization to close the input stream associated
> with that file.
> The suggested fix is to remove finalize() from OpensslAesCtrCryptoCodec and add it to
> the only class from this "family" that really needs it, OsSecureRandom. That will ensure that
> only OsSecureRandom objects (if and when they are used) stay in memory awaiting finalization,
> and no other, irrelevant objects.
> Note that this solution means that streams are still closed lazily. This, in principle,
> may cause its own problems. So the most reliable fix would be to call OsSecureRandom.close()
> explicitly when it is no longer needed. But the above fix is a necessary first step anyway:
> it will remove the most acute memory problem and will not make anything else worse
> than it currently is.
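
As an illustration of the "most reliable fix" mentioned at the end of the description, here is a minimal sketch of closing the random source explicitly rather than relying on finalization. It is not the attached patch and not Hadoop code: StreamBackedRandom, SketchCodec and ExplicitCloseDemo are hypothetical names, and an in-memory stream stands in for the configured random device file. The point is only that the class owning the stream is the one that closes it, while the codec declares no finalize() and so never keeps its configuration object alive on the finalization queue.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a random source that reads bytes from a stream
// and therefore owns a resource that has to be closed.
final class StreamBackedRandom implements Closeable {
  private final InputStream in;
  private boolean closed;

  StreamBackedRandom(InputStream in) {
    this.in = in;
  }

  // Fills the buffer completely, or throws if the stream runs dry.
  void nextBytes(byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new IOException("random source exhausted");
      }
      off += n;
    }
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() throws IOException {
    if (!closed) {
      closed = true;
      in.close();
    }
  }
}

// Hypothetical codec: references a (potentially large) configuration object
// and a random source, but has no finalize(), so the GC can reclaim it and
// everything it references as soon as it becomes unreachable.
final class SketchCodec implements Closeable {
  private final Object conf;
  private final StreamBackedRandom random;

  SketchCodec(Object conf, StreamBackedRandom random) {
    this.conf = conf;
    this.random = random;
  }

  Object getConf() {
    return conf;
  }

  void generateSecureRandom(byte[] buf) throws IOException {
    random.nextBytes(buf);
  }

  // Explicit, eager cleanup: delegates to the class that owns the stream.
  @Override
  public void close() throws IOException {
    random.close();
  }
}

final class ExplicitCloseDemo {
  public static void main(String[] args) throws IOException {
    StreamBackedRandom random =
        new StreamBackedRandom(new ByteArrayInputStream(new byte[64]));
    byte[] iv = new byte[16];
    // try-with-resources closes the codec, and through it the stream,
    // as soon as the block exits.
    try (SketchCodec codec = new SketchCodec(new Object(), random)) {
      codec.generateSecureRandom(iv);
    }
    System.out.println("closed = " + random.isClosed());
  }
}
```

Whether callers of the real codec can be changed to close it like this depends on how its lifetime is managed in Hadoop, which this sketch does not attempt to model.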



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

