hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
Date Tue, 03 Apr 2018 23:07:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Xiao Chen updated HADOOP-15359:
-------------------------------
    Description: 
In a recent internal testing, we have found a DFS client hang. Further inspecting jstack shows
the following:

{noformat}
"IPC Client (552936351) connection toHOSTNAME:8020 from PRINCIPAL" #7468 daemon prio=5 os_prio=0
tid=0x00007f6bb306c000 nid=0x1c76e waiting for monitor entry [0x00007f6bc2bd6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.security.Provider.getService(Provider.java:1035)
        - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
        at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)
        at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
        at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
        at javax.crypto.Cipher.getInstance(Cipher.java:513)
        at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
        at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
        at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
        at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
        at sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374)
        at sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284)
        at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209)
        at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182)
        at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053)
        at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403)
        at com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77)
        at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617)
        at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583)
        - locked <0x0000000083444878> (a java.nio.HeapByteBuffer)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        - locked <0x00000000834448c0> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
{noformat}

and at the end of jstack:
{noformat}
Found one Java-level deadlock:
=============================
"IPC Parameter Sending Thread #29":
  waiting to lock monitor 0x0000000017ff49f8 (object 0x0000000080277040, a sun.security.provider.Sun),
  which is held by UNKNOWN_owner_addr=0x0000000050607000

Java stack information for the threads listed above:
===================================================
"IPC Parameter Sending Thread #29":
        at java.security.Provider.getService(Provider.java:1035)
        - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
        at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437)
        at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
        at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
        at javax.crypto.SecretKeyFactory.nextSpi(SecretKeyFactory.java:293)
        - locked <0x00000000834386b8> (a java.lang.Object)
        at javax.crypto.SecretKeyFactory.<init>(SecretKeyFactory.java:121)
        at javax.crypto.SecretKeyFactory.getInstance(SecretKeyFactory.java:160)
        at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:187)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
        at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
        at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
        at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
        at sun.security.jgss.krb5.MessageToken.genSignAndSeqNumber(MessageToken.java:315)
        at sun.security.jgss.krb5.WrapToken.<init>(WrapToken.java:422)
        at sun.security.jgss.krb5.Krb5Context.wrap(Krb5Context.java:922)
        at sun.security.jgss.GSSContextImpl.wrap(GSSContextImpl.java:385)
        at com.sun.security.sasl.gsskerb.GssKrb5Base.wrap(GssKrb5Base.java:103)
        at org.apache.hadoop.security.SaslRpcClient$WrappedOutputStream.write(SaslRpcClient.java:636)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        - locked <0x00000000834389a8> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
        - locked <0x00000000834389c0> (a java.io.DataOutputStream)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

After some research, the closest I found is https://bugs.openjdk.java.net/browse/JDK-7092821
. 
Filing this jira for discussions.

JDK version used was 1.8.0_144.

  was:
In a recent internal testing, we have found a DFS client hang. Further inspecting jstack shows
the following:

{noformat}
"IPC Client (552936351) connection toHOSTNAME:8020 from PRINCIPAL" #7468 daemon prio=5 os_prio=0
tid=0x00007f6bb306c000 nid=0x1c76e waiting for monitor entry [0x00007f6bc2bd6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.security.Provider.getService(Provider.java:1035)
        - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
        at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)
        at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
        at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
        at javax.crypto.Cipher.getInstance(Cipher.java:513)
        at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
        at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
        at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
        at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
        at sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374)
        at sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284)
        at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209)
        at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182)
        at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053)
        at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403)
        at com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77)
        at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617)
        at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583)
        - locked <0x0000000083444878> (a java.nio.HeapByteBuffer)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        - locked <0x00000000834448c0> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
{noformat}

and at the end of jstack:
{noformat}
Found one Java-level deadlock:
=============================
"IPC Parameter Sending Thread #29":
  waiting to lock monitor 0x0000000017ff49f8 (object 0x0000000080277040, a sun.security.provider.Sun),
  which is held by UNKNOWN_owner_addr=0x0000000050607000

Java stack information for the threads listed above:
===================================================
"IPC Parameter Sending Thread #29":
        at java.security.Provider.getService(Provider.java:1035)
        - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
        at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437)
        at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
        at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
        at javax.crypto.SecretKeyFactory.nextSpi(SecretKeyFactory.java:293)
        - locked <0x00000000834386b8> (a java.lang.Object)
        at javax.crypto.SecretKeyFactory.<init>(SecretKeyFactory.java:121)
        at javax.crypto.SecretKeyFactory.getInstance(SecretKeyFactory.java:160)
        at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:187)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
        at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
        at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
        at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
        at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
        at sun.security.jgss.krb5.MessageToken.genSignAndSeqNumber(MessageToken.java:315)
        at sun.security.jgss.krb5.WrapToken.<init>(WrapToken.java:422)
        at sun.security.jgss.krb5.Krb5Context.wrap(Krb5Context.java:922)
        at sun.security.jgss.GSSContextImpl.wrap(GSSContextImpl.java:385)
        at com.sun.security.sasl.gsskerb.GssKrb5Base.wrap(GssKrb5Base.java:103)
        at org.apache.hadoop.security.SaslRpcClient$WrappedOutputStream.write(SaslRpcClient.java:636)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        - locked <0x00000000834389a8> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
        - locked <0x00000000834389c0> (a java.io.DataOutputStream)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

After some research, I think this is https://bugs.openjdk.java.net/browse/JDK-7092821 . 
Filing this jira for discussions.

JDK version used was 1.8.0_144.


> IPC client hang in kerberized cluster due to JDK deadlock
> ---------------------------------------------------------
>
>                 Key: HADOOP-15359
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15359
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.6.0, 2.8.0, 3.0.0
>            Reporter: Xiao Chen
>            Priority: Critical
>         Attachments: 1.jstack, 2.jstack
>
>
> In a recent internal testing, we have found a DFS client hang. Further inspecting jstack
shows the following:
> {noformat}
> "IPC Client (552936351) connection toHOSTNAME:8020 from PRINCIPAL" #7468 daemon prio=5
os_prio=0 tid=0x00007f6bb306c000 nid=0x1c76e waiting for monitor entry [0x00007f6bc2bd6000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.security.Provider.getService(Provider.java:1035)
>         - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
>         at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)
>         at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
>         at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
>         at javax.crypto.Cipher.getInstance(Cipher.java:513)
>         at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
>         at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
>         at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
>         at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
>         at sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374)
>         at sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284)
>         at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209)
>         at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182)
>         at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053)
>         at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403)
>         at com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77)
>         at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617)
>         at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583)
>         - locked <0x0000000083444878> (a java.nio.HeapByteBuffer)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         - locked <0x00000000834448c0> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> {noformat}
> and at the end of jstack:
> {noformat}
> Found one Java-level deadlock:
> =============================
> "IPC Parameter Sending Thread #29":
>   waiting to lock monitor 0x0000000017ff49f8 (object 0x0000000080277040, a sun.security.provider.Sun),
>   which is held by UNKNOWN_owner_addr=0x0000000050607000
> Java stack information for the threads listed above:
> ===================================================
> "IPC Parameter Sending Thread #29":
>         at java.security.Provider.getService(Provider.java:1035)
>         - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
>         at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437)
>         at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
>         at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
>         at javax.crypto.SecretKeyFactory.nextSpi(SecretKeyFactory.java:293)
>         - locked <0x00000000834386b8> (a java.lang.Object)
>         at javax.crypto.SecretKeyFactory.<init>(SecretKeyFactory.java:121)
>         at javax.crypto.SecretKeyFactory.getInstance(SecretKeyFactory.java:160)
>         at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:187)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
>         at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
>         at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
>         at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
>         at sun.security.jgss.krb5.MessageToken.genSignAndSeqNumber(MessageToken.java:315)
>         at sun.security.jgss.krb5.WrapToken.<init>(WrapToken.java:422)
>         at sun.security.jgss.krb5.Krb5Context.wrap(Krb5Context.java:922)
>         at sun.security.jgss.GSSContextImpl.wrap(GSSContextImpl.java:385)
>         at com.sun.security.sasl.gsskerb.GssKrb5Base.wrap(GssKrb5Base.java:103)
>         at org.apache.hadoop.security.SaslRpcClient$WrappedOutputStream.write(SaslRpcClient.java:636)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         - locked <0x00000000834389a8> (a java.io.BufferedOutputStream)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
>         - locked <0x00000000834389c0> (a java.io.DataOutputStream)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After some research, the closest I found is https://bugs.openjdk.java.net/browse/JDK-7092821
. 
> Filing this jira for discussions.
> JDK version used was 1.8.0_144.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message