hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-15538) Possible deadlock in Client
Date Thu, 14 Jun 2018 07:14:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yongjun Zhang updated HADOOP-15538:
-----------------------------------
    Description: 
We have a jstack collection that spans 13 minutes. Once frame per ~1.5 minutes. And for each
of the frame, I observed the following:

{code}
Found one Java-level deadlock:
=============================
"IPC Parameter Sending Thread #294":
  waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x00007f68332e2800

Java stack information for the threads listed above:
===================================================
"IPC Parameter Sending Thread #294":
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
        - waiting to lock <0x0000000621745390> (a java.lang.Object)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
        - locked <0x0000000621745380> (a java.lang.Object)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        - locked <0x0000000621749850> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
        - locked <0x000000062174b878> (a java.io.DataOutputStream)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Found one Java-level deadlock:
=============================
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
  waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x00007f68332e2800

Java stack information for the threads listed above:
===================================================
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
        at sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
        - waiting to lock <0x0000000621745390> (a java.lang.Object)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
        - locked <0x0000000621745370> (a java.lang.Object)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        - locked <0x00000006217476f0> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)

Found 2 deadlocks.
{code}

This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.

The code appears to match https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java.

The first thread is blocked at:

https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268

The second thread is blocked at:
https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279

There are two issues here:

# There seems to be a real deadlock because the stacks remain the same even if the first an
last jstack frames captured is 13 minutes apart.
# Java deadlock report seems to be problematic, two threads that have deadlock should not
be blocked on the same lock, but they appear to be in this case: the same SocketChannelImpl's
stateLock.

I found a relevant jdk jira https://bugs.openjdk.java.net/browse/JDK-8007476, it explains
where two deadlocks are reported and they are really for the same deadlock. 

I don't see a similar report about this issue in jdk jira database, and I'm thinking about
filing a jdk jira for that, but would like to throw some discussion here before that.

Issue#1 is important, because the client is hanging, which indicate a real problem; however,
without a correct report before issue#2 is fixed, it's not clear how the deadlock really looks
like.

Thanks.



  was:
We have a jstack collection that spans 13 minutes. Once frame per ~1.5 minutes. And for each
of the frame, I observed the following:

{code}
Found one Java-level deadlock:
=============================
"IPC Parameter Sending Thread #294":
  waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x00007f68332e2800

Java stack information for the threads listed above:
===================================================
"IPC Parameter Sending Thread #294":
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
        - waiting to lock <0x0000000621745390> (a java.lang.Object)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
        - locked <0x0000000621745380> (a java.lang.Object)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        - locked <0x0000000621749850> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
        - locked <0x000000062174b878> (a java.io.DataOutputStream)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Found one Java-level deadlock:
=============================
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
  waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x00007f68332e2800

Java stack information for the threads listed above:
===================================================
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
        at sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
        - waiting to lock <0x0000000621745390> (a java.lang.Object)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
        - locked <0x0000000621745370> (a java.lang.Object)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        - locked <0x00000006217476f0> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)

Found 2 deadlocks.
{code}

This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.

The code appears to match https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java.

The first thread is blocked at:

https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268

The second thread is blocked at:
https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279

There are two issues here:

1. There seems to be a real deadlock because the stacks remain the same even if the first
an last jstack frames captured is 13 minutes apart.

2. java deadlock report seems to be problematic, two threads that have deadlock should not
be blocked on the same lock, but they appear to be in this case: the same SocketChannelImpl's
stateLock.

I found a relevant jdk jira https://bugs.openjdk.java.net/browse/JDK-8007476, it explains
where two deadlocks are reported but they are really for the same deadlock. 

I don't see a similar report about this in jdk jira database, and I'm thinking about filing
a jdk jira for that, but would like to throw some discussion here before that.

Issue#1 is important, because the client is hanging, which indicate a real problem; however,
without a correct report before issue#2 is fixed, it's not clear how the deadlock really looks
like.






> Possible deadlock in Client
> ---------------------------
>
>                 Key: HADOOP-15538
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15538
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>            Reporter: Yongjun Zhang
>            Priority: Major
>
> We have a jstack collection that spans 13 minutes. Once frame per ~1.5 minutes. And for
each of the frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =============================
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Parameter Sending Thread #294":
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>         - locked <0x0000000621745380> (a java.lang.Object)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         - locked <0x0000000621749850> (a java.io.BufferedOutputStream)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
>         - locked <0x000000062174b878> (a java.io.DataOutputStream)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =============================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>         at sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
>         - locked <0x0000000621745370> (a java.lang.Object)
>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         - locked <0x00000006217476f0> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> Found 2 deadlocks.
> {code}
> This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.
> The code appears to match https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java.
> The first thread is blocked at:
> https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268
> The second thread is blocked at:
> https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279
> There are two issues here:
> # There seems to be a real deadlock because the stacks remain the same even if the first
an last jstack frames captured is 13 minutes apart.
> # Java deadlock report seems to be problematic, two threads that have deadlock should
not be blocked on the same lock, but they appear to be in this case: the same SocketChannelImpl's
stateLock.
> I found a relevant jdk jira https://bugs.openjdk.java.net/browse/JDK-8007476, it explains
where two deadlocks are reported and they are really for the same deadlock. 
> I don't see a similar report about this issue in jdk jira database, and I'm thinking
about filing a jdk jira for that, but would like to throw some discussion here before that.
> Issue#1 is important, because the client is hanging, which indicate a real problem; however,
without a correct report before issue#2 is fixed, it's not clear how the deadlock really looks
like.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message