hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15538) Possible deadlock in Client
Date Thu, 14 Jun 2018 08:11:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512135#comment-16512135
] 

Yongjun Zhang commented on HADOOP-15538:
----------------------------------------

All other 6 BLOCKED threads (there are 8 BLOCKED threads in total in each jstack frame) look
like
{code:java}
"Thread-52" #82 prio=5 os_prio=0 tid=0x000000001a2c1000 nid=0xf189f waiting for monitor entry
[0x00007f697a77f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1053)
        - waiting to lock <0x00000006215c1e08> (a java.lang.Object)
        at org.apache.hadoop.ipc.Client.call(Client.java:1483)
        at org.apache.hadoop.ipc.Client.call(Client.java:1441)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:266)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1323)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1310)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1298)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:275)
        - locked <0x0000000618947838> (a java.lang.Object)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:267)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1629)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:338)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:334)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
{code}
These threads all blocked due to RPC serialization at
{code:java}
    public void sendRpcRequest(final Call call)
        throws InterruptedException, IOException {
      if (shouldCloseConnection.get()) {
        return;
      }

      // Serialize the call to be sent. This is done from the actual
      // caller thread, rather than the sendParamsExecutor thread,
      
      // so that if the serialization throws an error, it is reported
      // properly. This also parallelizes the serialization.
      //
      // Format of a call on the wire:
      // 0) Length of rest below (1 + 2)
      // 1) RpcRequestHeader  - is serialized Delimited hence contains length
      // 2) RpcRequest
      //
      // Items '1' and '2' are prepared here. 
      final DataOutputBuffer d = new DataOutputBuffer();
      RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
          call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
          clientId);
      header.writeDelimitedTo(d);
      call.rpcRequest.write(d);

      synchronized (sendRpcRequestLock) { <===================Client.java, line 1053
        Future<?> senderFuture = sendParamsExecutor.submit(new Runnable() {
{code}

These 6 threads are all blocked due to the single RPC thread holding the sendRpcRequestLock
("IPC Parameter Sending Thread #294" in the jira description), which is blocked due to the
reported deadlock.

So what is the "UNKNOWN_owner_addr=0x00007f68332e2800" thread that caused the "deadlock" is
hidden.

 

> Possible deadlock in Client
> ---------------------------
>
>                 Key: HADOOP-15538
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15538
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>            Reporter: Yongjun Zhang
>            Priority: Major
>
> We have a jstack collection that spans 13 minutes. Once frame per ~1.5 minutes. And for
each of the frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =============================
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Parameter Sending Thread #294":
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>         - locked <0x0000000621745380> (a java.lang.Object)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         - locked <0x0000000621749850> (a java.io.BufferedOutputStream)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
>         - locked <0x000000062174b878> (a java.io.DataOutputStream)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =============================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>         at sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
>         - locked <0x0000000621745370> (a java.lang.Object)
>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         - locked <0x00000006217476f0> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> Found 2 deadlocks.
> {code}
> This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.
> The code appears to match https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java.
> The first thread is blocked at:
> https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268
> The second thread is blocked at:
> https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279
> There are two issues here:
> # There seems to be a real deadlock because the stacks remain the same even if the first
an last jstack frames captured is 13 minutes apart.
> # Java deadlock report seems to be problematic, two threads that have deadlock should
not be blocked on the same lock, but they appear to be in this case: the same SocketChannelImpl's
stateLock.
> I found a relevant jdk jira https://bugs.openjdk.java.net/browse/JDK-8007476, it explains
where two deadlocks are reported and they are really for the same deadlock. 
> I don't see a similar report about this issue in jdk jira database, and I'm thinking
about filing a jdk jira for that, but would like to throw some discussion here before that.
> Issue#1 is important, because the client is hanging, which indicate a real problem; however,
without a correct report before issue#2 is fixed, it's not clear how the deadlock really looks
like.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message