hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-15538) Possible RPC deadlock in Client
Date Fri, 15 Jun 2018 21:59:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514402#comment-16514402
] 

Yongjun Zhang edited comment on HADOOP-15538 at 6/15/18 9:58 PM:
-----------------------------------------------------------------

pstack for thread
{code:java}
"IPC Parameter Sending Thread #294" #142281 daemon prio=5 os_prio=0 tid=0x00007f676e2b7800
nid=0x1bfe75 waiting for monitor entry [0x00007f6920ed4000]
{code}
is
{code:java}
Thread 6499 (Thread 0x7f6920ed5700 (LWP 1834613)):
#0 0x000000396ce0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f69f16c3c23 in os::PlatformEvent::park() () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#2 0x00007f69f16b2bad in ObjectMonitor::EnterI(Thread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#3 0x00007f69f16b4b81 in ObjectMonitor::enter(Thread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#4 0x00007f69f1769430 in SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*)
() from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#5 0x00007f69dedaadc8 in ?? ()
#6 0x0000000630aa0dd0 in ?? ()
#7 0x00007f69dfa61464 in ?? ()
#8 0x00000006217452d0 in ?? ()
#9 0xc42e8a70000294c8 in ?? ()
#10 0x0000000621745380 in ?? ()
#11 0x0000000621745390 in ?? ()
#12 0x00007f676e2b7800 in ?? ()
#13 0x00007f69f1d83790 in Universe::_heap_used_at_last_gc () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#14 0x00007f67c42e8a70 in ?? ()
#15 0x00000000c42e8a72 in ?? ()
#16 0x0000000630aa0dd0 in ?? ()
#17 0x00007f69f17601b0 in OptoRuntime::new_instance_C(Klass*, JavaThread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
{code}
 
 The pstack for 
  
{code:java}
"IPC Client (297602875) connection to x.y.z.p:8020 from impala" #142248 daemon prio=5 os_prio=0
tid=0x00007f6920ed6000 nid=0x1bc4ee waiting for monitor entry [0x00007f6961246000]
{code}
is
{code:java}
Thread 9778 (Thread 0x7f6961247700 (LWP 1819886)):
#0  0x000000396ce0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f69f16c8dbf in os::PlatformEvent::park(long) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#2  0x00007f69f16b2a85 in ObjectMonitor::EnterI(Thread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#3  0x00007f69f16b4b81 in ObjectMonitor::enter(Thread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#4  0x00007f69f1769430 in SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*,
JavaThread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#5  0x00007f69dedaadc8 in ?? ()
#6  0x00000006217452d0 in ?? ()
#7  0x00007f69e010080c in ?? ()
#8  0x0000060600000606 in ?? ()
#9  0xc42e8a72c42e8a72 in ?? ()
#10 0x0000000621745390 in ?? ()
#11 0x0000000621745370 in ?? ()
#12 0x0000000621745390 in ?? ()
#13 0xc42e8a6e00000606 in ?? ()
#14 0x0000000600000606 in ?? ()
#15 0x000000062d2c5b80 in ?? ()
#16 0x00000000806f8230 in ?? ()
#17 0x0000000000000007 in ?? ()
#18 0x0000000000000001 in ?? ()
#19 0x0000000000000001 in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x0000000000000003 in ?? ()
#22 0x0000000000000003 in ?? ()
#23 0x0000000000045044 in ?? ()
#24 0x00000006217455c0 in ?? ()
#25 0x00007f69dfa7cd68 in ?? ()
#26 0x00000006217455c0 in ?? ()
#27 0x0000000100000001 in ?? ()
#28 0x000000062d2c5b80 in ?? ()
#29 0x00000001002a6c60 in ?? ()
#30 0x000000062d2c5bb0 in ?? ()
#31 0x00000001100df044 in ?? ()
#32 0x00000000806f8220 in ?? ()
#33 0x0000000000000007 in ?? ()
#34 0x00000163ce93a3ba in ?? ()
#35 0x00000006217452d0 in ?? ()
#36 0x000000062d2c5bb0 in ?? ()
#37 0x00007f69f122422a in CollectedHeap::fill_with_object(HeapWord*, unsigned long, bool)
() from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
{code}


was (Author: yzhangal):
pstack for thread
{code}
"IPC Parameter Sending Thread #294" #142281 daemon prio=5 os_prio=0 tid=0x00007f676e2b7800
nid=0x1bfe75 waiting for monitor entry [0x00007f6920ed4000]
{code}
is
{code:java}
Thread 6499 (Thread 0x7f6920ed5700 (LWP 1834613)):
#0 0x000000396ce0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f69f16c3c23 in os::PlatformEvent::park() () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#2 0x00007f69f16b2bad in ObjectMonitor::EnterI(Thread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#3 0x00007f69f16b4b81 in ObjectMonitor::enter(Thread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#4 0x00007f69f1769430 in SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*)
() from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#5 0x00007f69dedaadc8 in ?? ()
#6 0x0000000630aa0dd0 in ?? ()
#7 0x00007f69dfa61464 in ?? ()
#8 0x00000006217452d0 in ?? ()
#9 0xc42e8a70000294c8 in ?? ()
#10 0x0000000621745380 in ?? ()
#11 0x0000000621745390 in ?? ()
#12 0x00007f676e2b7800 in ?? ()
#13 0x00007f69f1d83790 in Universe::_heap_used_at_last_gc () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#14 0x00007f67c42e8a70 in ?? ()
#15 0x00000000c42e8a72 in ?? ()
#16 0x0000000630aa0dd0 in ?? ()
#17 0x00007f69f17601b0 in OptoRuntime::new_instance_C(Klass*, JavaThread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
{code}
 
The pstack for 
 {code}
"IPC Client (297602875) connection to atl1c1r2name01.vldb-bo.secureworks.net/10.236.236.9:8020
from impala" #142248 daemon prio=5 os_prio=0 tid=0x00007f6920ed6000 nid=0x1bc4ee waiting for
monitor entry [0x00007f6961246000]
{code}
is
{code}
Thread 9778 (Thread 0x7f6961247700 (LWP 1819886)):
#0  0x000000396ce0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f69f16c8dbf in os::PlatformEvent::park(long) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#2  0x00007f69f16b2a85 in ObjectMonitor::EnterI(Thread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#3  0x00007f69f16b4b81 in ObjectMonitor::enter(Thread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#4  0x00007f69f1769430 in SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*,
JavaThread*) () from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
#5  0x00007f69dedaadc8 in ?? ()
#6  0x00000006217452d0 in ?? ()
#7  0x00007f69e010080c in ?? ()
#8  0x0000060600000606 in ?? ()
#9  0xc42e8a72c42e8a72 in ?? ()
#10 0x0000000621745390 in ?? ()
#11 0x0000000621745370 in ?? ()
#12 0x0000000621745390 in ?? ()
#13 0xc42e8a6e00000606 in ?? ()
#14 0x0000000600000606 in ?? ()
#15 0x000000062d2c5b80 in ?? ()
#16 0x00000000806f8230 in ?? ()
#17 0x0000000000000007 in ?? ()
#18 0x0000000000000001 in ?? ()
#19 0x0000000000000001 in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x0000000000000003 in ?? ()
#22 0x0000000000000003 in ?? ()
#23 0x0000000000045044 in ?? ()
#24 0x00000006217455c0 in ?? ()
#25 0x00007f69dfa7cd68 in ?? ()
#26 0x00000006217455c0 in ?? ()
#27 0x0000000100000001 in ?? ()
#28 0x000000062d2c5b80 in ?? ()
#29 0x00000001002a6c60 in ?? ()
#30 0x000000062d2c5bb0 in ?? ()
#31 0x00000001100df044 in ?? ()
#32 0x00000000806f8220 in ?? ()
#33 0x0000000000000007 in ?? ()
#34 0x00000163ce93a3ba in ?? ()
#35 0x00000006217452d0 in ?? ()
#36 0x000000062d2c5bb0 in ?? ()
#37 0x00007f69f122422a in CollectedHeap::fill_with_object(HeapWord*, unsigned long, bool)
() from /usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
{code}


> Possible RPC deadlock in Client
> -------------------------------
>
>                 Key: HADOOP-15538
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15538
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>            Reporter: Yongjun Zhang
>            Priority: Major
>         Attachments: t1+13min.jstack, t1.jstack
>
>
> We have a jstack collection that spans 13 minutes. One frame per ~1.5 minutes. And for
each of the frame, I observed the following:
> {code:java}
> Found one Java-level deadlock:
> =============================
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Parameter Sending Thread #294":
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>         - locked <0x0000000621745380> (a java.lang.Object)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         - locked <0x0000000621749850> (a java.io.BufferedOutputStream)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
>         - locked <0x000000062174b878> (a java.io.DataOutputStream)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =============================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>         at sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
>         - locked <0x0000000621745370> (a java.lang.Object)
>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         - locked <0x00000006217476f0> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> Found 2 deadlocks.
> {code}
> This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.
> The code appears to match [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java].
> The first thread is blocked at:
> [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268]
> The second thread is blocked at:
>  [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279]
> There are two issues here:
>  # There seems to be a real deadlock because the stacks remain the same even if the first
an last jstack frames captured is 13 minutes apart.
>  # Java deadlock report seems to be problematic, two threads that have deadlock should
not be blocked on the same lock, but they appear to be in this case: the same SocketChannelImpl's
stateLock.
> I found a relevant jdk jira [https://bugs.openjdk.java.net/browse/JDK-8007476], it explains
where two deadlocks are reported and they are really for the same deadlock.
> I don't see a similar report about this issue in jdk jira database, and I'm thinking
about filing a jdk jira for that, but would like to throw some discussion here before that.
> Issue#1 is important, because the client is hanging, which indicate a real problem; however,
without a correct report before issue#2 is fixed, it's not clear how the deadlock really looks
like.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message