hbase-issues mailing list archives

From "Victor Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
Date Tue, 14 Jul 2015 06:19:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625902#comment-14625902 ]

Victor Xu commented on HBASE-14062:
-----------------------------------

The RS log shows that the META table was located on that RS. My guess is that some applications
use a very short client RPC timeout, or have requests cached locally before actually sending
them to this RS, so that by the time the requests reach the RS they exceed the timeout almost
immediately. When the clients retry, this request-and-fail loop continues. This could happen
when a big job (tens of thousands of map tasks using TableInputFormat) starts.

> RpcServer.Listener.doAccept get blocked by LinkedList.remove
> ------------------------------------------------------------
>
>                 Key: HBASE-14062
>                 URL: https://issues.apache.org/jira/browse/HBASE-14062
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.98.12
>            Reporter: Victor Xu
>         Attachments: hbase.log, jstack.log
>
>
> We saw this blocked thread in our jstack output:
> {noformat}
> "RpcServer.listener,port=60020" daemon prio=10 tid=0x00007f158097b800 nid=0x2cd05 waiting for monitor entry [0x0000000046374000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:833)
>         - waiting to lock <0x00000002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:748)
> {noformat}
> And the owner of the lock is a reader thread that is inside LinkedList.remove:
> {noformat}
> "RpcServer.reader=9,port=60020" daemon prio=10 tid=0x00007f1580394000 nid=0x2cc19 runnable [0x0000000043b4c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.LinkedList.remove(LinkedList.java:363)
>         at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
>         - locked <0x00000002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
>         - locked <0x00000002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:867)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:645)
>         - locked <0x00000002bae09a30> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:620)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> This issue blocks the RS once in a while, and I have to restart it whenever it happens. It
> seems like a bug. Any suggestions?
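
For illustration, the contention pattern in the two stack traces can be sketched in plain Java.
This is not HBase code and not the patch for this issue. The class ConnectionListSketch, the
type Conn, and the helper names are hypothetical. The sketch only assumes what the traces show:
connections are kept in a Collections.synchronizedList wrapping a LinkedList, and closing a
connection calls remove(Object) on it. That call walks the list from the head while holding the
list monitor, so any other thread that needs the same monitor (such as the listener in doAccept)
has to wait. A concurrent hash-based set is shown only as one possible comparison point.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedList;
import java.util.concurrent.ConcurrentHashMap;

class ConnectionListSketch {
    // Hypothetical stand-in for an RPC connection (identity-based equals/hashCode).
    static final class Conn {
        final int id;
        Conn(int id) { this.id = id; }
    }

    // The structure seen in the stack traces: a synchronized wrapper around a LinkedList.
    static Collection<Conn> listBacked() {
        return Collections.synchronizedList(new LinkedList<Conn>());
    }

    // An assumed alternative for comparison: a set view over a ConcurrentHashMap.
    static Collection<Conn> setBacked() {
        return Collections.newSetFromMap(new ConcurrentHashMap<Conn, Boolean>());
    }

    // Adds n connections, then removes them newest first and returns the elapsed nanoseconds.
    // For the LinkedList, the newest element sits at the tail, so each remove(Object)
    // scans the whole remaining list while the monitor is held.
    static long timeRemovals(Collection<Conn> conns, int n) {
        Conn[] all = new Conn[n];
        for (int i = 0; i < n; i++) {
            all[i] = new Conn(i);
            conns.add(all[i]);
        }
        long start = System.nanoTime();
        for (int i = n - 1; i >= 0; i--) {
            conns.remove(all[i]);
        }
        long elapsed = System.nanoTime() - start;
        if (!conns.isEmpty()) {
            throw new IllegalStateException("expected all connections to be removed");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 20000; // arbitrary size chosen for the demonstration
        System.out.println("synchronized LinkedList: " + timeRemovals(listBacked(), n) / 1000000 + " ms");
        System.out.println("concurrent hash set:     " + timeRemovals(setBacked(), n) / 1000000 + " ms");
    }
}
```

The absolute timings depend on the machine and JVM. The point is only that removal cost in the
list-backed case grows with the number of open connections, which would match a burst of
short-lived client connections keeping the lock busy.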



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
