hbase-issues mailing list archives

From "Guangxu Cheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17798) RpcServer.Listener.Reader can abort due to CancelledKeyException
Date Sat, 18 Mar 2017 10:26:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931167#comment-15931167
] 

Guangxu Cheng commented on HBASE-17798:
---------------------------------------

{quote}
Did you put the proposed change in your cluster ?
If so, what effect do you observe ?
{quote}
Before the fix, we saw two effects:
1. Many requests were neither processed nor closed because the reader had aborted, and
the number of TCP connections in the ESTABLISHED state on the RS kept growing, as the attached
picture (connections.png) shows.
2. Clients hit many SocketTimeoutExceptions.
After the fix, neither problem occurs.


> RpcServer.Listener.Reader can abort due to CancelledKeyException
> ----------------------------------------------------------------
>
>                 Key: HBASE-17798
>                 URL: https://issues.apache.org/jira/browse/HBASE-17798
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.3.0, 1.2.4, 0.98.24
>            Reporter: Guangxu Cheng
>         Attachments: connections.png, HBASE-17798-0.98-v1.patch, HBASE-17798-0.98-v2.patch,
HBASE-17798-branch-1-v1.patch, HBASE-17798-branch-1-v2.patch, HBASE-17798-master-v1.patch,
HBASE-17798-master-v2.patch
>
>
> In our production cluster (0.98), some requests were never processed because an RpcServer.Listener.Reader
thread had aborted.
> getReader() returns the next reader to handle a request.
> The implementation of getReader() is as follows:
> {code:title=RpcServer.java|borderStyle=solid}
>     // The method that will return the next reader to work with
>     // Simplistic implementation of round robin for now
>     Reader getReader() {
>       currentReader = (currentReader + 1) % readers.length;
>       return readers[currentReader];
>     }
> {code}
> If one of the readers aborts, requests assigned to that reader will never be handled.
> Why does RpcServer.Listener.Reader abort? We added debug logging to find out.
> After a while, we got the following exception:
> {code}
> 2017-03-10 08:05:13,247 ERROR [RpcServer.reader=3,port=60020] ipc.RpcServer: RpcServer.listener,port=60020:
unexpectedly error in Reader(Throwable)
> java.nio.channels.CancelledKeyException
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>         at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
>         at java.nio.channels.SelectionKey.isReadable(SelectionKey.java:289)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:592)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:566)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> So, when handling a request in the reader, we should handle CancelledKeyException.
> ----------
> Versions 1.x and 2.0 log and return when handling InterruptedException in
Reader#doRunLoop after HBASE-10521, which leads to the same problem.
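
For illustration only, here is a minimal sketch of the kind of per-key guard the description asks for. It is not the attached patch and not the actual RpcServer code: the class ReaderLoopSketch and its methods are hypothetical names, and a java.nio Pipe stands in for a client connection. The idea is that a key cancelled between select() and isReadable() is skipped, so the exception never escapes the loop and kills the reader thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

// Hypothetical sketch; names are illustrative and not taken from HBase.
class ReaderLoopSketch {

    // Walks the selected keys once and returns how many were readable.
    static int processSelectedKeys(Selector selector) {
        int handled = 0;
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            try {
                // isReadable() throws CancelledKeyException on a cancelled key,
                // so check validity first and still catch the race.
                if (key.isValid() && key.isReadable()) {
                    handled++; // a real reader would read the request here
                }
            } catch (CancelledKeyException e) {
                // Key was cancelled concurrently: skip it, keep the loop alive.
            }
        }
        return handled;
    }

    // Registers two pipes, cancels one key after select(), then processes both.
    static int demo() {
        try (Selector selector = Selector.open()) {
            Pipe live = Pipe.open();
            Pipe dropped = Pipe.open();
            try {
                register(live, selector);
                SelectionKey droppedKey = register(dropped, selector);
                // Wait until both keys are in the selected-key set.
                while (selector.selectedKeys().size() < 2) {
                    selector.select(100);
                }
                droppedKey.cancel(); // simulates a connection closed elsewhere
                return processSelectedKeys(selector);
            } finally {
                close(live);
                close(dropped);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Makes the pipe's read end selectable and writes one byte so it is ready.
    private static SelectionKey register(Pipe pipe, Selector selector) throws IOException {
        pipe.source().configureBlocking(false);
        SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
        pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
        return key;
    }

    private static void close(Pipe pipe) throws IOException {
        pipe.sink().close();
        pipe.source().close();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch demo() counts only the key that is still valid; the cancelled key is skipped instead of raising CancelledKeyException out of the loop. Whether to also close the affected connection at that point is a design choice the sketch leaves open.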



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
