hbase-issues mailing list archives

From "Nicolas Liochon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11714) RpcRetryingCaller#callWithoutRetries set rpc timeout to 2 seconds incorrectly
Date Mon, 11 Aug 2014 09:14:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092592#comment-14092592 ]

Nicolas Liochon commented on HBASE-11714:
-----------------------------------------

Makes sense. I've updated the description in HBASE-11374 with the error message.

> RpcRetryingCaller#callWithoutRetries set rpc timeout to 2 seconds incorrectly
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-11714
>                 URL: https://issues.apache.org/jira/browse/HBASE-11714
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.98.3
>            Reporter: Qiang Tian
>            Assignee: Qiang Tian
>             Fix For: 0.98.4
>
>         Attachments: hbase-11714-0.98.patch
>
>
> Discussed on the user@hbase mailing list (http://markmail.org/thread/w3cqjxwo2smkn2jw)
> {quote}
> "Recently switched from 0.94 and 0.98, and finding that periodically things
> are having issues - lots of retry exceptions" :
> {quote}
> client log:
> {quote}
> 2014-08-08 17:22:43 o.a.h.h.c.AsyncProcess [INFO] #105158,
> table=rt_global_monthly_campaign_deliveries, attempt=10/35 failed 500 ops,
> last exception: java.net.SocketTimeoutException: Call to
> ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020 failed
> because java.net.SocketTimeoutException: 2000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.248.130.152:46014
> remote=ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020] on
> ip-10-201-128-23.us-west-1.compute.internal,60020,1405642103651, tracking
> started Fri Aug 08 17:21:55 UTC 2014, retrying after 10043 ms, replay 500
> ops.
> {quote}
> Analysis:
> RpcRetryingCaller has two methods: callWithRetries and callWithoutRetries.
> The timeout setup in callWithRetries looks correct, but the one in
> callWithoutRetries (the path taken by the multi RPC for this user) is wrong:
> the caller cannot specify a valid timeout, yet callWithoutRetries still calls
> beforeCall to set the timeout, a method that appears to be meant for
> callWithRetries only. Since RpcRetryingCaller#callTimeout is not set, the
> thread-local timeout is set to 2s (MIN_RPC_TIMEOUT) via
> RpcClient.setRpcTimeout, and that value ends up as the ping interval set on
> the socket.
> When the write workload is heavy and the RPC cannot complete within 2s, the
> client closes the connection, so the server-side connection is reset, which
> finally exposes the problem in HBASE-11705.
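
For illustration, the behaviour described in the analysis can be modelled roughly as follows. This is a simplified, self-contained sketch and not the actual HBase source: the names callTimeout, MIN_RPC_TIMEOUT, beforeCall, callWithRetries and callWithoutRetries come from the description above, while the class RpcTimeoutSketch, the rpcTimeout thread-local (a stand-in for what RpcClient.setRpcTimeout records), the method signatures and the 60000 ms value are assumptions made for the example.

```java
// Simplified model of the timeout logic described in HBASE-11714.
// NOT the real HBase code: class name, signatures and the thread-local
// stand-in below are assumptions made for illustration only.
class RpcTimeoutSketch {
    static final int MIN_RPC_TIMEOUT = 2000; // 2s, as stated in the issue

    // Hypothetical stand-in for the value stored by RpcClient.setRpcTimeout.
    static final ThreadLocal<Integer> rpcTimeout = ThreadLocal.withInitial(() -> 0);

    private int callTimeout;      // stays 0 unless a caller provides a value
    private long globalStartTime;

    // Computes the time left for the call and clamps it to the minimum.
    private void beforeCall() {
        long elapsed = System.currentTimeMillis() - globalStartTime;
        int remaining = (int) (callTimeout - elapsed);
        if (remaining < MIN_RPC_TIMEOUT) {
            remaining = MIN_RPC_TIMEOUT;
        }
        rpcTimeout.set(remaining);
    }

    // The caller supplies a timeout, so beforeCall sees a meaningful value.
    int callWithRetries(int callTimeout) {
        this.callTimeout = callTimeout;
        this.globalStartTime = System.currentTimeMillis();
        beforeCall();
        return rpcTimeout.get();
    }

    // No timeout is supplied: callTimeout is still 0, so beforeCall always
    // falls back to MIN_RPC_TIMEOUT.
    int callWithoutRetries() {
        this.globalStartTime = System.currentTimeMillis();
        beforeCall();
        return rpcTimeout.get();
    }

    public static void main(String[] args) {
        // Close to the requested 60000 ms (minus any elapsed milliseconds).
        System.out.println(new RpcTimeoutSketch().callWithRetries(60000));
        // Always 2000, regardless of what the caller intended.
        System.out.println(new RpcTimeoutSketch().callWithoutRetries());
    }
}
```

In this model the no-retry path always ends up with a 2000 ms timeout, which matches the "2000 millis timeout" in the client log above.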



--
This message was sent by Atlassian JIRA
(v6.2#6252)
