hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13480) ShortCircuitConnection doesn't short-circuit all calls as expected
Date Sat, 22 Aug 2015 03:34:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707816#comment-14707816 ]

stack commented on HBASE-13480:

This is a particular problem in trunk, where the master serves hbase:meta. TestDistributedLogReplay
has its priority handler count pumped up to 40, which is kinda crazy. If I set the number way down,
to 5 say, the cluster locks up: the priority handlers are all occupied doing reportRegionStateTransition,
which wants to RPC back into the meta table, but every priority handler is already busy
(with long timeouts as per [~elserj] above), so nothing can progress.
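
The lock-up described above can be reproduced in miniature. The following is only a sketch, assuming a plain fixed-size java.util.concurrent thread pool stands in for the priority handlers; HandlerStarvationDemo and allRequestsComplete are hypothetical names, and nothing here is HBase code. Each "request" occupies a handler and then submits a nested call to the same pool, the way reportRegionStateTransition calls back into the meta table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerStarvationDemo {

    /**
     * Submits `requests` outer tasks to a pool of `handlers` threads. Each outer
     * task submits a nested task to the SAME pool and waits for it, up to
     * `timeoutMs`. Returns true only if no nested call timed out.
     */
    static boolean allRequestsComplete(int handlers, int requests, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        try {
            List<Future<Boolean>> outer = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                outer.add(pool.submit(() -> {
                    // Nested call: needs a free handler in the same pool.
                    Future<String> inner = pool.submit(() -> "meta row");
                    try {
                        inner.get(timeoutMs, TimeUnit.MILLISECONDS);
                        return true;
                    } catch (TimeoutException e) {
                        // No handler was free to run the nested call in time.
                        inner.cancel(true);
                        return false;
                    }
                }));
            }
            boolean ok = true;
            for (Future<Boolean> f : outer) {
                ok &= f.get();
            }
            return ok;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With one handler and one request, the nested call can never be scheduled and the request only ends when its timeout fires; with two handlers and one request, the nested call gets the spare thread and finishes at once. Serving the nested call locally, without going through the pool, removes the dependency altogether.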

We need to address the larger issue of cluster deadlock, but the short-circuit fix here should
help with the current state of trunk at least.

Testing this patch, there is a big improvement in TestDistributedLogReplay. With the handler count
at 5, there are just a few timeouts/retries in the logs, as opposed to logs filled with them, AND
the test passes instead of hanging.

Reviewing the patch, the only problem I have is that both the short-circuit and the RPC connections
are hosted inside a class named for short-circuiting, which seems incorrect. Internally it
can do the switch, but the hosting class that figures out whether to RPC or go short-circuit shouldn't
be called short-circuit; it could even be an anonymous inner class if we have trouble coming
up with a good name.
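
To make the anonymous-inner-class suggestion concrete, here is a rough sketch. It assumes two made-up interfaces, ClientStub and StubProvider, and a made-up factory method create; none of these are HBase types, and this is not the shape of the attached patch. The point is only that the object choosing between the local stub and RPC needs no "short-circuit" name of its own.

```java
public class StubChooserDemo {

    /** Hypothetical stand-in for a client-side service stub. */
    interface ClientStub {
        String get(String row);
    }

    /** Hypothetical stand-in for "give me a stub for this server". */
    interface StubProvider {
        ClientStub clientFor(String serverName);
    }

    /**
     * Returns a provider that hands out the local stub when the target is the
     * local server and falls back to the RPC provider for any other server.
     */
    static StubProvider create(String localServer, ClientStub localStub, StubProvider rpc) {
        // Anonymous inner class: the switch lives here, no special class name needed.
        return new StubProvider() {
            @Override
            public ClientStub clientFor(String serverName) {
                return localServer.equals(serverName) ? localStub : rpc.clientFor(serverName);
            }
        };
    }

    /** Small driver: routes one lookup for `target` and returns the answer. */
    static String route(String target) {
        StubProvider provider = create(
            "master-1",
            row -> "local:" + row,
            server -> row -> "rpc(" + server + "):" + row);
        return provider.clientFor(target).get("r1");
    }
}
```

Here route("master-1") is answered by the local stub, while route("rs-2") goes through the RPC provider.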

Good one [~elserj] and [~jingcheng.du@intel.com]

> ShortCircuitConnection doesn't short-circuit all calls as expected
> ------------------------------------------------------------------
>                 Key: HBASE-13480
>                 URL: https://issues.apache.org/jira/browse/HBASE-13480
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.0.0, 2.0.0, 1.1.0
>            Reporter: Josh Elser
>            Assignee: Jingcheng Du
>             Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3
>         Attachments: HBASE-13480-1.patch, HBASE-13480.patch
> Noticed the following situation in debugging unexpected unit tests failures in HBASE-13351.
> {{ConnectionUtils#createShortCircuitHConnection(Connection, ServerName, AdminService.BlockingInterface,
ClientService.BlockingInterface)}} is intended to avoid the extra RPC by calling the server's
instantiation of the protobuf rpc stub directly for the AdminService and ClientService.
> The problem is that this is insufficient to actually avoid extra "remote" RPCs as all
other calls to the Connection are routed to a "real" Connection instance. As such, any object
created by the "real" Connection (such as an HTable) will use the real Connection, not the
short-circuit connection (SSC).
> The end result is that {{MasterRpcService#reportRegionStateTransition(RpcController,
ReportRegionStateTransitionRequest)}} will make additional "remote" RPCs over what it thinks
is an SSC through a {{Get}} on {{HTable}} which was constructed using the SSC, but the {{Get}}
itself will use the underlying real Connection instead of the SSC. With insufficiently sized
thread pools, this has been observed to result in RPC deadlock in the HMaster where an RPC
attempts to make another RPC but there are no more threads available to service the second
RPC so the first RPC blocks indefinitely.
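
The delegation problem in the description can be shown with a toy model. This is a sketch under stated assumptions: Conn, Table, RealConn, LeakyShortCircuit and FixedShortCircuit are all invented for illustration and do not mirror the real HBase Connection or HTable classes.

```java
public class DelegationLeakDemo {

    /** Hypothetical connection: issues calls and creates tables. */
    interface Conn {
        String call(String row);
        Table getTable();
    }

    /** A table issues its calls through whichever connection created it. */
    static final class Table {
        private final Conn conn;
        Table(Conn conn) { this.conn = conn; }
        String get(String row) { return conn.call(row); }
    }

    /** The "real" connection: every call is a remote RPC. */
    static final class RealConn implements Conn {
        public String call(String row) { return "remote:" + row; }
        public Table getTable() { return new Table(this); }
    }

    /** Short-circuits direct calls, but delegates table creation. */
    static final class LeakyShortCircuit implements Conn {
        private final Conn real;
        LeakyShortCircuit(Conn real) { this.real = real; }
        public String call(String row) { return "local:" + row; }
        // The table is built by `real`, so it is bound to `real`, not to this wrapper.
        public Table getTable() { return real.getTable(); }
    }

    /** Binds created tables to itself, so their calls stay local. */
    static final class FixedShortCircuit implements Conn {
        public String call(String row) { return "local:" + row; }
        public Table getTable() { return new Table(this); }
    }

    static String leakyGet(String row) {
        return new LeakyShortCircuit(new RealConn()).getTable().get(row);
    }

    static String fixedGet(String row) {
        return new FixedShortCircuit().getTable().get(row);
    }
}
```

In this model leakyGet goes remote even though the caller held a short-circuit connection, which is the behaviour reported here; fixedGet stays local because the table holds a reference to the connection that should serve it.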

This message was sent by Atlassian JIRA