hbase-issues mailing list archives

From "Samir Ahmic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13792) Regionserver unable to report to master when master is restarted
Date Mon, 15 Jun 2015 13:44:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585936#comment-14585936 ]

Samir Ahmic commented on HBASE-13792:
-------------------------------------

The root cause of this issue is the same as in HBASE-13337.

> Regionserver unable to report to master when master is restarted
> ----------------------------------------------------------------
>
>                 Key: HBASE-13792
>                 URL: https://issues.apache.org/jira/browse/HBASE-13792
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 2.0.0
>         Environment: x86_64 GNU/Linux
>            Reporter: Samir Ahmic
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> I was testing the master branch on a distributed cluster and noticed that when the master is restarted on a running cluster, the regionservers are unable to report back once the master is up again.
> Things went back to normal after I restarted the regionservers. The logs show that the regionservers are correctly detecting the master znode.
> After some digging I noticed that we have changed the client implementation in RpcClientFactory to AsyncRpcClient, so I tried running the cluster with the previous RpcClientImpl and the issue was gone.
> So the issue is probably caused by AsyncRpcClient, which is unable to reconnect to the master once the original connection is gone.
> I was able to fix the issue by creating a new rpcClient object inside HRegionServer#createRegionServerStatusStub() and using it for channel creation. Here is the diff:
> {code}
> diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> index fa56966..27e658c 100644
> --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> @@ -2219,8 +2219,11 @@ public class HRegionServer extends HasThread implements
>            break;
>          }
>          try {
> +          LOG.info("***Creating new client connection");
> +          rpcClient = RpcClientFactory.createClient(conf, clusterId, new InetSocketAddress(
> +            rpcServices.isa.getAddress(), 0));
>            BlockingRpcChannel channel =
> -            this.rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
> +          rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
>                shortOperationTimeout);
>            intf = RegionServerStatusService.newBlockingStub(channel);
>            break;
> {code}
> If this is an acceptable way to fix the issue, I will create and attach a patch.
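
For illustration only, the pattern behind the diff above (build a fresh client for every connection attempt instead of reusing one that may be bound to a dead connection) can be sketched in plain Java. This is not HBase code: ReconnectSketch, Client and callWithFreshClient are hypothetical names, and the retry limit is an assumption made for the example.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ReconnectSketch {

    /** Hypothetical stand-in for an RPC client; not an HBase type. */
    public interface Client {
        String call(String request) throws Exception;
    }

    /**
     * Tries the call up to maxAttempts times and asks the factory for a
     * new client on every attempt, so a client whose connection has gone
     * away is never reused.
     */
    public static String callWithFreshClient(Supplier<Client> factory,
                                             String request,
                                             int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Client client = factory.get(); // fresh client for this attempt
            try {
                return client.call(request);
            } catch (Exception e) {
                last = e; // remember the failure and try again with a new client
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger created = new AtomicInteger();
        // Simulated factory: the first two clients fail as if their connection were stale.
        Supplier<Client> factory = () -> {
            int n = created.incrementAndGet();
            return request -> {
                if (n < 3) {
                    throw new IOException("stale connection");
                }
                return "ok:" + request;
            };
        };
        System.out.println(callWithFreshClient(factory, "report", 5));
        System.out.println("clients created: " + created.get());
    }
}
```

Creating a client per attempt is the simplest way to avoid a stale connection, at the cost of discarding whatever state the old client cached. Whether that trade-off is acceptable inside HRegionServer is the question the issue raises.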



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
