hbase-issues mailing list archives

From "Samir Ahmic (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-13792) Regionserver unable to report to master when master is restarted
Date Thu, 28 May 2015 12:24:22 GMT
Samir Ahmic created HBASE-13792:

             Summary: Regionserver unable to report to master when master is restarted
                 Key: HBASE-13792
                 URL: https://issues.apache.org/jira/browse/HBASE-13792
             Project: HBase
          Issue Type: Bug
          Components: IPC/RPC
    Affects Versions: 2.0.0
         Environment: x86_64 GNU/Linux
            Reporter: Samir Ahmic
            Priority: Critical
             Fix For: 2.0.0

I was testing the master branch on a distributed cluster and noticed that when the master is
restarted on a running cluster, the regionservers are unable to report back once the master is up again.
Things returned to normal after I restarted the regionservers. The logs show that the regionservers
correctly detect the master znode.
After some digging, I noticed that the client implementation in RpcClientFactory was changed
to AsyncRpcClient, so I tried running the cluster with the previous RpcClientImpl and the issue
was gone.
So the issue is probably caused by AsyncRpcClient, which is unable to reconnect to the master once
the original connection is gone.
I was able to fix the issue by creating a new rpcClient object inside HRegionServer#createRegionServerStatusStub()
and using it for channel creation. Here is the diff:
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
index fa56966..27e658c 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
@@ -2219,8 +2219,11 @@ public class HRegionServer extends HasThread implements
         try {
+          LOG.info("***Creating new client connection");
+          rpcClient = RpcClientFactory.createClient(conf, clusterId, new InetSocketAddress(
+            rpcServices.isa.getAddress(), 0));
           BlockingRpcChannel channel =
-            this.rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
+          rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
           intf = RegionServerStatusService.newBlockingStub(channel);
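
To make the idea behind the diff easier to discuss, here is a minimal, HBase-independent sketch of the suspected failure mode. Everything in it is hypothetical: FakeClient, FakeMaster, and both report methods are stand-ins invented for illustration, not HBase classes. It only models the assumption stated above (a client created before the master restart cannot reconnect) and shows why building a fresh client at stub-creation time would work around it.

```java
import java.util.function.Supplier;

public class ReconnectSketch {

    /** Hypothetical stand-in for an RPC client; not an HBase class. */
    static final class FakeClient {
        // Which master instance this client originally connected to.
        private final int masterGeneration;

        FakeClient(int masterGeneration) {
            this.masterGeneration = masterGeneration;
        }

        /** Succeeds only if the master has not restarted since this client was created. */
        boolean report(int currentMasterGeneration) {
            return masterGeneration == currentMasterGeneration;
        }
    }

    /** Hypothetical stand-in for the master; restart() bumps the generation. */
    static final class FakeMaster {
        private int generation = 1;

        void restart() {
            generation++;
        }

        int generation() {
            return generation;
        }
    }

    /** Reuses one long-lived client across master restarts. */
    static boolean reportWithCachedClient(FakeClient cached, FakeMaster master) {
        return cached.report(master.generation());
    }

    /** Builds a fresh client for every stub creation, mirroring the idea of the diff. */
    static boolean reportWithFreshClient(Supplier<FakeClient> factory, FakeMaster master) {
        FakeClient fresh = factory.get();
        return fresh.report(master.generation());
    }

    public static void main(String[] args) {
        FakeMaster master = new FakeMaster();
        FakeClient cached = new FakeClient(master.generation());

        master.restart();

        System.out.println("cached client reports: "
            + reportWithCachedClient(cached, master));
        System.out.println("fresh client reports: "
            + reportWithFreshClient(() -> new FakeClient(master.generation()), master));
    }
}
```

In this model the cached client fails after the restart while the freshly created one succeeds. The sketch does not say whether recreating the client is the right fix or whether the reconnect logic in AsyncRpcClient itself should be repaired instead.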

Is this an acceptable way to fix the issue? If so, I will create and attach a patch.
