hbase-issues mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13317) Region server reportForDuty stuck looping if there is a master change
Date Wed, 25 Mar 2015 03:55:53 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379248#comment-14379248
] 

Jerry He commented on HBASE-13317:
----------------------------------

I thought about how to write the unit test today. It needs to use the HBase mini cluster
to really test the scenario.
But I need to intervene in the middle of the mini cluster run so that I can simulate the
change of master.

> Region server reportForDuty stuck looping if there is a master change
> ---------------------------------------------------------------------
>
>                 Key: HBASE-13317
>                 URL: https://issues.apache.org/jira/browse/HBASE-13317
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.0.0, 2.0.0, 0.98.12
>            Reporter: Jerry He
>            Assignee: Jerry He
>             Fix For: 2.0.0, 1.0.1, 0.98.13
>
>         Attachments: HBASE-13317-0.98-v2.patch, HBASE-13317-0.98.patch
>
>
> During cluster startup, region server reportForDuty gets stuck looping if there is a master change.
> {noformat}
> 2015-03-22 11:15:16,186 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty
to master=bigaperf274,60000,1427045883965 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:16,272 WARN  [regionserver60020] regionserver.HRegionServer: error telling
master we are up
> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
> 	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
> 	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
> 	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-03-22 11:15:16,274 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty
failed; sleeping and then retrying.
> 2015-03-22 11:15:19,274 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty
to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:19,275 WARN  [regionserver60020] regionserver.HRegionServer: error telling
master we are up
> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
> 	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
> 	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
> 	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-03-22 11:15:19,276 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty
failed; sleeping and then retrying.
> 2015-03-22 11:15:22,276 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty
to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:22,296 DEBUG [regionserver60020] regionserver.HRegionServer: Master
is not running yet
> 2015-03-22 11:15:22,296 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty
failed; sleeping and then retrying.
> 2015-03-22 11:15:25,296 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty
to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:25,299 DEBUG [regionserver60020] regionserver.HRegionServer: Master
is not running yet
> 2015-03-22 11:15:25,299 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty
failed; sleeping and then retrying.
> 2015-03-22 11:15:28,299 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty
to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:28,302 DEBUG [regionserver60020] regionserver.HRegionServer: Master
is not running yet
> 2015-03-22 11:15:28,302 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty
failed; sleeping and then retrying.
> {noformat}
> What happened is that the region server first got master=bigaperf274,60000,1427045883965.
> Before it was able to report successfully, the master changed to bigaperf273,60000,1427048108439.
> We were supposed to open a new connection to the new master, but we never did, so the region
> server kept looping and trying the old address forever.
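
To make the failure mode above concrete, here is a minimal, self-contained sketch of a
retry loop that re-checks the active master on every attempt and rebuilds its connection
when the master has changed. This is not the HBASE-13317 patch and it does not use any
HBase API: the class ReportForDutySketch, the MasterStub type, and the two injected
functions (the active-master lookup and the startup RPC) are all hypothetical stand-ins
chosen for illustration. The real code would also sleep between attempts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class ReportForDutySketch {

    /** Hypothetical stand-in for a connection bound to one specific master. */
    static final class MasterStub {
        final String master;
        MasterStub(String master) { this.master = master; }
    }

    // Assumption: returns the currently active master (e.g. as tracked in ZooKeeper).
    private final Supplier<String> activeMasterLookup;
    // Assumption: simulates the startup RPC; true means the master accepted the report.
    private final Predicate<String> startupRpc;
    private MasterStub stub;
    // Records which master each attempt was actually sent to.
    final List<String> attempted = new ArrayList<>();

    ReportForDutySketch(Supplier<String> activeMasterLookup, Predicate<String> startupRpc) {
        this.activeMasterLookup = activeMasterLookup;
        this.startupRpc = startupRpc;
    }

    /** Retries up to maxAttempts times; returns true once a master accepts the report. */
    boolean reportForDuty(int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String current = activeMasterLookup.get();
            // Key step: drop the stale stub if it points at a master that is no longer active.
            if (stub == null || !stub.master.equals(current)) {
                stub = new MasterStub(current);
            }
            attempted.add(stub.master);
            if (startupRpc.test(stub.master)) {
                return true;
            }
            // A real implementation would sleep here before retrying.
        }
        return false;
    }

    public static void main(String[] args) {
        String oldMaster = "bigaperf274,60000,1427045883965";
        String newMaster = "bigaperf273,60000,1427048108439";
        // The first lookup still sees the old master; later lookups see the new one.
        Iterator<String> lookups = Arrays.asList(oldMaster, newMaster, newMaster).iterator();
        // Only the new master accepts the report in this simulation.
        ReportForDutySketch rs = new ReportForDutySketch(lookups::next, newMaster::equals);
        System.out.println("reported=" + rs.reportForDuty(3) + " attempts=" + rs.attempted);
    }
}
```

If the stub were created once and reused without the comparison against the current
master, every attempt in this simulation would go to the old address, which matches the
looping seen in the log.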



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
