Date: Fri, 27 Mar 2015 22:38:53 +0000 (UTC)
From: "Enis Soztutar (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-13317) Region server reportForDuty stuck looping if there is a master change

[ https://issues.apache.org/jira/browse/HBASE-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-13317:
----------------------------------
    Fix Version/s: 1.1.0

> Region server reportForDuty stuck looping if there is a master change
> ---------------------------------------------------------------------
>
>                 Key: HBASE-13317
>                 URL: https://issues.apache.org/jira/browse/HBASE-13317
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.0.0, 2.0.0, 0.98.12
>            Reporter: Jerry He
>            Assignee: Jerry He
>             Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13
>
>         Attachments: HBASE-13317-0.98-v2.patch, HBASE-13317-0.98-v3.patch, HBASE-13317-0.98.patch
>
>
> During cluster startup, region server reportForDuty gets stuck looping if there is a master change.
> {noformat}
> 2015-03-22 11:15:16,186 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf274,60000,1427045883965 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:16,272 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
>         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
>         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
>         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-03-22 11:15:16,274 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:19,274 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:19,275 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
>         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
>         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
>         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-03-22 11:15:19,276 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:22,276 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:22,296 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
> 2015-03-22 11:15:22,296 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:25,296 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:25,299 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
> 2015-03-22 11:15:25,299 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:28,299 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:28,302 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
> 2015-03-22 11:15:28,302 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> {noformat}
> What happened is that the region server first got master=bigaperf274,60000,1427045883965. Before it was able to report successfully, the master changed to bigaperf273,60000,1427048108439.
> We were supposed to open a new connection to the new master, but we never did; instead we kept looping and retrying against the old address forever.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
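The failure mode described in the report can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not HBase code: `MasterStub`, `Cluster`, `createStubIfNeeded` and `simulate` are invented names. It assumes, consistent with the log (later attempts name bigaperf273 yet still fail), that the region server caches its master RPC stub across retries and only refreshes the master address it logs. The sketched remedy, discarding the cached stub after a failed report, is one plausible approach and not necessarily what the attached patches do.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical model (not HBase code) of a region server retrying reportForDuty
 * while the active master fails over. All names here are illustrative only.
 */
class ReportForDutySketch {
    static final String OLD_MASTER = "bigaperf274,60000,1427045883965";
    static final String NEW_MASTER = "bigaperf273,60000,1427048108439";

    /** Stand-in for an RPC stub: it stays bound to the address it was created for. */
    static final class MasterStub {
        final String boundTo;
        MasterStub(String boundTo) { this.boundTo = boundTo; }
    }

    /** Hypothetical cluster view: the published active master and whether it accepts startups. */
    static final class Cluster {
        final AtomicReference<String> activeMaster = new AtomicReference<>(OLD_MASTER);
        volatile boolean accepting = false; // the old master is refusing connections

        void failOver() {
            activeMaster.set(NEW_MASTER);
            accepting = true;
        }

        // Models the startup RPC: only the current, running master accepts the call.
        boolean startup(MasterStub stub) {
            return accepting && stub.boundTo.equals(activeMaster.get());
        }
    }

    private final Cluster cluster;
    private final boolean dropStubOnFailure; // false: behaviour in the report; true: sketched fix
    private MasterStub stub;                  // cached across retries

    ReportForDutySketch(Cluster cluster, boolean dropStubOnFailure) {
        this.cluster = cluster;
        this.dropStubOnFailure = dropStubOnFailure;
    }

    // Reuses a cached stub if one exists and only returns the latest address, so the
    // caller logs the new master while the stub still points at the old one.
    private String createStubIfNeeded() {
        String master = cluster.activeMaster.get();
        if (stub == null) {
            stub = new MasterStub(master);
        }
        return master;
    }

    /** One attempt; returns true if the active master accepted this region server. */
    boolean reportForDuty() {
        String master = createStubIfNeeded();
        boolean ok = cluster.startup(stub);
        System.out.printf("reportForDuty to master=%s (stub bound to %s): %s%n",
                master, stub.boundTo, ok ? "accepted" : "failed");
        if (!ok && dropStubOnFailure) {
            stub = null; // forget the stale connection so the next attempt reconnects
        }
        return ok;
    }

    /** Runs the scenario; returns the attempt number that succeeded, or -1 if none did. */
    static int simulate(boolean dropStubOnFailure, int maxAttempts) {
        Cluster cluster = new Cluster();
        ReportForDutySketch rs = new ReportForDutySketch(cluster, dropStubOnFailure);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (rs.reportForDuty()) {
                return attempt;
            }
            if (attempt == 1) {
                cluster.failOver(); // the master changes after the first failed report
            }
            // The real loop sleeps here between attempts ("sleeping and then retrying").
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("stub kept:    " + simulate(false, 5)); // -1: never accepted
        System.out.println("stub dropped: " + simulate(true, 5));  // 2: accepted on the second attempt
    }
}
```

With the stub kept, every retry after the failover logs the new master while still using a stub bound to the old one, which resembles the repeating "reportForDuty to master=bigaperf273..." lines above. Dropping the stub after a failure lets the next attempt build a fresh connection from the current master address.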