Date: Sat, 17 Sep 2011 20:37:08 +0000 (UTC)
From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Message-ID: <764079664.39023.1316291828910.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1702306109.25407.1315978568948.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-4400) .META. getting stuck if RS hosting it is dead and znode state is in RS_ZK_REGION_OPENED

    [ https://issues.apache.org/jira/browse/HBASE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107246#comment-13107246 ]

Ted Yu commented on HBASE-4400:
-------------------------------

Integrated to TRUNK and branch.
Thanks for the patches, Ramkrishna.
Thanks for the review, Michael.

> .META. getting stuck if RS hosting it is dead and znode state is in RS_ZK_REGION_OPENED
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-4400
>                 URL: https://issues.apache.org/jira/browse/HBASE-4400
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.0, 0.90.5
>
>         Attachments: HBASE-4400_0.90.patch, HBASE-4400_0.90_1.patch, HBASE-4400_trunk.patch, HBASE-4400_trunk_1.patch
>
>
> Start 2 RS.
> .META. is being hosted by RS2, but RS2 goes down during processing.
> Now restart the master and RS1. The master gets the RS name from the znode, which is in RS_ZK_REGION_OPENED. But because RS2 is still not online, the master is not able to process .META. at all.
> Please find the logs
> {noformat}
> 2011-09-14 16:43:51,949 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1315998828523, region=70236052/-ROOT-
> 2011-09-14 16:43:51,968 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- assigned=1, rit=false, location=linux76:60020
> 2011-09-14 16:43:51,970 INFO org.apache.hadoop.hbase.master.AssignmentManager: Processing region .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
> 2011-09-14 16:43:51,970 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed to find linux146,60020,1315998414623 in list of online servers; skipping registration of open of .META.,,1.1028785192
> 2011-09-14 16:43:51,971 INFO org.apache.hadoop.hbase.master.AssignmentManager: Waiting on 1028785192/.META.
> 2011-09-14 16:43:51,983 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=linux76,60020,1315998828523, region=70236052/-ROOT-
> 2011-09-14 16:43:51,986 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 70236052; deleting unassigned node
> 2011-09-14 16:43:51,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x13267854032001d Deleting existing unassigned node for 70236052 that is in expected state RS_ZK_REGION_OPENED
> 2011-09-14 16:43:51,998 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x13267854032001d Successfully deleted unassigned node for region 70236052 in expected state RS_ZK_REGION_OPENED
> 2011-09-14 16:43:51,999 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on linux76,60020,1315998828523
> 2011-09-14 16:44:00,945 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=linux146,60020,1315998839724, regionCount=0, userLoad=false
> 2011-09-14 16:46:20,003 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPEN, ts=0
> 2011-09-14 16:46:20,004 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> {noformat}
> {code}
>       regionsInTransition.put(encodedRegionName, new RegionState(
>           regionInfo, RegionState.State.OPEN, data.getStamp()));
> ................
>       } else {
>         HServerInfo hsi = this.serverManager.getServerInfo(sn);
>         if (hsi == null) {
>           LOG.info("Failed to find " + sn +
>             " in list of online servers; skipping registration of open of " +
>             regionInfo.getRegionNameAsString());
>         } else {
>           new OpenedRegionHandler(master, this, regionInfo, hsi).process();
>         }
>       }
> {code}
> So the timeout monitor is not able to do anything here:
> {code}
>       LOG.error("Region has been OPEN for too long, " +
>           "we don't know where region was opened so can't do anything");
>       synchronized(regionState) {
>         regionState.update(regionState.getState());
>       }
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
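
The failure described in the quoted report can be illustrated with a small standalone sketch. This is not the committed HBASE-4400 patch and it does not use HBase classes; the class `OpenedRegionRecoverySketch`, the enum `Action`, and the method `onOpenedZnode` are hypothetical names introduced only for illustration. It assumes, as the logs suggest, that a server name includes a start code, so the restarted linux146 (start code 1315998839724) never matches the name recorded in the znode (start code 1315998414623). Under that assumption, skipping registration leaves the region waiting forever, whereas treating an unknown server as dead and reassigning would let the master make progress.

```java
import java.util.HashSet;
import java.util.Set;

public class OpenedRegionRecoverySketch {

    /** Possible outcomes when the master finds a znode in the OPENED state. */
    enum Action { REGISTER_OPEN, REASSIGN }

    /**
     * Hypothetical decision helper (not HBase API).
     * If the server recorded in the znode is online, the open can be registered.
     * Otherwise the recorded server is treated as dead and the region is
     * reassigned, instead of being skipped and left in transition.
     */
    static Action onOpenedZnode(String serverNameInZnode, Set<String> onlineServers) {
        if (onlineServers.contains(serverNameInZnode)) {
            return Action.REGISTER_OPEN;
        }
        return Action.REASSIGN;
    }

    public static void main(String[] args) {
        // Server names taken from the log excerpt: host,port,startcode.
        Set<String> online = new HashSet<>();
        online.add("linux76,60020,1315998828523");
        online.add("linux146,60020,1315998839724"); // restarted instance, new start code

        // The znode still names the old linux146 instance, which is never online again.
        System.out.println(onOpenedZnode("linux146,60020,1315998414623", online));
        // A znode naming a live server can simply be registered as open.
        System.out.println(onOpenedZnode("linux76,60020,1315998828523", online));
    }
}
```

Because the comparison is on the full server name, waiting for the old instance to come back cannot succeed in this model; only a reassignment resolves the stuck region.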