From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Date: Sun, 25 Sep 2011 19:27:26 +0000 (UTC)
Message-ID: <154578403.10898.1316978846348.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <723160302.561.1316650886239.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

    [ https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114336#comment-13114336 ]

Ted Yu commented on HBASE-4455:
-------------------------------

If there is no major revision based on version 3, allow me to integrate Tuesday.

> Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4455
>                 URL: https://issues.apache.org/jira/browse/HBASE-4455
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.92.0
>
>
> Keep Master up all the time, do rolling restart of RSs like this - stop RS1, wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. regions aren't in "regions in transition" from AssignmentManager's point of view, but they aren't assigned to any region server. Here are the issues.
> 1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is invoked to check if the dead server contains the -ROOT- region. That is due to the long delay in ZK notification and the async nature of the system. Here is an example: even though the new root region server sea-lab-1,60020,1316380133656 is set at T2, at T3, during shutdown processing for sea-lab-1,60020,1316380133656, the root location still points to the old server sea-lab-3,60020,1316380037898.
> T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x1327e43175e0000 Retrieved 29 byte(s) of data from znode /hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898
> T2: 2011-09-18 14:08:57,173 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region location in ZooKeeper as sea-lab-1,60020,1316380133656
> T3: 2011-09-18 14:10:26,393 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler to be executed, root=false, meta=true, current Root Location: sea-lab-3,60020,1316380037898
> T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x1327e43175e0000 Retrieved 29 byte(s) of data from znode /hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656
> 2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or .META. availability could be blocked. If, meanwhile, the new server to which -ROOT- or .META. is being assigned restarts, another instance of MetaServerShutdownHandler is queued. Eventually, all MetaServerShutdownHandler worker threads are occupied by blocked handlers. This looks like HBASE-4245.
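
The worker-thread exhaustion described in issue 2 can be reproduced in isolation. The following is a minimal sketch, not HBase code: it assumes a plain java.util.concurrent fixed-size pool stands in for the master's handler executor, and the names HandlerPoolStarvation, followUpRan and catalogAvailable are hypothetical. Each "handler" blocks waiting for a catalog region that never becomes available; once every worker is blocked, a newly queued handler never gets a thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a handler pool; not the HBase executor service.
final class HandlerPoolStarvation {

    /**
     * Submits blockedHandlers tasks that wait for a catalog region that never
     * comes online, then one follow-up task. Returns true if the follow-up
     * task ran within timeoutMs.
     */
    static boolean followUpRan(int workers, int blockedHandlers, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        // Never counted down: models -ROOT-/.META. staying unavailable.
        CountDownLatch catalogAvailable = new CountDownLatch(1);
        CountDownLatch followUpDone = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedHandlers; i++) {
                pool.execute(() -> {
                    try {
                        catalogAvailable.await(); // blocks this worker thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The handler queued for the next server restart.
            pool.execute(followUpDone::countDown);
            return followUpDone.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // interrupt blocked workers so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("all workers blocked, follow-up ran: " + followUpRan(2, 2, 200));
        System.out.println("one worker free, follow-up ran: " + followUpRan(2, 1, 2000));
    }
}
```

In this sketch the follow-up task only runs when at least one worker is not blocked, which is why a handler that waits indefinitely inside a bounded pool can stall all later shutdown processing.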