Date: Mon, 23 Apr 2012 20:42:33 +0000 (UTC)
From: "Enis Soztutar (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-5849:
---------------------------------

    Attachment: HBASE-5849_v1.patch

Attaching a simple patch. It applies to trunk and to the 0.92 and 0.94 branches. I tested it with a pseudo-distributed setup on my laptop by launching the regionserver first and observing that it now waits for the master to boot up instead of aborting. I'll try to come up with a boot-order unit test shortly.

> On first cluster startup, RS aborts if root znode is not available
> ------------------------------------------------------------------
>
>                 Key: HBASE-5849
>                 URL: https://issues.apache.org/jira/browse/HBASE-5849
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver, zookeeper
>    Affects Versions: 0.92.2, 0.96.0, 0.94.1
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: HBASE-5849_v1.patch
>
>
> When launching a fresh cluster, the master currently has to be started first. Starting the master and the regionservers at the same time can therefore cause a race condition.
>
> The master startup code does roughly the following:
> - establish a zk connection
> - create the root znodes in zk (/hbase)
> - create the ephemeral master znode /hbase/master
>
> The region server startup code does roughly the following:
> - establish a zk connection
> - check whether the root znode (/hbase) exists; if not, shut down
> - wait for the master to create the znode /hbase/master
>
> So the problem is that on the very first launch of the cluster, the RS aborts at startup because the /hbase znode may not have been created yet (only the master creates it, if needed). Since /hbase is not deleted on cluster shutdown, the order in which servers are started does not matter on subsequent cluster starts, so this affects only first launches.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
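The startup race in HBASE-5849 can be illustrated with a small, self-contained Java sketch. It does not use the real ZooKeeper client or any HBase classes. `FakeZooKeeper`, `awaitZnode`, `startMaster`, `startRegionServerAbortIfMissing`, and `startRegionServerWaiting` are hypothetical names. Sessions, watches, and ephemeral nodes are not modeled, and the 5000 ms timeout is an arbitrary assumption. The sketch contrasts the abort-if-missing check from the issue description with a wait-for-the-znode approach, similar to the behavior the patch comment reports. It is not the code in HBASE-5849_v1.patch.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the HBASE-5849 startup race. None of these names
// are HBase or ZooKeeper APIs; sessions, watches and ephemeral nodes are not modeled.
class StartupSketch {
    static final String ROOT = "/hbase";
    static final String MASTER = "/hbase/master";

    /** In-memory stand-in for the ZooKeeper namespace (hypothetical). */
    static final class FakeZooKeeper {
        private final Set<String> znodes = new HashSet<>();

        /** Creates a znode if absent and wakes any thread waiting for one. */
        synchronized void create(String path) {
            znodes.add(path);
            notifyAll();
        }

        synchronized boolean exists(String path) {
            return znodes.contains(path);
        }

        /** Blocks until {@code path} exists; returns false if the timeout elapses first. */
        synchronized boolean awaitZnode(String path, long timeout, TimeUnit unit)
                throws InterruptedException {
            long deadline = System.nanoTime() + unit.toNanos(timeout);
            while (!znodes.contains(path)) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                // Releases the monitor while waiting, so create() can run.
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        }
    }

    /** Master: create the root znode if needed, then its own znode. */
    static void startMaster(FakeZooKeeper zk) {
        zk.create(ROOT);
        zk.create(MASTER);
    }

    /** Behavior reported in the issue: returns false (RS shuts down) if /hbase is missing. */
    static boolean startRegionServerAbortIfMissing(FakeZooKeeper zk) {
        return zk.exists(ROOT);
    }

    /** Waiting variant (sketch): block for /hbase, then /hbase/master, instead of aborting. */
    static boolean startRegionServerWaiting(FakeZooKeeper zk, long timeoutMs)
            throws InterruptedException {
        return zk.awaitZnode(ROOT, timeoutMs, TimeUnit.MILLISECONDS)
                && zk.awaitZnode(MASTER, timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Fresh cluster, regionserver launched before the master.
        FakeZooKeeper zk = new FakeZooKeeper();
        System.out.println("abort-if-missing RS starts: " + startRegionServerAbortIfMissing(zk));

        Thread master = new Thread(() -> {
            try {
                Thread.sleep(200); // master boots a little later
            } catch (InterruptedException e) {
                return;
            }
            startMaster(zk);
        });
        master.start();
        System.out.println("waiting RS starts: " + startRegionServerWaiting(zk, 5000));
        master.join();
    }
}
```

Run as is, the sketch should print `false` for the abort-if-missing check and `true` for the waiting variant. The sketch bounds the wait with a timeout. This is a design assumption: it keeps a regionserver from blocking forever if no master ever comes up, and whether the actual patch does the same is not stated in the issue.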