Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Message-ID: <21619717.32711287777496557.JavaMail.jira@thor>
Date: Fri, 22 Oct 2010 15:58:16 -0400 (EDT)
From: "stack (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Subject: [jira] Resolved: (HBASE-2998) rolling-restart.sh shouldn't rely on
 zoo.cfg
In-Reply-To: <14321588.189621284501285804.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HBASE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2998.
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Thanks for the review, Jon.  I did as you suggested (and that test passes).  I also just tried it on a cluster with a 5-node ensemble.  Committing.


> rolling-restart.sh shouldn't rely on zoo.cfg
> --------------------------------------------
>
>                 Key: HBASE-2998
>                 URL: https://issues.apache.org/jira/browse/HBASE-2998
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: 2998.txt
>
>
> I tried the rolling-restart script on our dev environment, which is configured with zoo.cfg for ZooKeeper, and it worked pretty well. Then I tried it on our MR cluster, which doesn't have a zoo.cfg, and we suffered some downtime (no biggie though, nothing critical was running). When the script calls this line:
> {code}
> bin/hbase zkcli stat $zmaster
> {code}
> It directly runs a ZooKeeperMain, which isn't modified to read from the HBase configuration files. What happens next, if ZK isn't running on the master node, is that it receives a ConnectionRefused, ignores it, proceeds to restart the master (which waits on the znode), and then starts restarting the region servers. They can't shut down properly within 60 seconds, since they need a master, so they get killed. What follows is pretty ugly and pretty much requires a whole restart.
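
The failure described above comes from the script ignoring a failed ZooKeeper check. As a rough sketch only (this is not the committed patch in 2998.txt), a script could refuse to continue when the check fails. The function names zk_reachable and guarded_restart are hypothetical, and the sketch assumes the check command exits non-zero when ZooKeeper cannot be reached, which may not hold for every zkcli version:

```shell
#!/usr/bin/env bash
# Sketch only: guard a rolling restart on a successful ZooKeeper check.
# The check command is passed in as arguments; in the real script it might be
#   bin/hbase zkcli stat "$zmaster"
# ASSUMPTION: the check exits non-zero when ZooKeeper is unreachable.

# Hypothetical helper: run the supplied check, discard output, keep exit status.
zk_reachable() {
  "$@" >/dev/null 2>&1
}

# Hypothetical helper: stop before touching the master if the check fails.
guarded_restart() {
  if ! zk_reachable "$@"; then
    echo "ZooKeeper check failed; aborting rolling restart" >&2
    return 1
  fi
  echo "ZooKeeper reachable; continuing with master restart"
}
```

It might be called as guarded_restart bin/hbase zkcli stat "$zmaster", so that a refused connection stops the script before the master or any region server is restarted.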
