hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2998) rolling-restart.sh shouldn't rely on zoo.cfg
Date Thu, 21 Oct 2010 00:55:28 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923269#action_12923269
] 

HBase Review Board commented on HBASE-2998:
-------------------------------------------

Message from: "Jonathan Gray" <jgray@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1057/#review1594
-----------------------------------------------------------


Looking good!


trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ShutdownHook.java
<http://review.cloudera.org/r/1057/#comment5394>

    


- Jonathan





> rolling-restart.sh shouldn't rely on zoo.cfg
> --------------------------------------------
>
>                 Key: HBASE-2998
>                 URL: https://issues.apache.org/jira/browse/HBASE-2998
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: 2998.txt
>
>
> I tried the rolling-restart script on our dev environment, which is configured with zoo.cfg for ZooKeeper, and it worked pretty well. Then I tried it on our MR cluster, which doesn't have a zoo.cfg, and we suffered some downtime (no biggie tho, nothing critical was running). When the script calls this line:
> {code}
> bin/hbase zkcli stat $zmaster
> {code}
> It directly runs a ZooKeeperMain, which isn't modified to read from the HBase configuration files. If ZK isn't running on the master node, the script receives a ConnectionRefused, ignores it, proceeds to restart the master (which waits on the znode), and then starts restarting the region servers. They can't shut down properly within 60 seconds, since they need a master, so they get killed. What follows is pretty ugly and pretty much requires a whole restart.
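
The failure described above comes from a probe whose connection error is ignored. As a rough illustration only (not the fix attached to this issue), a restart script could wrap the probe so that a failed connection stops the run before any master or region server is touched. The helper name check_zk_reachable is hypothetical, and matching on the text "Connection refused" is an assumption about what the probe prints.

```shell
#!/usr/bin/env bash
# Sketch only: check_zk_reachable is a hypothetical helper, not part of HBase.
# It runs the probe command given as its arguments and fails loudly instead of
# ignoring a connection error.
check_zk_reachable() {
  local output
  # Capture stdout and stderr; a non-zero exit status counts as failure.
  if ! output=$("$@" 2>&1); then
    echo "ZooKeeper probe failed: $output" >&2
    return 1
  fi
  # Assumption: a refused connection shows up as "Connection refused" in the
  # probe's output even when the exit status is zero.
  case "$output" in
    *"Connection refused"*)
      echo "ZooKeeper probe could not connect: $output" >&2
      return 1
      ;;
  esac
  # Pass the probe output through for the caller to inspect.
  printf '%s\n' "$output"
}

# Intended use in a restart script (command taken from the report above):
#   check_zk_reachable bin/hbase zkcli stat "$zmaster" || exit 1
```

With a guard like this, a cluster without zoo.cfg would abort at the probe rather than go on to restart the master and region servers.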

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

