hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8716) Fixups/Improvements for graceful_stop.sh/region_mover.rb
Date Sat, 08 Jun 2013 21:54:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678866#comment-13678866 ]

stack commented on HBASE-8716:
------------------------------

I tried these changes on a cluster and they seem to do the right thing.  Here is the output before the change; note that the script exits 0 even though the server is not online:

{code}
[stack@sss-1 ~]$ ./hbase/bin/graceful_stop.sh --config /home/stack/conf-hbase  x
2013-06-08T14:22:02 Disabling load balancer
2013-06-08T14:22:09 Previous balancer state was false
2013-06-08T14:22:09 Unloading x region(s)
2013-06-08 14:22:14,867 TRACE [main] zookeeper.ZKConfig: Skipped reading ZK properties file 'zoo.cfg' since 'hbase.config.read.zookeeper.config' was not set to true
2013-06-08 14:22:14,907 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-06-08 14:22:14,907 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=sss-1.ent.cloudera.com
2013-06-08 14:22:14,907 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.6.0_31
2013-06-08 14:22:14,907 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
2013-06-08 14:22:14,907 INFO  [main] zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_31/jre

....
2013-06-08 14:22:14,990 INFO  [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to sss-1.ent.cloudera.com/10.20.195.21:2181, initiating session
2013-06-08 14:22:15,049 INFO  [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server sss-1.ent.cloudera.com/10.20.195.21:2181, sessionid = 0x13ef746f91a0054, negotiated timeout = 90000
RuntimeError: Server x not online
    stripServer at /home/stack/hbase/bin/region_mover.rb:200
  unloadRegions at /home/stack/hbase/bin/region_mover.rb:306
         (root) at /home/stack/hbase/bin/region_mover.rb:456
2013-06-08T14:22:16 Unloaded x region(s)
2013-06-08T14:22:16 Stopping regionserver
x: ssh: Could not resolve hostname x: Name or service not known
[stack@sss-1 ~]$ echo $?
0
{code}

Here is the output after the change, passing -e; the script now stops at the error and exits 1:

{code}
[stack@sss-1 ~]$ ./hbase/bin/graceful_stop.sh --config /home/stack/conf-hbase -e x
2013-06-08T14:24:10 Disabling load balancer
2013-06-08T14:24:17 Previous balancer state was false
2013-06-08T14:24:17 Unloading x region(s)
2013-06-08 14:24:22,883 TRACE [main] zookeeper.ZKConfig: Skipped reading ZK properties file 'zoo.cfg' since 'hbase.config.read.zookeeper.config' was not set to true
2013-06-08 14:24:22,920 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-06-08 14:24:22,920 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=sss-1.ent.cloudera.com
2013-06-08 14:24:22,920 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.6.0_31
...
2013-06-08 14:24:22,949 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x24eff2c connecting to ZooKeeper ensemble=sss-1.ent.cloudera.com:2181
2013-06-08 14:24:22,964 INFO  [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server sss-1.ent.cloudera.com/10.20.195.21:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-06-08 14:24:22,974 INFO  [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to sss-1.ent.cloudera.com/10.20.195.21:2181, initiating session
2013-06-08 14:24:23,020 INFO  [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server sss-1.ent.cloudera.com/10.20.195.21:2181, sessionid = 0x13ef746f91a0057, negotiated timeout = 90000
RuntimeError: Server x not online
    stripServer at /home/stack/hbase/bin/region_mover.rb:200
  unloadRegions at /home/stack/hbase/bin/region_mover.rb:306
         (root) at /home/stack/hbase/bin/region_mover.rb:456
[stack@sss-1 ~]$ echo $?
1
{code}
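
For tools that want to drive the script, a rough sketch of how the exit code could be consumed might look like the following. It assumes -e behaves as in the run above (non-zero exit when the unload fails). The stop_hosts helper, the GRACEFUL_STOP and HBASE_CONF_DIR variables, and the host names are made up for illustration and are not part of HBase or the patch.

```shell
#!/usr/bin/env bash
# Sketch only: stop a list of regionservers one at a time and abort on the
# first failure, relying on the script's exit code instead of its output.

# Assumed locations, taken from the session above; override them as needed.
GRACEFUL_STOP="${GRACEFUL_STOP:-./hbase/bin/graceful_stop.sh}"
HBASE_CONF_DIR="${HBASE_CONF_DIR:-/home/stack/conf-hbase}"

# stop_hosts is a hypothetical helper, not something shipped with HBase.
stop_hosts() {
  local host
  for host in "$@"; do
    # -e is the flag exercised in the "after" run above.
    if ! "$GRACEFUL_STOP" --config "$HBASE_CONF_DIR" -e "$host"; then
      echo "graceful stop failed for $host; aborting" >&2
      return 1
    fi
  done
  return 0
}

# Example invocation (host names are placeholders):
#   stop_hosts host1 host2 || exit 1
```

Stopping at the first failure keeps a calling tool from marching through the rest of the cluster after one host has gone wrong; whether that is the right policy depends on the tool.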
                
> Fixups/Improvements for graceful_stop.sh/region_mover.rb
> --------------------------------------------------------
>
>                 Key: HBASE-8716
>                 URL: https://issues.apache.org/jira/browse/HBASE-8716
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: stack
>         Attachments: 8716.txt
>
>
> It has been a while since these scripts were touched.  I am giving them a spring cleaning and seeing
> if I can make them return error codes on failure (it seems the previous style was that the operator
> would watch the output and react to it, but I see cases where tools want to call these scripts
> and want the return code to indicate whether the rolling upgrade worked or not).  Also, I will see
> if I can make the rolling restart faster, since going one-by-one, while minimally disruptive and 'safe',
> is slow on clusters of hundreds of nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
