hbase-issues mailing list archives

From "Jean-Marc Spaggiari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8974) bin/rolling-restart.sh restarts all active RS's with each iteration instead of one at a time
Date Sun, 28 Jul 2013 14:23:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721953#comment-13721953
] 

Jean-Marc Spaggiari commented on HBASE-8974:
--------------------------------------------

I tailed the RS logs over a restart and there is only one restart displayed:
{code}
Sunday 28 July 2013, 09:17:02 (UTC-0400) Terminating regionserver
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener
on 60020
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on
60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 2 on
60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 0 on
60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020:
exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 9 on
60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020:
exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on
60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020:
exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020:
exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2
on 60020: exiting
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 3 on
60020: exiting
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0
on 60020: exiting
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1
on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020:
exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020:
exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 1 on
60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on
60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020:
exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on
60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020:
exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020:
exiting
2013-07-28 09:17:02,209 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60030
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on
60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020:
exiting
2013-07-28 09:17:02,312 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Closed zookeeper sessionid=0x3400251e47305dc
Sunday 28 July 2013, 09:17:03 (UTC-0400) Starting regionserver on node3
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 93921
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 93921
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
2013-07-28 09:17:03,676 INFO org.apache.hadoop.hbase.util.VersionInfo: HBase 0.94.10
2013-07-28 09:17:03,676 INFO org.apache.hadoop.hbase.util.VersionInfo: Subversion https://svn.apache.org/repos/asf/hbase/tags/0.94.10RC0
-r 1504995
2013-07-28 09:17:03,676 INFO org.apache.hadoop.hbase.util.VersionInfo: Compiled by jenkins
on Fri Jul 19 20:24:16 UTC 2013
2013-07-28 09:17:03,778 INFO org.apache.hadoop.hbase.util.ServerCommandLine: vmName=Java HotSpot(TM)
64-Bit Server VM, vmVendor=Oracle Corporation, vmVersion=23.1-b03
2013-07-28 09:17:03,778 INFO org.apache.hadoop.hbase.util.ServerCommandLine: vmInputArguments=[-XX:OnOutOfMemoryError=kill
-9 %p, -Xmx6196m, -XX:+UseConcMarkSweepGC, -XX:+UseConcMarkSweepGC, -Dhbase.log.dir=/home/hbase/hbase-0.94.3/bin/../logs,
-Dhbase.log.file=hbase-hbase-regionserver-node3.log, -Dhbase.home.dir=/home/hbase/hbase-0.94.3/bin/..,
-Dhbase.id.str=hbase, -Dhbase.root.logger=INFO,DRFA, -Djava.library.path=/home/hbase/hbase-0.94.3/bin/../lib/native/Linux-amd64-64,
-Dhbase.security.logger=INFO,DRFAS]
2013-07-28 09:17:03,998 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:03,998 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:03,999 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:03,999 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:04,000 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:04,000 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:04,001 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:04,002 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:04,002 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:04,002 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-0
2013-07-28 09:17:04,009 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics: Initializing RPC
Metrics with hostName=HRegionServer, port=60020
2013-07-28 09:17:04,106 INFO org.apache.hadoop.hbase.io.hfile.CacheConfig: Allocating LruBlockCache
with maximum size 2,4g
2013-07-28 09:17:04,316 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support
getDefaultBlockSize
2013-07-28 09:17:04,329 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support
getDefaultReplication
2013-07-28 09:17:04,339 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support
getDefaultReplication
2013-07-28 09:17:04,339 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support
getDefaultBlockSize
2013-07-28 09:17:04,393 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics
with processName=RegionServer, sessionId=regionserver60020
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: revision
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: hdfsUser
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: hdfsDate
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: hdfsUrl
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: date
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: hdfsRevision
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: user
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: hdfsVersion
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: url
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString added: version
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2013-07-28 09:17:04,414 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2013-07-28 09:17:04,444 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log)
via org.mortbay.log.Slf4jLog
2013-07-28 09:17:04,476 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-07-28 09:17:04,480 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort()
before open() is -1. Opening the listener on 60030
2013-07-28 09:17:04,480 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned
60030 webServer.getConnectors()[0].getLocalPort() returned 60030
2013-07-28 09:17:04,480 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 60030
2013-07-28 09:17:04,480 INFO org.mortbay.log: jetty-6.1.26
2013-07-28 09:17:04,750 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:60030
2013-07-28 09:17:04,751 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server Responder: starting
2013-07-28 09:17:04,754 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020:
starting
2013-07-28 09:17:04,767 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020:
starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020:
starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020:
starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020:
starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020:
starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020:
starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020:
starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020:
starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020:
starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020:
starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 0 on
60020: starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 1 on
60020: starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 2 on
60020: starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 3 on
60020: starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on
60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on
60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on
60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on
60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on
60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 9 on
60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0
on 60020: starting
2013-07-28 09:17:04,775 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1
on 60020: starting
2013-07-28 09:17:04,775 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2
on 60020: starting
2013-07-28 09:17:07,197 ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics:
Inconsistent configuration. Previous configuration for using table name in metrics: true,
new configuration: false
2013-07-28 09:17:07,202 INFO org.apache.hadoop.hbase.util.ChecksumType: Checksum can use java.util.zip.CRC32
2013-07-28 09:17:28,700 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native
library is available
2013-07-28 09:17:28,701 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop
library
2013-07-28 09:17:28,701 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native
library loaded
2013-07-28 09:17:28,702 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2013-07-28 09:17:28,715 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2013-07-28 09:17:31,776 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
{code}

That's all I got over the entire rolling restart. So from the RS side, it seems that the
region server was not restarted more than once.

[~ndimiduk] can you take a look at your RS logs too to see if it matches what you are seeing?
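
For comparing logs, here is a minimal sketch that counts the stop/start markers in a region server log. It assumes the marker text matches the "Terminating regionserver" and "Starting regionserver on ..." lines in the excerpt above; other versions or wrapper scripts may print something different, and the function name is only illustrative.

```python
import re

# Marker text taken from the log excerpt above (assumption: your logs use the same wording).
STOP_MARKER = re.compile(r"Terminating regionserver")
START_MARKER = re.compile(r"Starting regionserver on (\S+)")


def count_restarts(log_lines):
    """Return (stops, starts) found in an iterable of log lines."""
    stops = starts = 0
    for line in log_lines:
        if STOP_MARKER.search(line):
            stops += 1
        elif START_MARKER.search(line):
            starts += 1
    return stops, starts


# Abridged sample: the two marker lines from the excerpt, date prefix omitted.
sample = [
    "Terminating regionserver",
    "2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020",
    "Starting regionserver on node3",
]
print(count_restarts(sample))
```

On a real log you would pass the open file object instead of the sample list. More than one start per rolling restart would point to the behaviour described in the report.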


                
> bin/rolling-restart.sh restarts all active RS's with each iteration instead of one at
a time
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8974
>                 URL: https://issues.apache.org/jira/browse/HBASE-8974
>             Project: HBase
>          Issue Type: Bug
>          Components: scripts
>            Reporter: Nick Dimiduk
>
> I'm exercising the patch over on HBASE-8803 and I've noticed something in the logs: it looks like {{rolling-restart.sh}} is restarting all the region servers multiple times instead of just the current entry in the loop iteration.
> The logic looks like this:
> {noformat}
> for each rs in active region server list:
>   unload $rs // move all regions to other RS's
>   restart all Region Servers // !?! bug?
>   reload $rs // pile 'em back on
> {noformat}
> Shouldn't that step 2 be only {{restart $rs}}?
> This is what I see in the logs. My cluster has 9 active RegionServers. Notice the bit in the middle where all 9 are stopped and started again after unloading the target RS.
> {noformat}
> $ time /usr/lib/hbase/bin/rolling-restart.sh --rs-only --graceful --maxthreads 30
> Gracefully restarting: hor18n39.gq1.ygridcore.net
> Disabling balancer!
> ...
> Unloading hor18n39.gq1.ygridcore.net region(s)
> ...
> Valid region move targets: 
> hor18n37.gq1.ygridcore.net,60020,1374094975268
> hor17n37.gq1.ygridcore.net,60020,1374094975264
> hor18n35.gq1.ygridcore.net,60020,1374094975327
> hor17n39.gq1.ygridcore.net,60020,1374094975281
> hor18n36.gq1.ygridcore.net,60020,1374094975254
> hor17n36.gq1.ygridcore.net,60020,1374094975277
> hor17n34.gq1.ygridcore.net,60020,1374094975291
> hor18n38.gq1.ygridcore.net,60020,1374094975259
> 13/07/17 21:44:38 INFO region_mover: Moving 330 region(s) from hor18n39.gq1.ygridcore.net,60020,1374094975326
during this cycle
> 13/07/17 21:44:38 INFO region_mover: Moving region b59050cf97aabcef838e3c50e93e6d13 (1
of 330) to server=hor18n37.gq1.ygridcore.net,60020,1374094975268
> ...
> 13/07/17 21:54:20 INFO region_mover: Moving region d00026d7cc396bb3e6ea91106cc6ab55 (329
of 330) to server=hor18n37.gq1.ygridcore.net,60020,1374094975268
> 13/07/17 21:54:20 INFO region_mover: Moving region a722179b33e6ece8c9cee3fba3056acd (330
of 330) to server=hor17n37.gq1.ygridcore.net,60020,1374094975264
> 13/07/17 21:54:21 INFO region_mover: Wrote list of moved regions to /tmp/hor18n39.gq1.ygridcore.net
> Unloaded hor18n39.gq1.ygridcore.net region(s)
> hor18n35.gq1.ygridcore.net: stopping regionserver.
> hor17n39.gq1.ygridcore.net: stopping regionserver.
> hor18n36.gq1.ygridcore.net: stopping regionserver.
> hor17n37.gq1.ygridcore.net: stopping regionserver.
> hor17n34.gq1.ygridcore.net: stopping regionserver.
> hor18n38.gq1.ygridcore.net: stopping regionserver.
> hor18n37.gq1.ygridcore.net: stopping regionserver.
> hor17n36.gq1.ygridcore.net: stopping regionserver.
> hor18n39.gq1.ygridcore.net: stopping regionserver.
> hor18n36.gq1.ygridcore.net: starting regionserver, logging to /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n36.gq1.ygridcore.net.out
> hor17n36.gq1.ygridcore.net: starting regionserver, logging to /grid/0/var/log/hbase/hbase-hbase-regionserver-hor17n36.gq1.ygridcore.net.out
> hor17n37.gq1.ygridcore.net: starting regionserver, logging to /grid/0/var/log/hbase/hbase-hbase-regionserver-hor17n37.gq1.ygridcore.net.out
> hor18n37.gq1.ygridcore.net: starting regionserver, logging to /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n37.gq1.ygridcore.net.out
> hor18n38.gq1.ygridcore.net: starting regionserver, logging to /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n38.gq1.ygridcore.net.out
> hor17n34.gq1.ygridcore.net: starting regionserver, logging to /grid/0/var/log/hbase/hbase-hbase-regionserver-hor17n34.gq1.ygridcore.net.out
> hor18n35.gq1.ygridcore.net: starting regionserver, logging to /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n35.gq1.ygridcore.net.out
> hor18n39.gq1.ygridcore.net: starting regionserver, logging to /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n39.gq1.ygridcore.net.out
> hor17n39.gq1.ygridcore.net: starting regionserver, logging to /grid/0/var/log/hbase/hbase-hbase-regionserver-hor17n39.gq1.ygridcore.net.out
> Reloading hor18n39.gq1.ygridcore.net region(s)
> ...
> 13/07/17 21:54:27 INFO region_mover: Moving 330 regions to hor18n39.gq1.ygridcore.net,60020,1374098064602
> 13/07/17 21:56:47 INFO region_mover: Moving region 7d0a02f452c334a12026b45346a87d36 (1
of 330) to server=hor18n39.gq1.ygridcore.net,60020,1374098064602 in thread 0
> 13/07/17 21:56:54 INFO region_mover: Moving region af5448c90e78a8f0d935efb0b380502e (2
of 330) to server=hor18n39.gq1.ygridcore.net,60020,1374098064602 in thread 1
> ...
> {noformat}
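
The loop quoted in the description can be sketched as a small simulation to show what the two readings of step 2 would mean for restart counts. This is not the script's actual code: the function, the `restart_all` flag and the server names are hypothetical, and unload/reload are not modelled.

```python
from collections import Counter


def rolling_restart(servers, restart_all):
    """Simulate the loop and return how many times each server is restarted.

    restart_all=True mirrors the behaviour described in the report (step 2
    restarts every region server); restart_all=False restarts only the
    current one.
    """
    restarts = Counter()
    for rs in servers:
        # unload rs: regions move to the other servers (not modelled here)
        targets = servers if restart_all else [rs]
        for target in targets:
            restarts[target] += 1
        # reload rs: regions move back (not modelled here)
    return restarts


# 9 servers, as in the report; the names are made up.
servers = [f"rs{i}" for i in range(1, 10)]
reported = rolling_restart(servers, restart_all=True)
expected = rolling_restart(servers, restart_all=False)
print(max(reported.values()), max(expected.values()))
```

With 9 servers, the reported behaviour would restart each server 9 times over a full run, against once each if step 2 only touches the current server. That per-server count is what the region server logs should confirm or rule out.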

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
