hbase-issues mailing list archives

From "Heng Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15406) Split / merge switch left disabled after early termination of hbck
Date Thu, 14 Apr 2016 09:02:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240831#comment-15240831 ]

Heng Chen commented on HBASE-15406:
-----------------------------------

I tested it on a cluster with 3 region servers (RS); the Hadoop version is 2.5.0.

1. Run {{bin/hbase hbck -abort -disableSplitAndMerge}}
{code}
HBaseFsck command line options: -abort -disableSplitAndMerge
2016-04-14 16:48:16,307 INFO  [main] util.HBaseFsck: Launching hbck
2016-04-14 16:48:16,315 INFO  [main-SendThread(10.11.51.79:2181)] zookeeper.ClientCnxn: Opening
socket connection to server 10.11.51.79/10.11.51.79:2181. Will not attempt to authenticate
using SASL (unknown error)
2016-04-14 16:48:16,360 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x5b38c1ec
connecting to ZooKeeper ensemble=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
2016-04-14 16:48:16,360 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
sessionTimeout=90000 watcher=hconnection-0x5b38c1ec0x0, quorum=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181,
baseZNode=/hbase-test-cluster-15406
2016-04-14 16:48:16,361 INFO  [main-SendThread(10.11.51.79:2181)] zookeeper.ClientCnxn: Socket
connection established to 10.11.51.79/10.11.51.79:2181, initiating session
2016-04-14 16:48:16,362 INFO  [main-SendThread(10.11.51.79:2181)] zookeeper.ClientCnxn: Opening
socket connection to server 10.11.51.79/10.11.51.79:2181. Will not attempt to authenticate
using SASL (unknown error)
2016-04-14 16:48:16,362 INFO  [main-SendThread(10.11.51.79:2181)] zookeeper.ClientCnxn: Socket
connection established to 10.11.51.79/10.11.51.79:2181, initiating session
2016-04-14 16:48:16,368 INFO  [main-SendThread(10.11.51.79:2181)] zookeeper.ClientCnxn: Session
establishment complete on server 10.11.51.79/10.11.51.79:2181, sessionid = 0x750c1c0af785fd7,
negotiated timeout = 40000
2016-04-14 16:48:16,368 INFO  [main-SendThread(10.11.51.79:2181)] zookeeper.ClientCnxn: Session
establishment complete on server 10.11.51.79/10.11.51.79:2181, sessionid = 0x750c1c0af785fd8,
negotiated timeout = 40000
Version: 2.0.0-SNAPSHOT
Number of live region servers: 3
Number of dead region servers: 0
Master: dx-pipe-sata60-pm,16000,1460623655646
Number of backup masters: 0
Average load: 0.6666666666666666
Number of requests: 0
Number of regions: 2
Number of regions in transition: 0
2016-04-14 16:48:17,130 INFO  [main] util.HBaseFsck: Loading regionsinfo from the hbase:meta
table

Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
2016-04-14 16:48:17,240 INFO  [main] util.HBaseFsck: getHTableDescriptors == tableNames =>
[]
2016-04-14 16:48:17,242 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3724af13
connecting to ZooKeeper ensemble=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
2016-04-14 16:48:17,242 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
sessionTimeout=90000 watcher=hconnection-0x3724af130x0, quorum=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181,
baseZNode=/hbase-test-cluster-15406
2016-04-14 16:48:17,245 INFO  [main-SendThread(10.11.51.78:2181)] zookeeper.ClientCnxn: Opening
socket connection to server 10.11.51.78/10.11.51.78:2181. Will not attempt to authenticate
using SASL (unknown error)
2016-04-14 16:48:17,245 INFO  [main-SendThread(10.11.51.78:2181)] zookeeper.ClientCnxn: Socket
connection established to 10.11.51.78/10.11.51.78:2181, initiating session
2016-04-14 16:48:17,246 INFO  [main-SendThread(10.11.51.78:2181)] zookeeper.ClientCnxn: Session
establishment complete on server 10.11.51.78/10.11.51.78:2181, sessionid = 0x650c1c0cfd8b175,
negotiated timeout = 40000
2016-04-14 16:48:17,258 INFO  [main] client.ConnectionImplementation: Closing master protocol:
MasterService
2016-04-14 16:48:17,258 INFO  [main] client.ConnectionImplementation: Closing zookeeper sessionid=0x650c1c0cfd8b175
2016-04-14 16:48:17,259 INFO  [main] zookeeper.ZooKeeper: Session: 0x650c1c0cfd8b175 closed
Number of Tables: 0
2016-04-14 16:48:17,262 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
for session: 0x650c1c0cfd8b175
2016-04-14 16:48:17,340 INFO  [main] util.HBaseFsck: Loading region directories from HDFS

2016-04-14 16:48:17,449 INFO  [main] util.HBaseFsck: Loading region information from HDFS

2016-04-14 16:48:17,590 INFO  [main] util.HBaseFsck: Checking and fixing region consistency
2016-04-14 16:48:17,626 INFO  [main] util.HBaseFsck: Handling overlap merges in parallel.
set hbasefsck.overlap.merge.parallel to false to run serially.
2016-04-14 16:48:17,633 INFO  [main] util.HBaseFsck: Abort hbck!!!
2016-04-14 16:48:17,639 INFO  [Thread-4] zookeeper.ZooKeeper: Session: 0x750c1c0af785fd7 closed
2016-04-14 16:48:17,639 INFO  [Thread-4] client.ConnectionImplementation: Closing master protocol:
MasterService
2016-04-14 16:48:17,640 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
for session: 0x750c1c0af785fd7
2016-04-14 16:48:17,640 INFO  [Thread-4] client.ConnectionImplementation: Closing zookeeper
sessionid=0x750c1c0af785fd8
2016-04-14 16:48:17,642 INFO  [Thread-4] zookeeper.ZooKeeper: Session: 0x750c1c0af785fd8 closed
2016-04-14 16:48:17,643 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
for session: 0x750c1c0af785fd8
{code}

2. Open the shell and try the {{splitormerge_switch}} command; it is rejected because of the lock
{code}
[maintain@dx-pipe-sata60-pm hbase-2.0.0-SNAPSHOT]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/maintain/hadoop/hbase/hbase-2.0.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/maintain/hadoop/hadoop-2.5.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 2.0.0-SNAPSHOT, r751cee2c5fa87ea15e4132606fa23e70a479c336, Thu Apr 14 16:38:25 CST
2016

hbase(main):001:0> splitormerge_switch 'SPLIT', true

ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: can't set splitOrMerge switch due to lock
	at org.apache.hadoop.hbase.master.MasterRpcServices.setSplitOrMergeEnabled(MasterRpcServices.java:1501)
	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61521)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2250)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:137)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:112)
	at java.lang.Thread.run(Thread.java:745)

Here is some help for this command:
Enable/Disable one switch. You can set switch type 'SPLIT' or 'MERGE'. Returns previous split state.
Examples:

  hbase> splitormerge_switch 'SPLIT', true
  hbase> splitormerge_switch 'SPLIT', false
nil


hbase(main):002:0>
{code}
 
Try {{splitormerge_enabled 'SPLIT'}}; you will see that the switch is still disabled
{code}
[maintain@dx-pipe-sata60-pm hbase-2.0.0-SNAPSHOT]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/maintain/hadoop/hbase/hbase-2.0.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/maintain/hadoop/hadoop-2.5.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 2.0.0-SNAPSHOT, r751cee2c5fa87ea15e4132606fa23e70a479c336, Thu Apr 14 16:38:25 CST
2016

hbase(main):001:0>  splitormerge_enabled 'SPLIT'
false
0 row(s) in 0.2940 seconds
{code}



3. Rerun {{bin/hbase hbck -disableSplitAndMerge}}, this time letting it run to completion
{code}
= 0x750c1c0af7866fa, negotiated timeout = 40000
Version: 2.0.0-SNAPSHOT
Number of live region servers: 3
Number of dead region servers: 0
Master: dx-pipe-sata60-pm,16000,1460623655646
Number of backup masters: 0
Average load: 0.6666666666666666
Number of requests: 0
Number of regions: 2
Number of regions in transition: 0
2016-04-14 16:53:44,459 INFO  [main] util.HBaseFsck: Loading regionsinfo from the hbase:meta
table

Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
2016-04-14 16:53:44,559 INFO  [main] util.HBaseFsck: getHTableDescriptors == tableNames =>
[hbase:namespace]
2016-04-14 16:53:44,560 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3724af13
connecting to ZooKeeper ensemble=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
2016-04-14 16:53:44,560 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
sessionTimeout=90000 watcher=hconnection-0x3724af130x0, quorum=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181,
baseZNode=/hbase-test-cluster-15406
2016-04-14 16:53:44,562 INFO  [main-SendThread(10.11.51.79:2181)] zookeeper.ClientCnxn: Opening
socket connection to server 10.11.51.79/10.11.51.79:2181. Will not attempt to authenticate
using SASL (unknown error)
2016-04-14 16:53:44,563 INFO  [main-SendThread(10.11.51.79:2181)] zookeeper.ClientCnxn: Socket
connection established to 10.11.51.79/10.11.51.79:2181, initiating session
2016-04-14 16:53:44,564 INFO  [main-SendThread(10.11.51.79:2181)] zookeeper.ClientCnxn: Session
establishment complete on server 10.11.51.79/10.11.51.79:2181, sessionid = 0x750c1c0af7866fb,
negotiated timeout = 40000
2016-04-14 16:53:44,578 INFO  [main] client.ConnectionImplementation: Closing master protocol:
MasterService
2016-04-14 16:53:44,579 INFO  [main] client.ConnectionImplementation: Closing zookeeper sessionid=0x750c1c0af7866fb
2016-04-14 16:53:44,580 INFO  [main] zookeeper.ZooKeeper: Session: 0x750c1c0af7866fb closed
Number of Tables: 1
2016-04-14 16:53:44,583 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
for session: 0x750c1c0af7866fb
2016-04-14 16:53:44,599 INFO  [main] util.HBaseFsck: Loading region directories from HDFS

2016-04-14 16:53:44,693 INFO  [main] util.HBaseFsck: Loading region information from HDFS

2016-04-14 16:53:44,879 INFO  [main] util.HBaseFsck: Checking and fixing region consistency
2016-04-14 16:53:44,914 INFO  [main] util.HBaseFsck: Handling overlap merges in parallel.
set hbasefsck.overlap.merge.parallel to false to run serially.
2016-04-14 16:53:44,929 INFO  [main] util.HBaseFsck: Computing mapping of all store files

2016-04-14 16:53:44,946 INFO  [main] util.HBaseFsck: Validating mapping using HDFS state
Summary:
Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  dx-pipe-sata60-pm,16000,1460623655646
Table hbase:namespace is okay.
    Number of regions: 1
    Deployed on:  dx-pipe-sata60-pm,16000,1460623655646
0 inconsistencies detected.
Status: OK
2016-04-14 16:53:44,992 INFO  [main] zookeeper.ZooKeeper: Session: 0x750c1c0af7866fa closed
2016-04-14 16:53:44,993 INFO  [main] client.ConnectionImplementation: Closing master protocol:
MasterService
2016-04-14 16:53:44,993 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
for session: 0x750c1c0af7866fa
2016-04-14 16:53:44,994 INFO  [main] client.ConnectionImplementation: Closing zookeeper sessionid=0x550c1c0af77096e
2016-04-14 16:53:44,996 INFO  [main] zookeeper.ZooKeeper: Session: 0x550c1c0af77096e closed
2016-04-14 16:53:44,996 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
for session: 0x550c1c0af77096e
{code}
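
To tell the aborted run in step 1 apart from the completed run in step 3 without reading the whole log, the console output can be scanned for the marker lines shown above. This is only a rough sketch: the function name {{parse_hbck_summary}} is hypothetical, and it assumes the output contains the same "Abort hbck!!!", "N inconsistencies detected." and "Status: ..." lines as the logs in this comment.

```python
import re


def parse_hbck_summary(output):
    """Extract the abort marker, inconsistency count and status from hbck console output.

    Hypothetical helper: it only looks for the literal lines seen in the logs above.
    """
    count = re.search(r"^(\d+) inconsistencies detected\.", output, re.MULTILINE)
    status = re.search(r"^Status: (\w+)", output, re.MULTILINE)
    return {
        # True when the run logged the early-termination marker from step 1.
        "aborted": "Abort hbck!!!" in output,
        # None when the run never reached the summary section.
        "inconsistencies": int(count.group(1)) if count else None,
        "status": status.group(1) if status else None,
    }
```

A run that ends without a status line, as in step 1, is the case where the switch and lock may have been left behind.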

4. Try the {{splitormerge_enabled}} command; you will see the switch has been set back
{code}
[maintain@dx-pipe-sata60-pm hbase-2.0.0-SNAPSHOT]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/maintain/hadoop/hbase/hbase-2.0.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/maintain/hadoop/hadoop-2.5.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 2.0.0-SNAPSHOT, r751cee2c5fa87ea15e4132606fa23e70a479c336, Thu Apr 14 16:38:25 CST
2016

hbase(main):001:0> splitormerge_enabled 'SPLIT'
true
0 row(s) in 0.3910 seconds

hbase(main):002:0>
{code}

Try {{splitormerge_switch 'SPLIT', true}}; the lock is no longer held.
{code}
[maintain@dx-pipe-sata60-pm hbase-2.0.0-SNAPSHOT]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/maintain/hadoop/hbase/hbase-2.0.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/maintain/hadoop/hadoop-2.5.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 2.0.0-SNAPSHOT, r751cee2c5fa87ea15e4132606fa23e70a479c336, Thu Apr 14 16:38:25 CST
2016

hbase(main):001:0> splitormerge_switch 'SPLIT', true
true
0 row(s) in 0.3060 seconds

hbase(main):002:0>
{code}
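
The behaviour observed in steps 1 to 4 can be summarized with a small in-memory model. It is a toy sketch, not HBase code: the class and method names are hypothetical, and restoring both switches to enabled on a clean exit is an assumption based only on what step 4 shows.

```python
class SwitchLockedError(Exception):
    """Raised when a switch change is rejected because the lock is held."""


class SplitMergeSwitchModel:
    """Toy model of the observed split/merge switch behaviour (hypothetical, not HBase code)."""

    def __init__(self):
        self.enabled = {"SPLIT": True, "MERGE": True}
        self.locked = False

    def hbck_disable(self):
        # Models `hbck -disableSplitAndMerge`: both switches off, lock taken.
        self.enabled = {name: False for name in self.enabled}
        self.locked = True

    def is_enabled(self, name):
        # Reads still succeed while the lock is held (step 2).
        return self.enabled[name]

    def set_enabled(self, name, value):
        # Writes are rejected while the lock is held (step 2);
        # otherwise the previous state is returned, like the shell command.
        if self.locked:
            raise SwitchLockedError("can't set splitOrMerge switch due to lock")
        previous = self.enabled[name]
        self.enabled[name] = value
        return previous

    def hbck_finish(self):
        # Models a clean hbck exit (step 3): switches back on, lock released.
        # Assumption: "restored" means enabled, as seen in step 4.
        self.enabled = {name: True for name in self.enabled}
        self.locked = False
```

In this model an aborted run is simply {{hbck_disable()}} without a matching {{hbck_finish()}}, which leaves {{set_enabled}} failing until a later run completes.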


> Split / merge switch left disabled after early termination of hbck
> ------------------------------------------------------------------
>
>                 Key: HBASE-15406
>                 URL: https://issues.apache.org/jira/browse/HBASE-15406
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.4.0
>
>         Attachments: HBASE-15406.patch, HBASE-15406.v1.patch, HBASE-15406_v1.patch, HBASE-15406_v2.patch, test.patch, wip.patch
>
>
> This was what I did on cluster with 1.4.0-SNAPSHOT built Thursday:
> Run 'hbase hbck -disableSplitAndMerge' on gateway node of the cluster
> Terminate hbck early
> Enter hbase shell where I observed:
> {code}
> hbase(main):001:0> splitormerge_enabled 'SPLIT'
> false
> 0 row(s) in 0.3280 seconds
> hbase(main):002:0> splitormerge_enabled 'MERGE'
> false
> 0 row(s) in 0.0070 seconds
> {code}
> Expectation is that the split / merge switches should be restored to default value after hbck exits.
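
The expectation in the description, that the switches come back whenever the tool exits, is the usual save-and-restore pattern. The sketch below illustrates it with a plain dictionary standing in for the switch state. It is not how hbck is implemented, the name {{switches_disabled}} is hypothetical, and a {{finally}} block cannot help when the process is killed outright, which is why an early termination can still leave state behind.

```python
from contextlib import contextmanager


@contextmanager
def switches_disabled(switches):
    """Disable every switch for the duration of the block, then restore the saved values.

    `switches` is a plain dict such as {"SPLIT": True, "MERGE": True};
    it is a stand-in for real cluster state (assumption for illustration).
    """
    saved = dict(switches)
    for name in switches:
        switches[name] = False
    try:
        yield switches
    finally:
        # Runs on normal exit and on exceptions, but not if the process is killed.
        switches.clear()
        switches.update(saved)
```

Any work done inside the {{with}} block sees the switches disabled, and an exception raised there still triggers the restore.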



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
