hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10632) Region lost in limbo after ArrayIndexOutOfBoundsException during assignment
Date Tue, 04 Mar 2014 01:10:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918868#comment-13918868 ]

Hudson commented on HBASE-10632:
--------------------------------

FAILURE: Integrated in HBase-0.98 #196 (See [https://builds.apache.org/job/HBase-0.98/196/])
HBASE-10632 Region lost in limbo after ArrayIndexOutOfBoundsException during assignment (enis: rev 1573725)
* /hbase/branches/0.98
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java


> Region lost in limbo after ArrayIndexOutOfBoundsException during assignment
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-10632
>                 URL: https://issues.apache.org/jira/browse/HBASE-10632
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: hbase-10070
>            Reporter: Nick Dimiduk
>            Assignee: Enis Soztutar
>             Fix For: 0.96.2, 0.98.1, 0.99.0, hbase-10070
>
>         Attachments: hbase-10632_v1.patch
>
>
> Discovered while running IntegrationTestBigLinkedList. Region 24d68aa7239824e42390a77b7212fcbf is scheduled for move from hor13n19 to hor13n13. During the process an exception is thrown.
> {noformat}
> 2014-02-25 15:30:42,613 INFO  [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] master.RegionStates: Transitioning {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} will be handled by SSH for hor13n19.gq1.ygridcore.net,60020,1393341563552
> 2014-02-25 15:30:42,613 INFO  [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] handler.ServerShutdownHandler: Reassigning 7 region(s) that hor13n19.gq1.ygridcore.net,60020,1393341563552 was carrying (and 0 regions(s) that were opening on this server)
> 2014-02-25 15:30:42,613 INFO  [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] handler.ServerShutdownHandler: Reassigning region with rs = {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} and deleting zk node if exists
> 2014-02-25 15:30:42,623 INFO  [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] master.RegionStates: Transitioned {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} to {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
> 2014-02-25 15:30:42,623 DEBUG [AM.ZK.Worker-pool2-t46] master.AssignmentManager: Znode IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. deleted, state: {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
> ...
> 2014-02-25 15:30:43,993 ERROR [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
> java.lang.ArrayIndexOutOfBoundsException: 0
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:250)
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:921)
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.roundRobinAssignment(BaseLoadBalancer.java:860)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2482)
> 	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:282)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:722)
> {noformat}
> After that, the region is left in limbo and is never reassigned.
> {noformat}
> 2014-02-25 15:35:11,581 INFO  [FifoRpcScheduler.handler1-thread-6] master.HMaster: Client=hrt_qa//68.142.246.29 move hri=IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf., src=hor13n19.gq1.ygridcore.net,60020,1393341563552, dest=hor13n13.gq1.ygridcore.net,60020,1393342222275, running balancer
> 2014-02-25 15:35:11,581 INFO  [FifoRpcScheduler.handler1-thread-6] master.AssignmentManager: Ignored moving region not assigned: {ENCODED => 24d68aa7239824e42390a77b7212fcbf, NAME => 'IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.', STARTKEY => '\x80\x06\x1A', ENDKEY => ''}, {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
> ...
> 2014-02-25 15:35:26,586 DEBUG [hor13n12.gq1.ygridcore.net,60000,1393341917402-BalancerChore] master.HMaster: Not running balancer because 1 region(s) in transition: {24d68aa7239824e42390a77b7212fcbf={24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}}
> ...
> 2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.HMaster: Client=hrt_qa//68.142.246.29 unassign IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. in current location if it is online and reassign.force=false
> 2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.AssignmentManager: Starting unassign of IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. (offlining), current state: {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
> 2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.AssignmentManager: Attempting to unassign IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. but it is already in transition (OFFLINE, force=false)
> ...
> 2014-02-25 15:40:26,587 DEBUG [hor13n12.gq1.ygridcore.net,60000,1393341917402-BalancerChore] master.HMaster: Not running balancer because 1 region(s) in transition: {24d68aa7239824e42390a77b7212fcbf={24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}}
> {noformat}
> Spoke with [~enis] about it earlier, assigning to him.
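
Illustrative note on the stack trace above: `ArrayIndexOutOfBoundsException: 0` means index 0 was read from an array of length zero. The sketch below is not the HBase code and not the committed fix (rev 1573725 is not shown in this message). It is a minimal, hypothetical round-robin assigner, assuming the failure arises when the list of candidate servers is empty, with a guarded variant that returns an empty plan instead of throwing. The class and method names (`RoundRobinSketch`, `assignUnguarded`, `assignGuarded`) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; hypothetical names, not HBase's BaseLoadBalancer. */
class RoundRobinSketch {

    /** Unguarded round-robin: reads servers[0] even when the array is empty. */
    static Map<String, List<String>> assignUnguarded(List<String> regions, String[] servers) {
        Map<String, List<String>> plan = new LinkedHashMap<String, List<String>>();
        for (String server : servers) {
            plan.put(server, new ArrayList<String>());
        }
        int next = 0;
        for (String region : regions) {
            // With servers.length == 0 this index read throws
            // ArrayIndexOutOfBoundsException for index 0.
            plan.get(servers[next]).add(region);
            next = (next + 1 == servers.length) ? 0 : next + 1;
        }
        return plan;
    }

    /** Guarded variant: an empty plan signals "nothing assigned, retry later". */
    static Map<String, List<String>> assignGuarded(List<String> regions, String[] servers) {
        if (servers == null || servers.length == 0) {
            return new LinkedHashMap<String, List<String>>();
        }
        return assignUnguarded(regions, servers);
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("r1", "r2", "r3");
        System.out.println(assignGuarded(regions, new String[] {"s1", "s2"}));
        System.out.println(assignGuarded(regions, new String[0]));
    }
}
```

In this sketch the caller can tell "no servers available" apart from a crash. In the logs above, by contrast, the exception escapes the shutdown handler after the region has already been moved to OFFLINE, which is consistent with the region staying in transition afterwards.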



--
This message was sent by Atlassian JIRA
(v6.2#6252)
