hbase-issues mailing list archives

From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14536) Balancer & SSH interfering with each other leading to unavailability
Date Fri, 16 Oct 2015 19:48:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961269#comment-14961269 ]

Stephen Yuan Jiang commented on HBASE-14536:
--------------------------------------------

[~enis] and I discussed the approach in the newest patch and agreed on the solution.  If no
one has any objection, I am going to commit to branch-1.x soon.

> Balancer & SSH interfering with each other leading to unavailability
> --------------------------------------------------------------------
>
>                 Key: HBASE-14536
>                 URL: https://issues.apache.org/jira/browse/HBASE-14536
>             Project: HBase
>          Issue Type: Bug
>          Components: master, Region Assignment
>    Affects Versions: 1.1.2
>            Reporter: Devaraj Das
>            Assignee: Stephen Yuan Jiang
>             Fix For: 1.1.4
>
>         Attachments: HBASE-14536.v1-branch-1.1.patch, HBASE-14536.v2-branch-1.1.patch,
>                      HBASE-14536.v3-branch-1.1.patch, master-log.tgz
>
>
> Came across this in our cluster:
> 1. The meta was assigned to a server 10.0.0.149,16020,1443507203340
> {noformat}
> 2015-09-29 06:16:22,472 DEBUG [AM.ZK.Worker-pool2-t56] 
> master.RegionStates: Onlined 1588230740 on 
> 10.0.0.149,16020,1443507203340 {ENCODED => 1588230740, NAME => 
> 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
> {noformat}
> 2. The server dies at some point:
> {noformat}
> 2015-09-29 06:18:25,952 INFO  [main-EventThread] 
> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, 
> processing expiration [10.0.0.149,16020,1443507203340]
> 2015-09-29 06:18:25,955 DEBUG [main-EventThread] master.AssignmentManager: based on AM, current
> region=hbase:meta,,1.1588230740 is on server=10.0.0.149,16020,1443507203340 server being checked:
> 10.0.0.149,16020,1443507203340
> {noformat}
> 3. The balancer had computed a plan that contained a move for the meta:
> {noformat}
> 2015-09-29 06:18:26,833 INFO  [B.defaultRpcServer.handler=12,queue=0,port=16000] master.HMaster:
> balance hri=hbase:meta,,1.1588230740,
> src=10.0.0.149,16020,1443507203340, dest=10.0.0.205,16020,1443507257905
> {noformat}
> 4. The following ensues after this, leading to the meta remaining unassigned:
> {noformat}
> 2015-09-29 06:18:26,859 DEBUG [B.defaultRpcServer.handler=12,queue=0,port=16000] 
> master.AssignmentManager: Offline hbase:meta,,1.1588230740, no need to 
> unassign since it's on a dead server: 10.0.0.149,16020,1443507203340
> ......................
> 2015-09-29 06:18:26,899 INFO  [B.defaultRpcServer.handler=12,queue=0,port=16000] master.RegionStates:
> Offlined 1588230740 from 10.0.0.149,16020,1443507203340
> .....................
> 2015-09-29 06:18:26,914 INFO  [B.defaultRpcServer.handler=12,queue=0,port=16000] 
> master.AssignmentManager: Skip assigning hbase:meta,,1.1588230740, it is 
> on a dead but not processed yet server: 10.0.0.149,16020,1443507203340
> ....................
> 2015-09-29 06:18:26,915 DEBUG [AM.ZK.Worker-pool2-t58] master.AssignmentManager: Znode
> hbase:meta,,1.1588230740 deleted,
> state: {1588230740 state=OFFLINE, ts=1443507506914,
> server=10.0.0.149,16020,1443507203340}
> ....................
> 2015-09-29 06:18:29,447 DEBUG [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.AssignmentManager:
> based on AM, current
> region=hbase:meta,,1.1588230740 is on server=null server being checked:
> 10.0.0.149,16020,1443507203340
> 2015-09-29 06:18:29,451 INFO  [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2]
> handler.MetaServerShutdownHandler: META has been
> assigned to otherwhere, skip assigning.
> 2015-09-29 06:18:29,452 DEBUG [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] 
> master.DeadServer: Finished processing 10.0.0.149,16020,1443507203340
> {noformat}
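
The ordering in the quoted logs can be replayed with a small toy model. This is only a sketch for reasoning about the race, not HBase code: the class `ToyMaster`, its methods, and the `FALLBACK` server name are hypothetical, and the real AssignmentManager tracks far more state. It assumes, as the logs suggest, that the balancer's move clears the master's record of where meta lives, and that the shutdown handler only reassigns meta if that record still points at the dead server.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

DEAD = "10.0.0.149,16020,1443507203340"   # server that hosted meta and died (from the logs)
DEST = "10.0.0.205,16020,1443507257905"   # destination in the balancer plan (from the logs)
FALLBACK = "hypothetical-live-server"     # illustrative only, not in the logs


@dataclass
class ToyMaster:
    """Hypothetical, heavily simplified stand-in for the master's view of meta."""
    meta_state: str = "OPEN"
    meta_server: Optional[str] = DEAD
    dead_unprocessed: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def expire_server(self, server: str) -> None:
        # Step 2: the ephemeral node is deleted; the server is dead but not yet processed.
        self.dead_unprocessed.add(server)
        self.log.append(f"expired {server}")

    def run_balance_plan(self, dest: str) -> None:
        # Steps 3-4: the balancer offlines meta, which clears its recorded location.
        src = self.meta_server
        self.meta_state, self.meta_server = "OFFLINE", None
        if src in self.dead_unprocessed:
            # Assignment is skipped and left to the shutdown handler.
            self.log.append("balancer skipped assign: source dead but not processed")
            return
        self.meta_state, self.meta_server = "OPEN", dest
        self.log.append(f"balancer moved meta to {dest}")

    def run_shutdown_handler(self, server: str, fallback: str) -> None:
        # The handler reassigns meta only if it is still recorded on the dead server.
        if self.meta_server == server:
            self.meta_state, self.meta_server = "OPEN", fallback
            self.log.append(f"handler reassigned meta to {fallback}")
        else:
            self.log.append("handler skipped meta: not recorded on the dead server")
        self.dead_unprocessed.discard(server)


def reported_order() -> ToyMaster:
    """Balancer runs between server expiration and the shutdown handler."""
    m = ToyMaster()
    m.expire_server(DEAD)
    m.run_balance_plan(DEST)
    m.run_shutdown_handler(DEAD, fallback=FALLBACK)
    return m


def handler_first() -> ToyMaster:
    """Shutdown handler finishes before the balancer plan executes."""
    m = ToyMaster()
    m.expire_server(DEAD)
    m.run_shutdown_handler(DEAD, fallback=FALLBACK)
    m.run_balance_plan(DEST)
    return m
```

In this model, `reported_order()` leaves meta `OFFLINE` with no server, because each side assumes the other will do the assignment, while `handler_first()` ends with meta open on `DEST`.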



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
