hbase-issues mailing list archives

From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18036) Data locality is not maintained after cluster restart or SSH
Date Tue, 20 Jun 2017 23:47:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056681#comment-16056681 ]

Stephen Yuan Jiang commented on HBASE-18036:
--------------------------------------------

[~enis], with the Proc-V2 AM, the current change is no longer applicable.  Currently, with the initial
commit of the new AM, SSH calls AM.createAssignProcedures() with forceNewPlan=true.  Even if forceNewPlan
is false, when we compare the existing plan's ServerName, it will not be equal to the dead server
because the timestamp changes (ServerName is hostname+port+timestamp), and hence a new plan/server
would be used for the region assignment.  As a result, locality is not guaranteed to be retained.
The potential change would be more involved than what we have now in the 1.x code base.  I opened HBASE-18246
to track it (FYI, [~stack]).
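
To illustrate the ServerName comparison described above, here is a minimal sketch. It assumes a simplified stand-in class (SimpleServerName, hypothetical, not the real org.apache.hadoop.hbase.ServerName) and a hypothetical helper sameHostAndPort(); host names and timestamps are made up. It only shows why full equality fails after a restart while a host+port comparison would still match.

```java
import java.util.Objects;

public class ServerNameSketch {
    // Hypothetical stand-in for a server identity: hostname + port + start timestamp.
    static final class SimpleServerName {
        final String hostname;
        final int port;
        final long startCode; // changes every time the server process restarts

        SimpleServerName(String hostname, int port, long startCode) {
            this.hostname = hostname;
            this.port = port;
            this.startCode = startCode;
        }

        // Full equality includes the start timestamp.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleServerName)) {
                return false;
            }
            SimpleServerName other = (SimpleServerName) o;
            return hostname.equals(other.hostname)
                && port == other.port
                && startCode == other.startCode;
        }

        @Override
        public int hashCode() {
            return Objects.hash(hostname, port, startCode);
        }
    }

    // Hypothetical helper: compares hostname and port only, ignoring the timestamp.
    static boolean sameHostAndPort(SimpleServerName a, SimpleServerName b) {
        return a.hostname.equals(b.hostname) && a.port == b.port;
    }

    public static void main(String[] args) {
        SimpleServerName dead = new SimpleServerName("rs1.example.com", 16020, 1497900000000L);
        SimpleServerName restarted = new SimpleServerName("rs1.example.com", 16020, 1498000000000L);
        System.out.println(dead.equals(restarted));           // false: timestamps differ
        System.out.println(sameHostAndPort(dead, restarted)); // true: same host and port
    }
}
```

Any locality-retaining change in the new AM would presumably need a comparison of the second kind; that is an assumption here, and the actual design is tracked in HBASE-18246.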

> Data locality is not maintained after cluster restart or SSH
> ------------------------------------------------------------
>
>                 Key: HBASE-18036
>                 URL: https://issues.apache.org/jira/browse/HBASE-18036
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>             Fix For: 1.3.2, 1.1.11, 1.2.7
>
>         Attachments: HBASE-18036.v0-branch-1.1.patch, HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we think data locality is maintained after cluster restart.
> However, we have seen some complaints about data locality loss when the cluster restarts (e.g. HBASE-17963).
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() code, for
> cluster start, I expected to hit the following code path:
> {code}
>     if (!failover) {
>       // Fresh cluster startup.
>       LOG.info("Clean cluster startup. Assigning user regions");
>       assignAllUserRegions(allRegions);
>     }
> {code}
> where assignAllUserRegions() would use the retainAssignment() call in LoadBalancer; however,
> from the master log, we usually hit the failover code path:
> {code}
>     // If we found user regions out on cluster, its a failover.
>     if (failover) {
>       LOG.info("Found regions out on cluster or in RIT; presuming failover");
>       // Process list of dead servers and regions in RIT.
>       // See HBASE-4580 for more information.
>       processDeadServersAndRecoverLostRegions(deadServers);
>     }
> {code}
> where processDeadServersAndRecoverLostRegions() would put dead servers in SSH, and SSH
> uses roundRobinAssignment() in LoadBalancer.  That is why we would see locality lost more
> often than retained during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.
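
The difference between the two assignment strategies named above can be sketched as follows. This is a simplified illustration, not the LoadBalancer implementation: the class AssignmentSketch, its methods, and the region and host names are all hypothetical, servers are identified by host name only, and the live server list is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {
    // Round-robin: ignores previous locations and cycles through the live servers.
    // Assumes servers is non-empty.
    static Map<String, String> roundRobin(List<String> regions, List<String> servers) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < regions.size(); i++) {
            plan.put(regions.get(i), servers.get(i % servers.size()));
        }
        return plan;
    }

    // Retain: reuses the previous host when it is still live,
    // otherwise falls back to cycling through the live servers.
    static Map<String, String> retain(Map<String, String> previous, List<String> servers) {
        Map<String, String> plan = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, String> e : previous.entrySet()) {
            String oldHost = e.getValue();
            if (servers.contains(oldHost)) {
                plan.put(e.getKey(), oldHost);
            } else {
                plan.put(e.getKey(), servers.get(next++ % servers.size()));
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put("region1", "hostB");
        previous.put("region2", "hostA");
        previous.put("region3", "hostB");
        List<String> servers = Arrays.asList("hostA", "hostB");

        // Every region returns to the host that holds its data.
        System.out.println(retain(previous, servers));
        // Placement depends only on iteration order, so locality is kept only by chance.
        System.out.println(roundRobin(new ArrayList<>(previous.keySet()), servers));
    }
}
```

With these inputs the retained plan equals the previous assignment, while the round-robin plan places none of the three regions on its previous host.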



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
