hbase-dev mailing list archives

From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-18036) Data locality is not maintained after cluster restart or SSH
Date Thu, 11 May 2017 23:42:04 GMT
Stephen Yuan Jiang created HBASE-18036:
------------------------------------------

             Summary: Data locality is not maintained after cluster restart or SSH
                 Key: HBASE-18036
                 URL: https://issues.apache.org/jira/browse/HBASE-18036
             Project: HBase
          Issue Type: Bug
          Components: Region Assignment
    Affects Versions: 1.1.10, 1.2.5, 1.3.1, 1.4.0
            Reporter: Stephen Yuan Jiang
            Assignee: Stephen Yuan Jiang


After HBASE-2896 / HBASE-4402, we expected data locality to be maintained after a cluster restart.
However, we have seen some complaints about data locality loss when the cluster restarts (e.g. HBASE-17963).
 

Examining the AssignmentManager#processDeadServersAndRegionsInTransition() code, for a cluster
start I expected to hit the following code path:
{code}
    if (!failover) {
      // Fresh cluster startup.
      LOG.info("Clean cluster startup. Assigning user regions");
      assignAllUserRegions(allRegions);
    }
{code}
where assignAllUserRegions() would use the retainAssignment() call in LoadBalancer; however, from
the master log, we usually hit the failover code path:
{code}
    // If we found user regions out on cluster, its a failover.
    if (failover) {
      LOG.info("Found regions out on cluster or in RIT; presuming failover");
      // Process list of dead servers and regions in RIT.
      // See HBASE-4580 for more information.
      processDeadServersAndRecoverLostRegions(deadServers);
    }
{code}
where processDeadServersAndRecoverLostRegions() would hand the dead servers to SSH
(ServerShutdownHandler), and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we
lose locality more often than we retain it during a cluster restart.
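
To illustrate why the choice of code path matters for locality, here is a minimal toy sketch of the two strategies named above. It is not the HBase LoadBalancer code: the class LocalitySketch, its methods, and the region/server names are all hypothetical, and regions and servers are reduced to plain strings. It only shows that a retain-style plan keeps a region on its previous server when that server is live, while a round-robin plan ignores previous placement.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the HBase LoadBalancer implementation.
public class LocalitySketch {

    // Retain-style plan: keep each region on its previous server if that
    // server is still live, otherwise fall back to round-robin placement.
    public static Map<String, String> retain(Map<String, String> previous,
                                             List<String> liveServers) {
        Map<String, String> plan = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, String> e : previous.entrySet()) {
            String oldServer = e.getValue();
            if (liveServers.contains(oldServer)) {
                plan.put(e.getKey(), oldServer);
            } else {
                plan.put(e.getKey(), liveServers.get(next++ % liveServers.size()));
            }
        }
        return plan;
    }

    // Round-robin plan: spread regions evenly, ignoring where they lived before.
    public static Map<String, String> roundRobin(Iterable<String> regions,
                                                 List<String> liveServers) {
        Map<String, String> plan = new LinkedHashMap<>();
        int next = 0;
        for (String region : regions) {
            plan.put(region, liveServers.get(next++ % liveServers.size()));
        }
        return plan;
    }

    // Number of regions that end up on the server they were on before.
    public static int retainedCount(Map<String, String> previous,
                                    Map<String, String> plan) {
        int count = 0;
        for (Map.Entry<String, String> e : previous.entrySet()) {
            if (e.getValue().equals(plan.get(e.getKey()))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up placement before the restart.
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put("r1", "rs2");
        previous.put("r2", "rs2");
        previous.put("r3", "rs1");
        previous.put("r4", "rs3");
        List<String> live = Arrays.asList("rs1", "rs2", "rs3");

        System.out.println("retain:     "
            + retainedCount(previous, retain(previous, live)) + " of 4 regions kept");
        System.out.println("roundRobin: "
            + retainedCount(previous, roundRobin(previous.keySet(), live)) + " of 4 regions kept");
    }
}
```

With this made-up placement and all three servers back up, the retain-style plan keeps every region where it was, while the round-robin plan keeps only one of the four, which matches the kind of locality loss described above.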

Note: the code I was looking at is close to branch-1 and branch-1.1.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
