hbase-dev mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: Assigning regions after restart
Date Tue, 14 Mar 2017 13:21:32 GMT
Looking at the code some more, it seems the issue is here.

In AssignmentManager.processDeadServersAndRegionsInTransition():

if (!failover) {
  // Fresh cluster startup.
  LOG.info("Clean cluster startup. Assigning user regions");

As soon as a single node has failed, failover is set to true, so no
assignment is done at this point. Later the servers are recovered from
the "crash" (which a clean restart counts as too), and then all
unassigned regions (all of them by now, since the clean-startup
assignment was skipped) are assigned round-robin.
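
To make that concrete, below is a heavily simplified sketch of the
decision described above. All names here (detectFailover,
retainAssignment, roundRobinAssignment) are hypothetical and do not
mirror the real AssignmentManager API; the point is only the control
flow: one dead server flips the failover flag, the clean-startup
branch is skipped, and every region later goes through round-robin
recovery.

import java.util.List;
import java.util.Map;

class StartupAssignmentSketch {

  // A single entry in the dead-servers list (which a clean
  // full-cluster restart produces as well) marks the whole startup
  // as a failover.
  boolean detectFailover(List<String> deadServers) {
    return !deadServers.isEmpty();
  }

  void processStartup(List<String> deadServers,
                      Map<String, String> regionToLastServer) {
    if (!detectFailover(deadServers)) {
      // Fresh cluster startup: reuse the previous region -> server
      // mapping, which preserves HDFS block locality.
      retainAssignment(regionToLastServer);
    } else {
      // Failover path: previous assignments are dropped, and every
      // region is later handed out round-robin during "crash"
      // recovery, shuffling regions and destroying locality.
      roundRobinAssignment(regionToLastServer.keySet());
    }
  }

  void retainAssignment(Map<String, String> plan) { /* elided */ }
  void roundRobinAssignment(Iterable<String> regions) { /* elided */ }
}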

On Tue, Mar 14, 2017 at 1:54 PM, Lars George <lars.george@gmail.com> wrote:
> Hi,
> I have had this happen on multiple clusters recently: after the
> restart, the locality dropped from close to (or exactly) 100% down to
> single digits. The reason is that all regions were completely shuffled
> and reassigned to random servers. Upon reading the (yet again
> non-trivial) assignment code, I found that a single missing server
> will trigger a full "recovery" of all servers, which includes dropping
> all previous assignments and making random new ones.
> This is just terrible! In addition, I had assumed that at least the
> StochasticLoadBalancer checks which node holds most of a region's
> data, locality-wise, and picks that server. But that is _not_ the
> case! It just spreads everything seemingly at random across the
> servers.
> To me this is a big regression (or an outright bug), given that a
> single server out of, for example, hundreds could trigger this and
> destroy the locality completely. Running a major compaction to
> restore locality is not a viable option, for many reasons.
> This used to work better; why the regression?
> Lars
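
Regarding the locality point in the quoted message: a locality-aware
placement would, for each region, prefer the server whose datanode
holds the largest share of the region's HFile blocks. The sketch below
illustrates that idea only; pickBestServer and the example hostnames
are hypothetical and not part of the StochasticLoadBalancer API.

import java.util.HashMap;
import java.util.Map;

class LocalityAwarePlacement {

  // localityIndex maps server name -> fraction [0.0, 1.0] of the
  // region's store-file blocks that are local to that server.
  static String pickBestServer(Map<String, Float> localityIndex) {
    String best = null;
    float bestScore = -1f;
    for (Map.Entry<String, Float> e : localityIndex.entrySet()) {
      if (e.getValue() > bestScore) {
        bestScore = e.getValue();
        best = e.getKey();
      }
    }
    return best;
  }

  public static void main(String[] args) {
    Map<String, Float> locality = new HashMap<>();
    locality.put("rs1.example.com", 0.98f); // held the region before restart
    locality.put("rs2.example.com", 0.01f);
    locality.put("rs3.example.com", 0.01f);
    // Retaining rs1 keeps reads local; a random pick most likely does not.
    System.out.println("best = " + pickBestServer(locality));
  }
}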
