hbase-dev mailing list archives

From: Lars George <lars.geo...@gmail.com>
Subject: Re: Assigning regions after restart
Date: Tue, 14 Mar 2017 14:09:54 GMT
Wait, HBASE-15251 is not enough methinks. The checks it adds help, but
they do not cover all the possible edge cases. In particular, if a node
really fails, why not just reassign the few regions that node held and
leave all the others where they are? The current behaviour seems insane.
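
To make that concrete, here is a minimal, purely hypothetical sketch (plain
Java with string ids, not the actual AssignmentManager code) of what I mean:
keep every region whose previous server is still alive, and only round-robin
the regions of the server that actually died.

import java.util.*;

// Hypothetical sketch only - region and server names are plain strings.
public class RetainOnSingleFailure {
  // lastLocations: region -> server it was on before the failure
  static Map<String, String> reassign(Map<String, String> lastLocations,
                                      Set<String> liveServers) {
    Map<String, String> plan = new HashMap<>();
    List<String> orphaned = new ArrayList<>();
    for (Map.Entry<String, String> e : lastLocations.entrySet()) {
      if (liveServers.contains(e.getValue())) {
        plan.put(e.getKey(), e.getValue());  // retain: server is still there
      } else {
        orphaned.add(e.getKey());            // only these need a new home
      }
    }
    // Round-robin just the orphaned regions across the live servers.
    List<String> targets = new ArrayList<>(liveServers);
    int i = 0;
    for (String region : orphaned) {
      plan.put(region, targets.get(i++ % targets.size()));
    }
    return plan;
  }
}

With hundreds of servers this touches only a tiny fraction of the regions
and leaves the locality of everything else intact.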

On Tue, Mar 14, 2017 at 2:24 PM, Lars George <lars.george@gmail.com> wrote:
> Doh, https://issues.apache.org/jira/browse/HBASE-15251 addresses this
> (though I am not sure exactly how, see below). This should be
> backported to all 1.x branches!
>
> As for the patch, I see this
>
>      if (!failover) {
>        // Fresh cluster startup.
> -      LOG.info("Clean cluster startup. Assigning user regions");
> +      LOG.info("Clean cluster startup. Don't reassign user regions");
>        assignAllUserRegions(allRegions);
> +    } else {
> +      LOG.info("Failover! Reassign user regions");
>      }
>
> Would be interesting to see how that log message in the new else branch
> is actually supposed to reassign the user regions. I would have assumed
> that "assignAllUserRegions()" (or something like it) would have to be
> called there as well.
>
>
> On Tue, Mar 14, 2017 at 2:21 PM, Lars George <lars.george@gmail.com> wrote:
>> Looking at the code more... it seems the issue is here
>>
>> In AssignmentManager.processDeadServersAndRegionsInTransition():
>>
>> ...
>> failoverCleanupDone();
>> if (!failover) {
>>   // Fresh cluster startup.
>>   LOG.info("Clean cluster startup. Assigning user regions");
>>   assignAllUserRegions(allRegions);
>> }
>> ...
>>
>> As soon as a single node has failed, failover is set to true, so no
>> assignment is done here. Later on, the servers are recovered from the
>> "crash" (which is what a clean restart looks like as well), and then all
>> unassigned regions (all of them at this point, since they were left
>> unassigned earlier) are assigned round-robin.
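>>
>> Condensed into a tiny, hypothetical model (this is not the actual
>> AssignmentManager code, just the behaviour as I read it): one missing
>> server flips "failover", the clean-startup assignment is skipped, and
>> every region - not only the dead server's - later gets a fresh
>> round-robin location, which is what shreds the locality.
>>
>> import java.util.*;
>>
>> public class FailoverFlagModel {
>>   public static void main(String[] args) {
>>     Set<String> knownServers = new HashSet<>(Arrays.asList("rs1", "rs2", "rs3"));
>>     Set<String> liveServers  = new HashSet<>(Arrays.asList("rs1", "rs2")); // rs3 missing
>>
>>     // A single missing server is enough to take the failover path.
>>     boolean failover = !liveServers.containsAll(knownServers);
>>     if (!failover) {
>>       System.out.println("Clean cluster startup. Assigning user regions");
>>     } else {
>>       // Nothing is assigned here; all regions stay unassigned and are
>>       // handed out round-robin later, ignoring their previous servers.
>>       System.out.println("Failover: all regions will be reassigned round-robin");
>>     }
>>   }
>> }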
>>
>>
>>
>> On Tue, Mar 14, 2017 at 1:54 PM, Lars George <lars.george@gmail.com> wrote:
>>> Hi,
>>>
>>> I had this happen on multiple clusters recently where, after the
>>> restart, the locality dropped from close to (or exactly) 100% down to
>>> single digits. The reason is that all regions were completely shuffled
>>> and reassigned to random servers. Upon reading the (yet again
>>> non-trivial) assignment code, I found that a single missing server
>>> will trigger a full "recovery" of all servers, which drops all
>>> previous assignments and assigns everything anew at random.
>>>
>>> This is just terrible! In addition, I had assumed that - at least with
>>> the StochasticLoadBalancer - the assignment checks which node holds
>>> most of a region's data locality-wise and picks that server. But that
>>> is _not_ the case! It just spreads everything seemingly at random
>>> across the servers.
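>>>
>>> What I had assumed - sketched here as hypothetical plain Java, not the
>>> actual StochasticLoadBalancer code - is a placement step that looks at
>>> per-server locality (the fraction of a region's HDFS blocks each live
>>> server holds) and simply picks the best one:
>>>
>>> import java.util.*;
>>>
>>> public class LocalityAwarePick {
>>>   // localityByServer: live server -> fraction of the region's blocks it holds
>>>   static String pickMostLocalServer(Map<String, Float> localityByServer) {
>>>     String best = null;
>>>     float bestShare = -1f;
>>>     for (Map.Entry<String, Float> e : localityByServer.entrySet()) {
>>>       if (e.getValue() > bestShare) {
>>>         best = e.getKey();
>>>         bestShare = e.getValue();
>>>       }
>>>     }
>>>     return best;  // caller would fall back to round-robin without locality data
>>>   }
>>>
>>>   public static void main(String[] args) {
>>>     Map<String, Float> locality = new HashMap<>();
>>>     locality.put("rs1", 0.97f);  // rs1 holds ~97% of the region's blocks
>>>     locality.put("rs2", 0.02f);
>>>     System.out.println(pickMostLocalServer(locality));  // prints rs1
>>>   }
>>> }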
>>>
>>> To me this is a big regression (or a straight bug), given that a single
>>> server out of, for example, hundreds can trigger this and destroy the
>>> locality completely. Running a major compaction to restore locality is
>>> not a viable workaround, for many reasons.
>>>
>>> This used to work better - why the regression?
>>>
>>> Lars
