hbase-dev mailing list archives

From: Lars George <lars.geo...@gmail.com>
Subject: Re: Assigning regions after restart
Date: Thu, 16 Mar 2017 13:01:21 GMT
The JIRA is created: https://issues.apache.org/jira/browse/HBASE-17791

I will have a look into it over the next few days; maybe I can come up
with a patch.

On Wed, Mar 15, 2017 at 6:14 AM, Stack <stack@duboce.net> wrote:
> File a blocker please, Lars. I'm pretty sure the boolean for whether we are
> doing a recovery or not has been there a long time, so yeah, a single-server
> recovery could throw us off. But you make a practical point: one server
> should not destroy locality across the cluster.
>
> St.Ack
>
> On Tue, Mar 14, 2017 at 7:09 AM, Lars George <lars.george@gmail.com> wrote:
>
>> Wait, HBASE-15251 is not enough methinks. The checks it adds help, but
>> they do not cover all the possible edge cases. In particular, if a
>> node really fails, why not just reassign the few regions it held and
>> leave all the others where they are? As it stands, it seems insane.
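>>
>> Roughly what I have in mind, as a sketch only: retainAssignment() and
>> roundRobinAssignment() are the existing LoadBalancer methods, while the
>> assign() helper, lastLocations and onlineServers are made up here for
>> illustration.
>>
>>   // Keep regions whose last known server is still online where they
>>   // were; only the dead server's regions go through round-robin.
>>   Map<HRegionInfo, ServerName> retained = new HashMap<>();
>>   List<HRegionInfo> orphaned = new ArrayList<>();
>>   for (Map.Entry<HRegionInfo, ServerName> e : lastLocations.entrySet()) {
>>     if (onlineServers.contains(e.getValue())) {
>>       retained.put(e.getKey(), e.getValue());
>>     } else {
>>       orphaned.add(e.getKey());
>>     }
>>   }
>>   assign(balancer.retainAssignment(retained, onlineServers));
>>   assign(balancer.roundRobinAssignment(orphaned, onlineServers));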
>>
>> On Tue, Mar 14, 2017 at 2:24 PM, Lars George <lars.george@gmail.com>
>> wrote:
>> > Doh, https://issues.apache.org/jira/browse/HBASE-15251 addresses this
>> > (though I am not sure exactly how, see below). This should be
>> > backported to all 1.x branches!
>> >
>> > As for the patch, I see this
>> >
>> >      if (!failover) {
>> >        // Fresh cluster startup.
>> > -      LOG.info("Clean cluster startup. Assigning user regions");
>> > +      LOG.info("Clean cluster startup. Don't reassign user regions");
>> >        assignAllUserRegions(allRegions);
>> > +    } else {
>> > +      LOG.info("Failover! Reassign user regions");
>> >      }
>> >
>> > It would be interesting to see how that log message actually reassigns
>> > the user regions; I would have assumed "assignAllUserRegions()" would
>> > have to be called.
>> >
>> >
>> > On Tue, Mar 14, 2017 at 2:21 PM, Lars George <lars.george@gmail.com>
>> wrote:
>> >> Looking at the code more... it seems the issue is here
>> >>
>> >> In AssignmentManager.processDeadServersAndRegionsInTransition():
>> >>
>> >> ...
>> >> failoverCleanupDone();
>> >> if (!failover) {
>> >>   // Fresh cluster startup.
>> >>   LOG.info("Clean cluster startup. Assigning user regions");
>> >>   assignAllUserRegions(allRegions);
>> >> }
>> >> ...
>> >>
>> >> As soon as a single node has failed, failover is set to true, so no
>> >> assignment is done here. Later on, the servers are recovered from the
>> >> "crash" (which a clean restart also counts as), and then all unassigned
>> >> regions (all of them by now, since they were unassigned earlier) are
>> >> assigned round-robin.
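>> >>
>> >> In simplified form the effect is something like this (not the literal
>> >> code, just the logic as I read it):
>> >>
>> >>   // Any server that is known from hbase:meta but no longer online
>> >>   // marks this startup as a "failover".
>> >>   boolean failover = !deadServers.isEmpty();
>> >>   if (!failover) {
>> >>     // Clean startup: regions go back to their previous servers,
>> >>     // so locality is preserved.
>> >>     assignAllUserRegions(allRegions);
>> >>   }
>> >>   // failover == true: nothing happens here; the regions are assigned
>> >>   // later as part of the server "crash" handling, round-robin across
>> >>   // the cluster, and locality is lost.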
>> >>
>> >>
>> >>
>> >> On Tue, Mar 14, 2017 at 1:54 PM, Lars George <lars.george@gmail.com>
>> wrote:
>> >>> Hi,
>> >>>
>> >>> I had this happen on multiple clusters recently: after a restart, the
>> >>> locality dropped from close to (or exactly) 100% down to single
>> >>> digits. The reason is that all regions were completely shuffled and
>> >>> reassigned to random servers. Upon reading the (yet again non-trivial)
>> >>> assignment code, I found that a single missing server triggers a full
>> >>> "recovery" of all servers, which includes dropping all previous
>> >>> assignments and assigning everything anew at random.
>> >>>
>> >>> This is just terrible! I had also assumed that at least the
>> >>> StochasticLoadBalancer checks which node holds most of a region's
>> >>> data, locality-wise, and picks that server. But that is _not_ the
>> >>> case! It just spreads everything seemingly at random across the
>> >>> servers.
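>> >>>
>> >>> What I had expected is roughly the following when placing a region (a
>> >>> sketch only; getBlocksDistribution() is a stand-in for however the
>> >>> balancer gets at the HDFS block locations of the region's store files):
>> >>>
>> >>>   // Pick the candidate that holds the largest share of the region's
>> >>>   // HFile blocks, i.e. the most local server.
>> >>>   ServerName best = null;
>> >>>   float bestLocality = -1f;
>> >>>   for (ServerName candidate : onlineServers) {
>> >>>     float locality = getBlocksDistribution(region)
>> >>>         .getBlockLocalityIndex(candidate.getHostname());
>> >>>     if (locality > bestLocality) {
>> >>>       bestLocality = locality;
>> >>>       best = candidate;
>> >>>     }
>> >>>   }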
>> >>>
>> >>> To me this is a big regression (or an outright bug), given that a
>> >>> single server out of, say, hundreds can trigger this and destroy the
>> >>> locality completely. Running a major compaction afterwards is not a
>> >>> viable workaround, for many reasons.
>> >>>
>> >>> This used to work better; why the regression?
>> >>>
>> >>> Lars
>>
