hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lars George <lars.geo...@gmail.com>
Subject Re: Balancer Cutoff
Date Wed, 09 Nov 2016 10:46:59 GMT
Wait, I am getting more confused (why must this be so hard to read?),
as in HMaster.balance() we have this

long cutoffTime = System.currentTimeMillis() + maximumBalanceTime;
int rpCount = 0;  // number of RegionPlans balanced so far
long totalRegPlanExecTime = 0;
if (plans != null && !plans.isEmpty()) {
  for (RegionPlan plan: plans) {
    LOG.info("balance " + plan);
    long balStartTime = System.currentTimeMillis();
    //TODO: bulk assign
    totalRegPlanExecTime += System.currentTimeMillis()-balStartTime;
    if (rpCount < plans.size() &&
        // if performing next balance exceeds cutoff time, exit the loop
        (System.currentTimeMillis() + (totalRegPlanExecTime /
rpCount)) > cutoffTime) {
      //TODO: After balance, there should not be a cutoff time
(keeping it as a security net for now)
      LOG.debug("No more balancing till next balance run; maximumBalanceTime=" +

So the cut off time is T1 (assuming this is now when the balance()
call was made by the chore) plus the extra 5mins. rpCount is 0 and the
accrued time it took for plans to execute is also 0. Let's also assume
that we have many plans to execute, to trigger the checks.

We record the time the assignmentManager.balance(plan) call took, and
add this to the accrued time counter, while increasing the plan
counter. We loop now until we executed all plans _or_ (and that is the
part I am after), until the current time plus "the accrued plan
execution time divided by the their count" exceeds the cut off time
computed above.

For starters, why do we add the "rpCount < plans.size()" check at all
into that condition? What does that actually help or do? Just to skip
the very last check when the loop end and timeout coincide?

Next, what is that division doing in the time check? It computes the
average it seems (total time divided by the count) so say if 5 plans
ran, and each took 4 mins, then the time we check "Tcurr + ( 5 * 4
mins) / 5 > T1 + 5 mins", or "T1 + 20 mins + 4 mins > T1 + 5 mins". So
this already triggered way earlier and won't get to this. I am at a
loss to explain the logic behind this as I do not know what we usually
see in terms of number of plans and their execution times.

But even assuming say 4 plans at 1 min each, we get "T1 + 4 mins + 1
mins > T1 + 5 mins" - so is adding the average to skip the last one on
the assumption it would exceed the total balance time (the cut off)?
Seems as such.

It then runs say 4 mins, sleeps 1 min, and then starts again due to
the chore firing. In the end, we continuously rearrange the cluster,

On Wed, Nov 9, 2016 at 11:14 AM, Lars George <lars.george@gmail.com> wrote:
> Hi,
> In HMaster.java we have this code (added, mostly, in HBASE-3422):
> private int getBalancerCutoffTime() {
>   int balancerCutoffTime =
>     getConfiguration().getInt("hbase.balancer.max.balancing", -1);
>   if (balancerCutoffTime == -1) {
>     // No time period set so create one
>     int balancerPeriod =
>       getConfiguration().getInt("hbase.balancer.period", 300000);
>     balancerCutoffTime = balancerPeriod;
>     // If nonsense period, set it to balancerPeriod
>     if (balancerCutoffTime <= 0) balancerCutoffTime = balancerPeriod;
>   }
>   return balancerCutoffTime;
> }
> Looking at these lines in particular, it seems that HBASE-8119 has
> added the unconditional line, make the conditional obsolete:
>     balancerCutoffTime = balancerPeriod;
>     // If nonsense period, set it to balancerPeriod
>     if (balancerCutoffTime <= 0) balancerCutoffTime = balancerPeriod;
> Shall I just create a JIRA, patch and remove the conditional assignment?
> And since the cut off time is not set, it defaults to the same 5mins
> as the balancer period resolves to, making this basically a continuous
> process, as the chore is calling balance every 5 mins, and it then
> runs for 5 mins before stopping. How does this address the issue
> reported in HBASE-3422?
> Best,
> Lars

View raw message