hbase-dev mailing list archives

From "cuijianwei (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-12829) Request count in RegionLoad may not accurate to compute the region load cost
Date Fri, 09 Jan 2015 11:20:34 GMT
cuijianwei created HBASE-12829:
----------------------------------

             Summary: Request count in RegionLoad may not accurate to compute the region load cost
                 Key: HBASE-12829
                 URL: https://issues.apache.org/jira/browse/HBASE-12829
             Project: HBase
          Issue Type: Improvement
          Components: Balancer
    Affects Versions: 0.99.2
            Reporter: cuijianwei
            Priority: Minor


StochasticLoadBalancer#RequestCostFunction (ReadRequestCostFunction and WriteRequestCostFunction)
computes the load cost of a region from a number of remembered region loads. Each region
load records the total read/write request count since the region was opened, as of the report
time. However, the request count is reset when the region moves, so the newly reported count
no longer represents the total requests. For example, if a region has high write throughput,
the WriteRequest count in its region load will be very large after the region has been online
for a long time. If the region then moves, the new WriteRequest count will be much smaller,
so the region contributes much less to the cost of the region server that hosts it. We may
need to take the region open time into account to get a more accurate region load.
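As a rough illustration of the reset problem, the sketch below converts cumulative request
counts into per-interval counts and treats any decrease as a counter reset (for example after
a region move). The class and method names (RequestDeltaSketch, toDeltas) are hypothetical and
are not part of HBase; this is only one possible way to handle the reset, not the balancer's
actual behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not an HBase class.
final class RequestDeltaSketch {

    /**
     * Converts cumulative request counts (one per report, oldest first) into
     * per-interval counts. A count lower than the previous one is assumed to
     * mean the counter was reset (e.g. the region moved), so the new count
     * itself is used as the delta for that interval.
     */
    static List<Long> toDeltas(List<Long> cumulativeCounts) {
        List<Long> deltas = new ArrayList<>();
        long previous = 0;
        for (long current : cumulativeCounts) {
            deltas.add(current >= previous ? current - previous : current);
            previous = current;
        }
        return deltas;
    }
}
```

With reported counts 100, 250, 40, 90 (a reset between the second and third report), this
yields per-interval counts 100, 150, 40, 50.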
Alternatively, how about using the read/write request count of each time slot instead of the
total request count? With total counts, older read/write throughput contributes more
to the cost computed by CostFromRegionLoadFunction#getRegionLoadCost:
{code}
    protected double getRegionLoadCost(Collection<RegionLoad> regionLoadList) {
      double cost = 0;

      for (RegionLoad rl : regionLoadList) {
        double toAdd = getCostFromRl(rl);

        if (cost == 0) {
          cost = toAdd;
        } else {
          cost = (.5 * cost) + (.5 * toAdd);
        }
      }
      return cost;
    }
{code}
For example, assume the balancer remembers three loads for a region, reported at times t1, t2
and t3 (t1 < t2 < t3), and that the write request counts for the time slots [0, t1), [t1, t2)
and [t2, t3) are w1, w2 and w3 respectively. The WriteRequest values in the region loads at
t1, t2 and t3 will then be w1, w1 + w2 and w1 + w2 + w3, and the WriteRequest cost will be:
{code}
    0.5 * (w1 + w2 + w3) + 0.25 * (w1 + w2)  + 0.25 * w1 = w1 + 0.75 * w2 + 0.5 * w3
{code}
So w1 contributes more to the cost than w2 and w3. Intuitively, however, I think recent
read/write throughput should represent the current load of the region better than older
throughput. Therefore, how about using w1, w2 and w3 directly in the computation? The cost
would then become:
{code}
    0.25 * w1 + 0.25 * w2 + 0.5 * w3
{code}
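To check the two formulas numerically, here is a minimal sketch that applies the same halving
scheme as getRegionLoadCost to plain numbers, once with cumulative counts and once with
per-slot counts. The class RequestCostSketch, the method decayedCost and the sample values
are assumptions made for illustration; they are not HBase code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; mirrors the loop in getRegionLoadCost
// but works on raw numbers instead of RegionLoad objects.
final class RequestCostSketch {

    /** Folds the samples oldest-first, halving the weight of older samples each step. */
    static double decayedCost(List<Double> samples) {
        double cost = 0;
        for (double sample : samples) {
            cost = (cost == 0) ? sample : 0.5 * cost + 0.5 * sample;
        }
        return cost;
    }

    public static void main(String[] args) {
        // Assumed sample data: an old burst of writes followed by low traffic.
        double w1 = 100, w2 = 10, w3 = 10;

        // Cumulative counts, as reported today: w1, w1 + w2, w1 + w2 + w3.
        double cumulative = decayedCost(Arrays.asList(w1, w1 + w2, w1 + w2 + w3));

        // Per-slot counts, as proposed: w1, w2, w3.
        double perSlot = decayedCost(Arrays.asList(w1, w2, w3));

        System.out.println("cumulative = " + cumulative);
        System.out.println("per slot   = " + perSlot);
    }
}
```

For these values the cumulative form gives 112.5 (w1 + 0.75 * w2 + 0.5 * w3), while the
per-slot form gives 32.5 (0.25 * w1 + 0.25 * w2 + 0.5 * w3), so the old burst dominates only
in the cumulative case.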



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
