Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Reply-To: user@hbase.apache.org
In-Reply-To: 
 <CABc1ogTg_hvrqu0BxJiweB_H1HMqZu3Bqi92sUgW12S9PAtBvw@mail.gmail.com>
References: 
 <CABc1ogTg_hvrqu0BxJiweB_H1HMqZu3Bqi92sUgW12S9PAtBvw@mail.gmail.com>
Date: Thu, 18 Jun 2015 09:39:44 -0700
Message-ID: 
 <CALvzVW3x+DNitXjc30MX0vbJf5+O=tcvFZ3XS=qSa9Qpso_QAQ@mail.gmail.com>
Subject: Re: Stochastic Balancer by tables
From: Elliott Clark <eclark@apache.org>
To: "user@hbase.apache.org" <user@hbase.apache.org>

The balancer is not responsible for region size decisions. It is only
responsible for deciding which regionservers should host which regions.
Splits are determined by the data size of a region. See the max store file
size setting (hbase.hregion.max.filesize).
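To make the distinction concrete, here is a minimal, hypothetical sketch (not HBase code) of a size-based split check, loosely modeled on HBase's ConstantSizeRegionSplitPolicy: a region becomes a split candidate when any of its stores grows past a threshold that plays the role of hbase.hregion.max.filesize. The class name SplitCheckSketch and the 10 GiB threshold are illustrative assumptions. Depending on the HBase version and the configured split policy, the effective threshold may differ from this setting. The point is that which regionserver hosts a region never enters the decision.

```java
import java.util.List;

/**
 * Hypothetical sketch of a size-based split check. Not HBase's actual
 * implementation; it only illustrates that splitting depends on data size,
 * not on where the balancer has placed the region.
 */
public final class SplitCheckSketch {

    private SplitCheckSketch() {}

    /**
     * Returns true if any store in the region is larger than maxFileSize,
     * which stands in for hbase.hregion.max.filesize in this sketch.
     */
    public static boolean shouldSplit(List<Long> storeSizesBytes, long maxFileSize) {
        for (long size : storeSizesBytes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long maxFileSize = 10L << 30; // assumed 10 GiB threshold, for illustration only
        // One store at 12 GiB exceeds the threshold, so this region would split.
        System.out.println(shouldSplit(List.of(2L << 30, 12L << 30), maxFileSize));
        // Both stores are below the threshold, so no split, wherever the region is hosted.
        System.out.println(shouldSplit(List.of(2L << 30, 3L << 30), maxFileSize));
    }
}
```

Moving a region to a different regionserver changes none of the inputs above, which is why tuning the balancer cannot equalize region sizes.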

On Thu, Jun 18, 2015 at 7:50 AM, Nasron Cheong <nasron@gmail.com> wrote:

> Hi,
>
> I've noticed there are two settings available when using the HBase balancer
> (specifically the default stochastic balancer):
>
> hbase.master.balancer.stochastic.tableSkewCost
>
> hbase.master.loadbalance.bytable
>
> How do these two settings relate? Does the documentation indicate that
> 'bytable' should be set to false when using the stochastic balancer?
>
> Our deployment relies on very few, very large tables, and I've noticed poor
> distribution when accessing some of them. For example, one table has 443
> regions; when running an MR job over a full scan of that table, the first
> 426 regions scan quickly (minutes), but the remaining 17 take significantly
> longer (hours).
>
> My expectation is to have the balancer equalize the size of the regions for
> each table.
>
> Thanks!
>
> - Nasron
>
