Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CABc1ogTnVCJBZJ9iXBvjP8qh_4P2HEQXv1Vz43sLvDo81OmpQg@mail.gmail.com>
References: 
 <CABc1ogTg_hvrqu0BxJiweB_H1HMqZu3Bqi92sUgW12S9PAtBvw@mail.gmail.com>
 <CALvzVW3x+DNitXjc30MX0vbJf5+O=tcvFZ3XS=qSa9Qpso_QAQ@mail.gmail.com>
 <CANZa=GusxaD0qw+f6Q4bk+RHtq0cZxFCxawkFARiV=nUJ+=rew@mail.gmail.com>
 <CAHxLZBXpvSUH5a1EtCTa11-RMTWr3tYaFz_o+BdO2Bz_X_MSQw@mail.gmail.com>
 <CANZDn9t5CO7gfm1GQqb7=7XsTzL3d8ODw2P7Y2gg9qBhJiJbow@mail.gmail.com>
 <CAEf6Z5J2o+J1wxN=NtjSn0Q7Y_hQLaWuSKQHxrF6jC5OBg_EXA@mail.gmail.com>
 <CABc1ogTnVCJBZJ9iXBvjP8qh_4P2HEQXv1Vz43sLvDo81OmpQg@mail.gmail.com>
From: Nick Dimiduk <ndimiduk@gmail.com>
Date: Fri, 19 Jun 2015 09:19:20 -0700
Message-ID: 
 <CANZa=Gt8TeKZvvDKKq48ScZ+d7Z=DCLuBpJf_808RizSJcCwKw@mail.gmail.com>
Subject: Re: Stochastic Balancer by tables
To: hbase-user <user@hbase.apache.org>
Content-Type: multipart/alternative; boundary=f46d04426c688e30ff0518e14a24

--f46d04426c688e30ff0518e14a24
Content-Type: text/plain; charset=UTF-8

On Fri, Jun 19, 2015 at 7:45 AM, Nasron Cheong <nasron@gmail.com> wrote:

> I couldn't find a tool to show regions and their sizes, for a specific
> table, so ended up writing one.
>

Nasron,

Would you mind having a look at the patch/RB on HBASE-13103? Does the API
pair RegionNormalizer/NormalizationPlan look like a reasonable harness for
you to hang your custom tool onto? Just like the balancer, it's designed to
be extensible with different normalization strategies.
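
For a sense of what a normalization strategy might compute, here is a
minimal, self-contained Java sketch. It is not the HBASE-13103 API: the
class name RegionSizeSketch, the plan method, and both thresholds (split a
region larger than twice the table average, merge two adjacent regions whose
combined size is below the average) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a region-size normalization pass; not HBase code. */
class RegionSizeSketch {

    /**
     * Takes the sizes (in MB) of one table's regions, in key order, and
     * returns suggested actions as strings: "SPLIT i" or "MERGE i,j",
     * where i and j are indices into sizesMb.
     */
    static List<String> plan(long[] sizesMb) {
        List<String> actions = new ArrayList<>();
        if (sizesMb.length < 2) {
            return actions; // nothing to compare against
        }
        long total = 0;
        for (long size : sizesMb) {
            total += size;
        }
        double avg = (double) total / sizesMb.length;

        for (int i = 0; i < sizesMb.length; i++) {
            if (sizesMb[i] > 2 * avg) {
                // Assumed threshold: more than twice the average is "too large".
                actions.add("SPLIT " + i);
            } else if (i + 1 < sizesMb.length
                    && sizesMb[i] + sizesMb[i + 1] < avg) {
                // Assumed threshold: two neighbors together smaller than average.
                actions.add("MERGE " + i + "," + (i + 1));
                i++; // the neighbor is consumed by this merge
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        long[] sizesMb = {100, 120, 90, 2000, 10, 15, 110};
        System.out.println(plan(sizesMb));
    }
}
```

A real strategy would read region sizes from the cluster and hand its
decisions back to the master rather than returning strings; the sketch only
shows the size comparison itself.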

> On Fri, Jun 19, 2015 at 3:47 AM, Dejan Menges <dejan.menges@gmail.com>
> wrote:
>
> > Just have to say that hbase.master.loadbalance.bytable saved us after we
> > discovered it. In our case we had to set it manually to true, and then it
> > was easy to catch hot spotting on unusually large regions and handle it.
> >
> > Btw +1 for HBASE-13103, had to say it; it's what has me starting to
> > upgrade our HDP stack on Monday morning.
> >
> > On Thu, Jun 18, 2015 at 11:04 PM, Bryan Beaudreault <
> > bbeaudreault@hubspot.com> wrote:
> >
> > > Just had to say, https://issues.apache.org/jira/browse/HBASE-13103
> > > looks *AWESOME*
> > >
> > > On Thu, Jun 18, 2015 at 5:00 PM Mikhail Antonov <olorinbant@gmail.com>
> > > wrote:
> > >
> > > > Yeah, I could see 2 reasons for the remaining few regions to take a
> > > > disproportionately long time - 1) those regions are disproportionately
> > > > large (you should be able to quickly confirm it) and 2) they happened
> > > > to be hosted on really slow/overloaded machine(s). #1 seems far more
> > > > likely to me.
> > > >
> > > > And as Nick said, there's ongoing effort to provide exactly what
> > > > you've described - centralized periodic analysis of region sizes and
> > > > equalization as needed (somewhat complementary to balancing), and any
> > > > feedback (especially from folks experiencing real issues with unequal
> > > > region sizes) is much appreciated.
> > > >
> > > > -Mikhail
> > > >
> > > > On Thu, Jun 18, 2015 at 10:07 AM, Nick Dimiduk <ndimiduk@gmail.com>
> > > > wrote:
> > > > > If you're interested in region size balancing, please have a look at
> > > > > https://issues.apache.org/jira/browse/HBASE-13103 . Please provide
> > > > > feedback as we're hoping to have an early version available in 1.2.
> > > > >
> > > > > Which reminds me, I owe Mikhail another review...
> > > > >
> > > > > On Thu, Jun 18, 2015 at 9:39 AM, Elliott Clark <eclark@apache.org>
> > > > > wrote:
> > > > >
> > > > >> The balancer is not responsible for region size decisions. The balancer
> > > > >> is only responsible for deciding which regionservers should host which
> > > > >> regions.
> > > > >> Splits are determined by the data size of a region. See the max store
> > > > >> file size setting (hbase.hregion.max.filesize).
> > > > >>
> > > > >> On Thu, Jun 18, 2015 at 7:50 AM, Nasron Cheong <nasron@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi,
> > > > >> >
> > > > >> > I've noticed there are two settings available when using the HBase
> > > > >> > balancer (specifically the default stochastic balancer):
> > > > >> >
> > > > >> > hbase.master.balancer.stochastic.tableSkewCost
> > > > >> >
> > > > >> > hbase.master.loadbalance.bytable
> > > > >> >
> > > > >> > How do these two settings relate? The documentation indicates that
> > > > >> > when using the stochastic balancer, 'bytable' should be set to false?
> > > > >> >
> > > > >> > Our deployment relies on very few, very large tables, and I've noticed
> > > > >> > bad distribution when accessing some of the tables. E.g. there are 443
> > > > >> > regions for a single table, but when doing a MR job over a full scan of
> > > > >> > the table, the first 426 regions scan quickly (minutes), but the
> > > > >> > remaining 17 regions take significantly longer (hours).
> > > > >> >
> > > > >> > My expectation is to have the balancer equalize the size of the regions
> > > > >> > for each table.
> > > > >> >
> > > > >> > Thanks!
> > > > >> >
> > > > >> > - Nasron
> > > > >> >
> > > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Michael Antonov
> > > >
> > >
> >
>

--f46d04426c688e30ff0518e14a24--