hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: uneven regions size after region split.
Date Thu, 19 Dec 2013 19:05:06 GMT
Hi Kim,

The way HBase splits a region is determined by the region split policy.

In 0.94.15 the available policies are:
ConstantSizeRegionSplitPolicy
IncreasingToUpperBoundRegionSplitPolicy
DelimitedKeyPrefixRegionSplitPolicy
KeyPrefixRegionSplitPolicy

You might want to take a look at them and see whether one of them is a
better fit for your use case.
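As a rough illustration of how the two prefix-based policies differ from the size-based ones: as far as I know, both start from the usual midpoint split key and then shorten it. KeyPrefixRegionSplitPolicy keeps a fixed number of leading bytes and DelimitedKeyPrefixRegionSplitPolicy keeps the bytes before a delimiter, so rows sharing a prefix stay in the same region. This is a sketch, not HBase code; the class and method names are hypothetical and the `|`-delimited example key is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of how the prefix-based policies shorten a split point.
final class PrefixSplitSketch {

    // KeyPrefixRegionSplitPolicy-style: keep only the first prefixLength bytes.
    static byte[] truncateToPrefix(byte[] splitPoint, int prefixLength) {
        if (splitPoint.length <= prefixLength) {
            return splitPoint;
        }
        return Arrays.copyOf(splitPoint, prefixLength);
    }

    // DelimitedKeyPrefixRegionSplitPolicy-style: keep the bytes before the first delimiter.
    static byte[] truncateAtDelimiter(byte[] splitPoint, byte delimiter) {
        for (int i = 0; i < splitPoint.length; i++) {
            if (splitPoint[i] == delimiter) {
                return Arrays.copyOf(splitPoint, i);
            }
        }
        return splitPoint; // no delimiter found: split point unchanged
    }

    public static void main(String[] args) {
        // Made-up midpoint key chosen by the default size-based logic.
        byte[] mid = "001|9223370650|42".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(truncateToPrefix(mid, 3), StandardCharsets.UTF_8)); // prints 001
        System.out.println(new String(truncateAtDelimiter(mid, (byte) '|'), StandardCharsets.UTF_8)); // prints 001
    }
}
```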

The default is IncreasingToUpperBoundRegionSplitPolicy, and the policy can
be configured with the hbase.regionserver.region.split.policy property.
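For the default policy, a minimal sketch of the split threshold, assuming the 0.94 formula min(maxFileSize, flushSize × n²), where n is the number of regions of the same table hosted on that RegionServer, flushSize is the memstore flush size and maxFileSize is hbase.hregion.max.filesize. The 128 MB flush size and 10 GB max file size in `main` are assumed defaults; check your own configuration. The class is hypothetical, not the HBase class.

```java
// Hypothetical sketch of the IncreasingToUpperBoundRegionSplitPolicy threshold,
// assuming the 0.94 formula min(maxFileSize, flushSize * n * n).
final class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // n = regions of this table on this RegionServer; 0 falls back to maxFileSize.
    static long sizeToCheck(int tableRegionsCount, long flushSize, long maxFileSize) {
        if (tableRegionsCount == 0) {
            return maxFileSize;
        }
        long n = tableRegionsCount;
        return Math.min(maxFileSize, flushSize * n * n);
    }

    public static void main(String[] args) {
        long flush = 128 * MB;        // assumed memstore flush size
        long max = 10 * 1024 * MB;    // assumed hbase.hregion.max.filesize (10 GB)
        for (int n = 1; n <= 10; n++) {
            System.out.printf("%2d region(s) -> split at %,d MB%n", n, sizeToCheck(n, flush, max) / MB);
        }
    }
}
```

With these values the threshold is 128 MB while the table has one region on the server, 512 MB with two, 1152 MB with three, and it only reaches the 10 GB cap at nine regions, which is one reason region sizes can vary a lot on a young table.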

JM


2013/12/19 Kim Chew <kchew534@gmail.com>

> Hello Jean-Marc,
>
> I ran into a similar situation and I was using Hannibal to check the
> regions' status. My setup is HBase 0.94.8 on a three-RegionServer
> cluster. The table is pre-split into three regions (which matches the
> number of RSs). My row key looks like this:
>
>          <bucket number><reversed timestamp><random number>
>
> The "bucket number" is the number of regions. After the table is
> created, it looks like this:
>
> RS           start key           end key
> 0                                      001
> 1              001                   002
> 2              002
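A minimal sketch of building a key in this shape, assuming a 3-digit zero-padded bucket, a 19-digit `Long.MAX_VALUE - timestamp`, and a 5-digit random suffix (the exact widths and encoding are assumptions; the message does not give them). Because HBase orders rows by comparing key bytes and these fields are fixed-width ASCII digits, the string order here matches HBase's order: within one bucket a newer timestamp yields a smaller key, so the most recent rows of each bucket are written at the low end of that bucket's key range.

```java
import java.util.Random;

// Hypothetical encoding of <bucket number><reversed timestamp><random number>.
final class RowKeySketch {

    static String rowKey(int bucket, long timestampMillis, int randomSuffix) {
        if (bucket < 0 || bucket > 999 || timestampMillis < 0 || randomSuffix < 0 || randomSuffix > 99_999) {
            throw new IllegalArgumentException("value does not fit the assumed fixed-width layout");
        }
        long reversed = Long.MAX_VALUE - timestampMillis; // newer -> smaller
        return String.format("%03d%019d%05d", bucket, reversed, randomSuffix);
    }

    public static void main(String[] args) {
        Random random = new Random();
        long now = System.currentTimeMillis();
        String newer = rowKey(1, now, random.nextInt(100_000));
        String older = rowKey(1, now - 60_000L, random.nextInt(100_000));
        System.out.println(newer);
        System.out.println(older);
        System.out.println("newer sorts first: " + (newer.compareTo(older) < 0));
    }
}
```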
>
> After many region splits, I checked the regions' status, sorted by
> host. I could see from the graph that on each host there is one single
> region that "sticks out", i.e. has the biggest size. Interestingly, the
> start keys and end keys for these three regions are:
>
> start key     end key
>                   000
> 001             001<the rest of the row key>
> 002             002<the rest of the row key>
>
> I am interested in finding out why and how that happens. Maybe my row key
> does not distribute the writes as evenly as I thought it would?
> Also, can I specify the start key and the end key when I pre-split the
> table? I am not aware of such a way.
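Regarding the pre-split question, a sketch of computing explicit split keys for a bucket-prefixed table, assuming a 3-digit zero-padded bucket prefix. N buckets need N - 1 split keys; each key is the start key of one region and the end key of the previous one. In HBase 0.94 such an array can be passed to `HBaseAdmin.createTable(HTableDescriptor, byte[][])`; that call is left out so the sketch compiles without HBase on the classpath, and the helper name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: split keys "001", "002", ... for numBuckets regions.
final class SplitKeysSketch {

    static byte[][] bucketSplitKeys(int numBuckets) {
        if (numBuckets < 1 || numBuckets > 1000) {
            throw new IllegalArgumentException("numBuckets must be in [1, 1000]");
        }
        byte[][] keys = new byte[numBuckets - 1][];
        for (int b = 1; b < numBuckets; b++) {
            // Region b starts at bucket b; region 0 keeps the empty start key.
            keys[b - 1] = String.format("%03d", b).getBytes(StandardCharsets.US_ASCII);
        }
        return keys;
    }

    public static void main(String[] args) {
        for (byte[] key : bucketSplitKeys(3)) {
            System.out.println(new String(key, StandardCharsets.US_ASCII));
        }
    }
}
```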
>
> Thanks,
>
> Kim
>
>
> On Wed, Dec 18, 2013 at 6:15 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Hi Kim,
> >
> > The regions on the graph are ordered by size.
> >
> > Splitting a region, let's say from 10 GB into 2 x 5 GB, doesn't mean the
> > next writes are going to be balanced between the 2 regions. So at some
> > point one of them may reach 10 GB again and split into 2 x 5 GB while the
> > other one is still only 9 GB. This time you will have 9 GB, 5 GB, 5 GB.
> >
> > And so on.
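A toy model of the arithmetic above, assuming 1 GB writes, a fixed 10 GB split threshold, and splits that cut a region into exact halves (all simplifications, not how HBase measures or splits). Each entry in `targets` is the index, in key order, of the region receiving the next write; indices shift when a region before them splits. The class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy simulation: sizes in whole GB, a region splits into halves at the threshold.
final class SplitDriftSketch {

    static List<Integer> simulate(List<Integer> regions, int[] targets, int thresholdGb) {
        List<Integer> sizes = new ArrayList<>(regions);
        for (int t : targets) {
            int newSize = sizes.get(t) + 1; // one 1 GB write
            if (newSize >= thresholdGb) {
                sizes.set(t, newSize / 2);               // lower daughter
                sizes.add(t + 1, newSize - newSize / 2); // upper daughter
            } else {
                sizes.set(t, newSize);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Start from a 10 GB region already split into 5 + 5, then write unevenly.
        int[] targets = {0, 1, 0, 1, 0, 1, 0, 1, 0};
        System.out.println(simulate(Arrays.asList(5, 5), targets, 10));
    }
}
```

Starting from two 5 GB regions, five writes to the first region and four to the second leave three regions of 5, 5 and 9 GB, the same 9/5/5 pattern described in the message.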
> >
> > Also, depending on the size of the rows, the blocks, etc., HBase might not
> > be able to split right in the middle of the region. So maybe you will get
> > 6 GB and 4 GB instead of 5 and 5.
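A toy model of why the halves can come out uneven, assuming the split key must be the first key of a block and that the block is picked by position (the middle entry of a flat block index) rather than by size. As I understand it this resembles what HBase does when it asks the largest store file for its midkey, but the real logic has more cases (multi-level indexes, several store files), so treat it as an illustration only. Sizes are in arbitrary units and the class is hypothetical.

```java
import java.util.Arrays;

// Toy model: split at the first key of the middle block of a flat index.
final class MidkeySplitSketch {

    // Returns {leftSize, rightSize}; rows before the split key go left.
    static long[] splitSizes(long[] blockSizes) {
        if (blockSizes.length < 3) {
            throw new IllegalArgumentException("need at least three blocks in this model");
        }
        int mid = (blockSizes.length - 1) / 2; // middle index entry
        long left = 0;
        long right = 0;
        for (int i = 0; i < blockSizes.length; i++) {
            if (i < mid) {
                left += blockSizes[i];
            } else {
                right += blockSizes[i];
            }
        }
        return new long[] {left, right};
    }

    public static void main(String[] args) {
        long[] blocks = {2, 2, 2, 1, 1, 1, 1}; // total 10, uneven block sizes
        System.out.println(Arrays.toString(splitSizes(blocks))); // prints [6, 4]
    }
}
```

With seven blocks of sizes 2, 2, 2, 1, 1, 1, 1 the split lands at the start of the fourth block, giving 6 and 4 rather than 5 and 5.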
> >
> > Now add some deletes, some compactions, some manual splits, and you will
> > end up with a scenario like the one you sent.
> >
> > hth.
> >
> > JM
> >
> >
> > 2013/12/18 Kim Chew <kchew534@gmail.com>
> >
> > > Sorry if this sounds like an open-ended question, but I am wondering
> > > why this scenario happened after many region splits:
> > >
> > > https://github.com/sentric/hannibal/wiki/Usage#wiki-region_splits
> > >
> > > It seems to me that the writes are concentrated on the first two
> > > bars (regions) after the splits.
> > >
> > > Thanks.
> > >
> > > Kim
> > >
> >
>
