hbase-user mailing list archives

From Kim Chew <kchew...@gmail.com>
Subject Re: uneven regions size after region split.
Date Thu, 19 Dec 2013 18:05:56 GMT
Hello Jean-Marc,

I ran into a similar situation while using Hannibal to check the status of
my regions. My setup is HBase 0.94.8 on a cluster of three Region Servers.
The table is pre-split into three regions (which matches the number of RS).
My row key looks like this:

         <bucket number><reversed timestamp><random number>

The "bucket number" is the number of regions. After the table is
created, it looks like this:

RS    start key    end key
0                  001
1     001          002
2     002
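
A minimal Python sketch of a key with this shape is shown here to make the layout concrete. The field widths, the zero padding, the `MAX_TS` constant and the function name `make_row_key` are assumptions for illustration and are not taken from the actual application.

```python
import random

NUM_BUCKETS = 3          # assumed: one bucket per pre-split region
MAX_TS = 10**13 - 1      # assumed upper bound for a 13-digit millisecond timestamp


def make_row_key(ts_millis, bucket=None, rng=random):
    """Build <bucket number><reversed timestamp><random number> as a string."""
    if bucket is None:
        bucket = rng.randrange(NUM_BUCKETS)
    reversed_ts = MAX_TS - ts_millis      # later timestamps give smaller values
    suffix = rng.randrange(10**4)         # random tail to avoid key collisions
    return f"{bucket:03d}{reversed_ts:013d}{suffix:04d}"


older = make_row_key(1387400000000, bucket=1)
newer = make_row_key(1387476356000, bucket=1)

# With a reversed timestamp, a newer row sorts before an older row
# that belongs to the same bucket.
print(newer < older)
```

Under these assumptions, every new row sorts ahead of all existing rows in its bucket, so new writes in a bucket would keep landing in whichever region holds that bucket's lowest keys. That may be relevant to the pattern described next.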

After many region splits, I checked the region status sorted by host. The
graph shows that on each host there is a single region that "sticks out",
i.e. has the largest size. Interestingly, the start keys and end keys of
these three regions are:

start key    end key
             000
001          001<the rest of the row key>
002          002<the rest of the row key>

I would like to understand why and how this happens. Maybe my row key
does not distribute the writes as evenly as I thought it would?
Also, can I specify the start key and the end key when I pre-split the
table? I am not aware of a way to do that.

Thanks,

Kim


On Wed, Dec 18, 2013 at 6:15 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Kim,
>
> The regions on the graph are ordered by size.
>
> When you split a region, let's say from 10gb into 2 x 5gb, it doesn't mean
> the next writes are going to be balanced between the 2 regions. So at some
> point, one will reach 10gb again, and the other one may still be only
> 9gb. So this time you will have 9gb, 5gb, 5gb.
>
> And so on.
>
> Also, based on the size of the rows, the blocks, etc., HBase might not be
> able to split right in the middle of the region. So maybe you will get 6gb
> and 4gb instead of 5 and 5.
>
> Now, add some deletes, some compactions, some manual splits, and you will
> end up with a scenario like the one you sent.
>
> hth.
>
> JM
>
>
> 2013/12/18 Kim Chew <kchew534@gmail.com>
>
> > Sorry if this sounds like an open-ended question, but I am wondering why
> > this scenario happened after many region splits,
> >
> > https://github.com/sentric/hannibal/wiki/Usage#wiki-region_splits
> >
> > It seems to me that the writes are concentrated on the first two
> > bars (regions) after the splits.
> >
> > Thanks.
> >
> > Kim
> >
>
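
The split behaviour described in the quoted reply can be mimicked with a toy model. This is only a sketch: the 10 GB threshold, the `split_ratio` parameter and the function name `write_and_split` are illustrative assumptions, and it does not model how HBase actually picks a split point.

```python
SPLIT_THRESHOLD_GB = 10.0   # assumed region size at which a split is triggered


def write_and_split(regions, index, gb, split_ratio=0.5):
    """Add `gb` to regions[index] and split that region if it hits the threshold."""
    regions = list(regions)
    regions[index] += gb
    if regions[index] >= SPLIT_THRESHOLD_GB:
        size = regions.pop(index)
        left = size * split_ratio
        # Replace the parent with its two daughters, keeping key order.
        regions[index:index] = [left, size - left]
    return regions


regions = [5.0, 5.0]                          # just after a 10 GB region split evenly
regions = write_and_split(regions, 0, 5.0)    # first daughter fills up and splits again
regions = write_and_split(regions, 2, 4.0)    # the other daughter only reaches 9 GB
print(sorted(regions, reverse=True))          # [9.0, 5.0, 5.0]

# A split that cannot land exactly in the middle, e.g. 6 GB and 4 GB.
uneven = write_and_split([0.0], 0, 10.0, split_ratio=0.6)
print(uneven)
```

Repeating such steps with unevenly distributed writes is enough to produce regions of quite different sizes, even before deletes and compactions are taken into account.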
