hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: uneven regions size after region split.
Date Thu, 19 Dec 2013 18:22:21 GMT
Take a look at the following API in HBaseAdmin:

  public void createTable(final HTableDescriptor desc, byte [][] splitKeys)
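
Here is a minimal sketch of building the `splitKeys` argument for the bucketed key layout Kim describes below. It assumes zero-padded three-digit bucket prefixes such as `000`, `001` and `002`. N buckets need N - 1 split keys, because the first region implicitly starts at the empty key and the last one implicitly ends at it. The class and method names are hypothetical. The HBase 0.94 calls appear only as comments because they need the HBase client jar on the classpath.

```java
import java.nio.charset.StandardCharsets;

final class BucketSplitKeys {

    /**
     * Builds split keys for row keys that start with a zero-padded bucket
     * number ("000", "001", ...). N buckets need N - 1 split keys: the first
     * region implicitly starts at the empty key and the last one implicitly
     * ends at the empty key.
     */
    static byte[][] bucketSplitKeys(int buckets, int width) {
        if (buckets < 1 || width < 1) {
            throw new IllegalArgumentException("buckets and width must be >= 1");
        }
        if (String.valueOf(buckets - 1).length() > width) {
            // Prefixes of different lengths would not sort in numeric order.
            throw new IllegalArgumentException("width too small for " + buckets + " buckets");
        }
        byte[][] splitKeys = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            String prefix = String.format("%0" + width + "d", i);
            splitKeys[i - 1] = prefix.getBytes(StandardCharsets.UTF_8);
        }
        return splitKeys;
    }

    public static void main(String[] args) {
        byte[][] splitKeys = bucketSplitKeys(3, 3);
        for (byte[] key : splitKeys) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
        // With the HBase 0.94 client on the classpath, the keys would be passed on
        // roughly like this ("mytable" and "cf" are hypothetical names):
        //   HTableDescriptor desc = new HTableDescriptor("mytable");
        //   desc.addFamily(new HColumnDescriptor("cf"));
        //   HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        //   admin.createTable(desc, splitKeys);
        //   admin.close();
    }
}
```

Running `main` prints `001` and `002`. Those two split points give the three regions shown in Kim's first table.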

Cheers


On Thu, Dec 19, 2013 at 10:17 AM, Kim Chew <kchew534@gmail.com> wrote:

> Thanks Ted,
>
> Is there a Java API for that? :)
>
> Kim
>
>
> On Thu, Dec 19, 2013 at 10:10 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Kim:
> > For specifying split keys at table-creation time, see Luke's comment of
> > Dec 4th on this JIRA:
> > HBASE-4163
> >
> > Cheers
> >
> >
> > On Thu, Dec 19, 2013 at 10:05 AM, Kim Chew <kchew534@gmail.com> wrote:
> >
> > > Hello Jean-Marc,
> > >
> > > I ran into a similar situation; I was using Hannibal to check the
> > > region status. My setup is HBase 0.94.8 on a cluster of three
> > > RegionServers. The table is pre-split into three regions (matching the
> > > number of RSs). My row key looks like this:
> > >
> > >          <bucket number><reversed timestamp><random number>
> > >
> > > The "bucket number" ranges over the number of regions (one bucket per
> > > region). After the table is created, the regions look like this:
> > >
> > > RS    start key    end key
> > > 0                  001
> > > 1     001          002
> > > 2     002
> > >
> > > After many region splits, I checked the region status, which is sorted
> > > by host. I could see from the graph that on each host there is one single
> > > region that "sticks out", i.e. has the biggest size. Interestingly, the
> > > start and end keys of these three regions are:
> > >
> > > start key     end key
> > >               000
> > > 001           001<the rest of the row key>
> > > 002           002<the rest of the row key>
> > >
> > > I would like to understand why and how this happens. Maybe my row key
> > > does not distribute the writes as evenly as I thought it would?
> > > Also, can I specify the start and end keys when I pre-split the
> > > table? I am not aware of a way to do that.
> > >
> > > Thanks,
> > >
> > > Kim
> > >
> > >
> > > On Wed, Dec 18, 2013 at 6:15 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> > >
> > > > Hi Kim,
> > > >
> > > > The regions on the graph are ordered by size.
> > > >
> > > > When you split a region, say from 10 GB into 2 x 5 GB, that doesn't mean
> > > > the next writes are going to be balanced between the 2 regions. So at
> > > > some point one of them may reach 10 GB again while the other one is
> > > > still only at 9 GB. Once the first one splits, you have 9 GB, 5 GB and
> > > > 5 GB.
> > > >
> > > > And so on.
> > > >
> > > > Also, depending on the size of the rows, the blocks, etc., HBase might
> > > > not be able to split right in the middle of the region. So you may get
> > > > 6 GB and 4 GB instead of 5 and 5.
> > > >
> > > > Now add some deletes, some compactions, some manual splits, and you will
> > > > end up with a scenario like the one you sent.
> > > >
> > > > hth.
> > > >
> > > > JM
> > > >
> > > >
> > > > 2013/12/18 Kim Chew <kchew534@gmail.com>
> > > >
> > > > > Sorry if this sounds like an open-ended question, but I am wondering
> > > > > why this scenario happened after many region splits:
> > > > >
> > > > > https://github.com/sentric/hannibal/wiki/Usage#wiki-region_splits
> > > > >
> > > > > It seems to me that the writes are concentrated on the first two bars
> > > > > (regions) after the splits.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Kim
> > > > >
> > > >
> > >
> >
>
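
A small simulation makes Jean-Marc's point concrete for Kim's key layout: a split does not rebalance later writes. The sketch rests on simplifying assumptions that do not come from the thread. A region splits once it holds a fixed number of rows (`SPLIT_THRESHOLD`, hypothetical; real HBase splits on store-file size). The split point is the median row, and all rows are the same size. The reversed timestamp makes each new row sort before all older rows of its bucket, so in the sketch every write lands in the first region of its bucket. After each split the upper half stops growing while the lower half keeps taking writes. That is one plausible explanation for the regions Kim saw sticking out (start key `001`, end key `001<the rest of the row key>`), not a diagnosis confirmed in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class HotRegionSim {

    // Hypothetical threshold: a region splits once it holds this many rows.
    // Real HBase decides on store-file size, not row count.
    static final int SPLIT_THRESHOLD = 8;

    // <bucket number><reversed timestamp><random number>, fixed width so that
    // lexicographic (byte) order matches numeric order.
    static String rowKey(int bucket, long timestamp, int random) {
        return String.format("%03d%019d%04d", bucket, Long.MAX_VALUE - timestamp, random);
    }

    // Regions are keyed by start key; a region ends where the next one starts.
    static TreeMap<String, List<String>> simulate(int buckets, int writesPerBucket, long seed) {
        TreeMap<String, List<String>> regions = new TreeMap<>();
        regions.put("", new ArrayList<>());
        for (int b = 1; b < buckets; b++) {
            regions.put(String.format("%03d", b), new ArrayList<>()); // pre-split points
        }
        Random random = new Random(seed);
        for (long ts = 0; ts < writesPerBucket; ts++) {
            for (int b = 0; b < buckets; b++) {
                String key = rowKey(b, ts, random.nextInt(10_000));
                List<String> rows = regions.get(regions.floorKey(key)); // owning region
                rows.add(key);
                if (rows.size() >= SPLIT_THRESHOLD) {
                    // Assumed split point: the median row (real HBase picks an
                    // approximate midpoint from its store files).
                    Collections.sort(rows);
                    int mid = rows.size() / 2;
                    List<String> upper = rows.subList(mid, rows.size());
                    regions.put(upper.get(0), new ArrayList<>(upper)); // new upper region
                    upper.clear(); // removes the moved rows from the lower region
                }
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        TreeMap<String, List<String>> regions = simulate(3, 30, 42L);
        for (Map.Entry<String, List<String>> e : regions.entrySet()) {
            String start = e.getKey();
            boolean head = start.length() <= 3; // "" or a pre-split bucket prefix
            System.out.printf("%-28s %2d rows%s%n",
                    start.isEmpty() ? "(empty)" : start,
                    e.getValue().size(),
                    head ? "  <- takes every new write for its bucket" : "");
        }
    }
}
```

With three buckets and 30 writes per bucket, each bucket ends with a 6-row head region and six frozen 4-row regions. Raising `writesPerBucket` only adds more frozen regions, while the same three head regions keep taking all the writes.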
