hbase-user mailing list archives

From john guthrie <graf...@gmail.com>
Subject Re: Why hbase need manual split?
Date Wed, 06 Aug 2014 10:35:20 GMT
to be honest, we were doing manual splits for the main reason that we
wanted to make sure it was done on our schedule.

but it also occurred to me that the automatic splits, at least by default,
split the region in half. normally the idea is that both new halves
continue to grow, but with a sequentially increasing key that won't be
true: only the newer half keeps receiving writes. so if you're splitting in
half you want your region split size to be twice your desired region size,
so that when a split does occur the "older" half of the region is the size
you want. manual splitting lets you split at the end instead.
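
to make the arithmetic concrete, here is a rough toy model in python. it is
not hbase code: row counts stand in for bytes, and the names simulate,
max_rows and split_at_end are made up for this sketch. it only assumes that
keys always increase, so only the last region ever receives writes.

```python
def simulate(total_rows, max_rows, split_at_end):
    """Append total_rows rows with ever-increasing keys; return region sizes in rows.

    Toy model only: max_rows plays the role of the region split size.
    """
    regions = [0]  # row counts; with sequential keys only the last region grows
    for _ in range(total_rows):
        regions[-1] += 1
        if regions[-1] >= max_rows:
            if split_at_end:
                # manual-style split at the current highest key:
                # the old region keeps all its rows, a new empty one takes over
                regions.append(0)
            else:
                # default-style split in half: the older half never grows again
                half = regions[-1] // 2
                regions[-1:] = [half, regions[-1] - half]
    return regions


if __name__ == "__main__":
    print(simulate(1000, 100, split_at_end=False))  # 20 regions of 50 rows
    print(simulate(1000, 200, split_at_end=False))  # 10 regions of 100 rows
    print(simulate(1000, 100, split_at_end=True))   # 10 regions of 100 rows, plus an empty tail
```

with a split size of 100, splitting in half leaves every region stuck at 50
rows. doubling the split size to 200, or splitting at the end, gets you
regions of the 100 rows you actually wanted.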

hope this helps, and hope i'm not wrong,
john



On Wed, Aug 6, 2014 at 6:25 AM, Liu, Ming (HPIT-GADSC) <ming.liu2@hp.com>
wrote:

> Thanks Arun, and John,
>
> Both of your scenarios make a lot of sense to me. But for the
> "sequence-based key" case, I am still confused. It is like an append-only
> operation, so new data is always written into the same region, but that
> region will eventually reach hbase.hregion.max.filesize and be
> automatically split, so why is a manual split still needed? If we set
> hbase.hregion.max.filesize to a "not too big" value, then a region will
> never grow too big, right?
>
> And I think I first need to understand how HBase does the auto split
> internally (I am very new to HBase). Given a region with start key A and
> end key B, how does HBase split it internally? Does it split in the middle
> of the key range?
> If the original region covers the range [A, B], is it split into
> [A, A+(B-A)/2] and [A+(B-A)/2+1, B]?
> Then if most of the row keys are in a small range [A, C], where C is very
> close to A+(B-A)/2, I can see a problem with auto split.
>
> Is this true? Can HBase split in other ways?
>
> Thanks,
> Ming
>
> -----Original Message-----
> From: john guthrie [mailto:grafpoo@gmail.com]
> Sent: Wednesday, August 06, 2014 6:01 PM
> To: user@hbase.apache.org
> Subject: Re: Why hbase need manual split?
>
> i had a customer with a sequence-based key (yes, he knew all the downsides
> of that). being able to split manually meant he could split a region that
> got too big at the end instead of right down the middle. with a
> sequentially increasing key, splitting the region in half left one region
> at half the desired size and unlikely to ever be added to.
>
>
> On Wed, Aug 6, 2014 at 2:44 AM, Arun Allamsetty
> <arun.allamsetty@gmail.com> wrote:
>
> > Hi Ming,
> >
> > The reason we have it is that the user can decide where each
> > key goes. I can think of multiple scenarios off the top of my head where
> > it would be useful, and others can correct me if I am wrong.
> >
> > 1. Cases where you cannot have row keys which are evenly distributed
> > lexically, leading to unequal loads on the regions. In such cases,
> > we can set key ranges to be assigned to different regions so that we
> > can have a more equal distribution.
> >
> > 2. The second scenario I am thinking of may be wrong, and if it is,
> > it'll clear up my misconceptions. Suppose you cannot denormalize your
> > data and you have to perform joins on a certain range of row keys which
> > are lexically similar. We would split so that they are assigned to
> > the same region server (right?) and the join would be performed locally.
> >
> > Cheers,
> > Arun
> >
> > Sent from a mobile device. Please don't mind the typos.
> > On Aug 6, 2014 12:30 AM, "Liu, Ming (HPIT-GADSC)" <ming.liu2@hp.com>
> > wrote:
> >
> > > Hi, all,
> > >
> > > As I understand, HBase will automatically split a region when the
> > > region is too big.
> > > So in what scenario does a user need to do a manual split? Could someone
> > > kindly give me some examples where the user needs to do the region split
> > > explicitly via HBase Shell or Java API?
> > >
> > > Thanks very much.
> > >
> > > Regards,
> > > Ming
> > >
> >
>
