hbase-user mailing list archives

From "Liu, Ming (HPIT-GADSC)" <ming.l...@hp.com>
Subject RE: Why hbase need manual split?
Date Wed, 06 Aug 2014 10:25:13 GMT
Thanks Arun, and John,

Both of your scenarios make a lot of sense to me. But I am still confused about the
"sequence-based key" case. It is like an append-only workload, so new data is always written
to the same region, but that region will eventually reach hbase.hregion.max.filesize and
be split automatically, so why is a manual split still needed? If we set hbase.hregion.max.filesize
to a "not too big" value, won't that keep any region from growing too big?

I think I first need to understand how HBase does the auto split internally (I am very
new to HBase). Given a region with start key A and end key B, how does HBase split it
internally? Does it split in the middle of the key range? That is, if the original region
covers [A, B], is it split into [A, A+(B-A)/2] and [A+(B-A)/2+1, B]?
If so, and most of the row keys fall in a small range [A, C] where C is very close to
A+(B-A)/2, then I can see a problem with auto split.

Is this true? Can HBase split in other ways?
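
To make the concern concrete, here is a minimal sketch that compares the two ways a
split point could be chosen. It is a toy model and not HBase's implementation: row keys
are plain integers, and the function names and the sample data are made up for
illustration. The HBase documentation describes the split point as being taken from the
stored data (the middle key of the largest store file), which is closer to the second
function than to the first.

```python
# Toy model: integer row keys in a region covering [A, B].
# Both helper functions are hypothetical and exist only for this illustration.

def split_at_range_midpoint(keys, start, end):
    """Split at the numeric middle of the key range [start, end]."""
    mid = start + (end - start) // 2
    left = [k for k in keys if k <= mid]
    right = [k for k in keys if k > mid]
    return mid, left, right

def split_at_median_key(keys):
    """Split at the median stored key so both halves hold similar row counts."""
    ordered = sorted(keys)
    mid = ordered[len(ordered) // 2 - 1]  # last key that stays in the left half
    left = [k for k in ordered if k <= mid]
    right = [k for k in ordered if k > mid]
    return mid, left, right

A, B = 0, 1000
# Skewed data: 90 rows packed into [0, 89], 10 rows spread over [500, 950].
keys = list(range(90)) + list(range(500, 1000, 50))

mid1, l1, r1 = split_at_range_midpoint(keys, A, B)
mid2, l2, r2 = split_at_median_key(keys)

print(mid1, len(l1), len(r1))  # 500 91 9
print(mid2, len(l2), len(r2))  # 49 50 50
```

With the skewed sample, splitting at the middle of the key range leaves 91 rows on one
side and 9 on the other, while splitting at the median stored key gives 50 and 50.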


-----Original Message-----
From: john guthrie [mailto:grafpoo@gmail.com] 
Sent: Wednesday, August 06, 2014 6:01 PM
To: user@hbase.apache.org
Subject: Re: Why hbase need manual split?

I had a customer with a sequence-based key (yes, he knew all the downsides of that). Being
able to split manually meant he could split a region that got too big near the end rather
than right down the middle. With a sequentially increasing key, splitting the region in half
left one region at half the desired size and likely never to be written to again.
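
The effect described above can be sketched with a small simulation. This is only a toy
model under stated assumptions: region size is counted in rows, MAX_ROWS is a
hypothetical stand-in for hbase.hregion.max.filesize, and load_sequential and its
keep_percent parameter are invented for the illustration.

```python
MAX_ROWS = 100  # hypothetical stand-in for hbase.hregion.max.filesize, in rows

def load_sequential(total_rows, keep_percent):
    """Append total_rows rows with increasing keys. Whenever the newest region
    reaches MAX_ROWS, split it so that keep_percent of its rows stay in the
    left daughter, which then receives no further writes."""
    closed = []  # row counts of regions that will never be written to again
    active = 0   # row count of the region that holds the newest keys
    for _ in range(total_rows):
        active += 1
        if active >= MAX_ROWS:
            keep = active * keep_percent // 100
            closed.append(keep)
            active -= keep
    return closed, active

# Split down the middle versus split near the end of the region.
half_closed, half_active = load_sequential(1000, 50)
end_closed, end_active = load_sequential(1000, 90)

print(len(half_closed), half_closed[0])  # 19 50
print(len(end_closed), end_closed[0])    # 11 90
```

In this model, splitting in the middle leaves 19 finished regions that are each stuck at
half the target size, while splitting near the end leaves 11 finished regions at 90% of
the target size for the same 1000 rows.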

On Wed, Aug 6, 2014 at 2:44 AM, Arun Allamsetty <arun.allamsetty@gmail.com>

> Hi Ming,
> The reason we have it is that the user can decide where each
> key goes. I can think of multiple scenarios off the top of my head where
> it would be useful, and others can correct me if I am wrong.
> 1. Cases where your row keys cannot be evenly distributed lexically,
> leading to unequal loads on the regions. In such cases,
> we can set key ranges to be assigned to different regions so that we
> can have a more equal distribution.
> 2. The second scenario I am thinking of may be wrong, and if it is,
> hearing why will clear up my misconceptions. Suppose you cannot denormalize
> your data and you have to perform joins on a certain range of row keys which
> are lexically similar. We split so that they would be assigned to
> the same region server (right?) and the join would be performed locally.
> Cheers,
> Arun
> Sent from a mobile device. Please don't mind the typos.
> On Aug 6, 2014 12:30 AM, "Liu, Ming (HPIT-GADSC)" <ming.liu2@hp.com>
> wrote:
> > Hi, all,
> >
> > As I understand it, HBase will automatically split a region when the
> > region is too big.
> > So in what scenario does a user need to do a manual split? Could someone
> > kindly give me some examples where a user needs to split a region
> > explicitly via the HBase Shell or Java API?
> >
> > Thanks very much.
> >
> > Regards,
> > Ming
> >