hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: manual merge
Date Mon, 23 Mar 2015 14:46:46 GMT

I’m trying to understand your problem.

You pre-split your regions to help with some load balancing on the load.  Ok. 
So how did you calculate the number of regions to pre-split? 

You said that the number of regions has grown. How large were the initial regions? Did you
increase the size of new regions?

Did you anticipate the growth or not consider the rate of growth? 
Is the table now relatively static or is it still growing? 
Is the table active or passive most of the time? 

If you are having to reduce the number of regions, do you have a window of opportunity to
take the table offline? 

Why not unload the table using a map/reduce program with a set number of reducers, then load
the data into a temp table with the correct table configuration parameters? Then take the
first table offline, rename it, rename the second (new) table to the first table's name, and
bring it online.
(Then you have your initial table as a backup.)

This would require minimal downtime, but you would have to do a diff of the tables to see what’s
in the original table that is not in the second table, due to rows being added after you
unloaded the table the first time.
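
As a rough illustration of that diff step, here is a minimal sketch in Python. It assumes you have already dumped the row keys of both tables in scan order (sorted ascending, no duplicates); how you get them out of HBase is up to you. The name missing_row_keys and the sample keys are made up for the example. It only finds rows that are missing from the new table, not rows whose cells changed after the unload.

```python
from typing import Iterable, Iterator


def missing_row_keys(original: Iterable[bytes], rebuilt: Iterable[bytes]) -> Iterator[bytes]:
    """Yield row keys present in `original` but absent from `rebuilt`.

    Assumption: both inputs are sorted ascending and contain no duplicates,
    which is the order a full table scan returns row keys in.
    """
    rebuilt_iter = iter(rebuilt)
    current = next(rebuilt_iter, None)
    for key in original:
        # Advance the rebuilt-table cursor until it reaches or passes `key`.
        while current is not None and current < key:
            current = next(rebuilt_iter, None)
        # If the cursor did not land exactly on `key`, the row is missing.
        if current is None or current != key:
            yield key


if __name__ == "__main__":
    # Hypothetical row keys: two rows were written after the unload.
    old_keys = [b"row-0001", b"row-0002", b"row-0003", b"row-0004"]
    new_keys = [b"row-0001", b"row-0002"]
    for key in missing_row_keys(old_keys, new_keys):
        print(key.decode())
```

Because both inputs are consumed as sorted streams, neither key set has to fit in memory, which matters for a table large enough to need this kind of rebuild.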

Of course there are variations on this, but you get the general idea. 



> On Mar 23, 2015, at 8:54 AM, Abe Weinograd <abe@flonet.com> wrote:
> Hello,
> We bulk load our table and during that process, pre-split regions to
> optimize load across servers.  The number of regions build up and we
> manually are merging them back.  Any merge of two regions is causing a
> compaction which slows down our merge process.
> We are merging two regions at a time and this ends up being pretty
> slow.  In order to make it merge more regions in a shorter window of time,
> should we be merging more than one?  Can we do that?  The reason we are
> doing this is that our key is sequential.  In the short term, changing it
> is not an option. The merging helps keep the # of total regions down so
> that when we create 20 new regions for a load, the balancer will spread out
> the new regions across multiple region servers.
> We are currently on HBase 0.98.6 (CDH 5.3.0)
> Thanks,
> Abe

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com
