hbase-user mailing list archives

From Ivan Balashov <ibalas...@gmail.com>
Subject Re: Fixing badly distributed table manually.
Date Mon, 24 Dec 2012 16:27:27 GMT

Vincent Barat <vbarat@...> writes:

> Hi,
> HBase handles balancing regions between RSs correctly: your RSs 
> always manage the same number of regions (the balancer takes care 
> of it).
> Unfortunately, balancing all the regions of one particular table 
> between the RSs of your cluster is not always easy, since HBase (as 
> of 0.90.3), when it splits a region, always creates the new one on 
> the same RS. This means that if you start with a table that has only 
> 1 region and then insert lots of data into it, new regions will 
> always be created on the same RS (if your insert is an M/R job, you 
> saturate this RS). At some point the balancer will decide to move 
> one of these regions to another RS, limiting the issue, but this is 
> not controllable.
> Here at Capptain, we solved this problem by developing a special 
> Python script, based on the HBase shell, that balances all the 
> regions of all tables across all RSs. It ensures that the regions of 
> each table are uniformly deployed on all RSs of the cluster, with a 
> minimum of region transitions.
> It is fast, and even though it can trigger a lot of region 
> transitions, there is very little impact at runtime and it can be 
> run safely.
> If you are interested, just let me know and I can share it.
> Regards,


I would very much like to see, and possibly use, the script that you 
mentioned. We've just run into the same issue: after the table was 
truncated, it was re-created with only 1 region, and after data 
loading and manual splits we ended up with all regions on the 
same RS.

If you could share the script, it would be really appreciated, 
and I believe not only by me.
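
Until the script is available, the idea of spreading one table's 
regions evenly with few moves can be sketched roughly as below. This 
is only an illustration in Python, not the Capptain script: the names 
plan_moves and to_shell_commands are made up here, and the 
region-to-RS mapping is assumed to have been collected already by 
some other means. The only HBase piece relied on is the shell's move 
command, which takes an encoded region name and a server name.

```python
def plan_moves(assignments, servers):
    """Plan region moves for ONE table (hypothetical helper).

    assignments: {encoded_region_name: server_name}, current placement.
    servers: names of all live RSs (assumed to be known by the caller).
    Returns a list of (encoded_region_name, destination_server).
    """
    # Group the table's regions by the RS that currently hosts them.
    by_server = {s: [] for s in servers}
    for region, server in sorted(assignments.items()):
        by_server.setdefault(server, []).append(region)

    # Servers holding the most regions are listed first, so that they
    # receive the "+1" quotas and keep as many regions as possible.
    ordered = sorted(by_server, key=lambda s: (-len(by_server[s]), s))
    base, extra = divmod(len(assignments), len(ordered))
    quota = {s: base + (1 if i < extra else 0)
             for i, s in enumerate(ordered)}

    # Regions above a server's quota are the only ones that move.
    surplus = []
    for s in ordered:
        surplus.extend(by_server[s][quota[s]:])

    # Hand surplus regions to servers that are below their quota.
    moves = []
    for s in ordered:
        for _ in range(max(quota[s] - len(by_server[s]), 0)):
            moves.append((surplus.pop(), s))
    return moves


def to_shell_commands(moves):
    """Render the plan as HBase shell 'move' lines (hypothetical helper)."""
    return ["move '%s', '%s'" % (region, server)
            for region, server in moves]
```

For a table whose regions all sit on one RS, calling plan_moves with 
that table's mapping and the full RS list yields just enough moves to 
leave every RS within one region of the others for that table. The 
resulting lines could then be fed to the HBase shell. The balancer may 
later move regions again, so this is a sketch of a one-off fix rather 
than a permanent one.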

