hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: DFS rebalancing with running HBase
Date Mon, 24 Jan 2011 13:42:09 GMT

The trouble was due to a defect in how HDFS partitioned deletion work among the datanodes.
Under high write load especially, HBase can post a lot of deletes due to compactions.
Running the balancer just makes it worse: additional replications in the face of uneven
deletion just bring the end faster when a datanode fills.
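
As a back-of-the-envelope illustration of why extra replication traffic hurts here, the sketch below computes how long a datanode lasts when data arrives faster than lagging deletes free it. This is a toy model and not how HDFS schedules anything; the function name and every number in it are made up for the example.

```python
def hours_until_full(free_gb, write_gb_per_h, balancer_gb_per_h, delete_gb_per_h):
    """Return hours until a datanode runs out of space, or None if it never fills.

    Toy model: space is consumed by client writes plus blocks copied in by the
    balancer, and is freed only as fast as deletes are actually carried out.
    """
    net_gb_per_h = write_gb_per_h + balancer_gb_per_h - delete_gb_per_h
    if net_gb_per_h <= 0:
        return None  # deletes keep up, so the node does not fill
    return free_gb / net_gb_per_h


# Hypothetical datanode: 200 GB free, 50 GB/h of writes, deletes lagging at 30 GB/h.
without_balancer = hours_until_full(200, 50, 0, 30)
with_balancer = hours_until_full(200, 50, 20, 30)  # balancer adds 20 GB/h inbound
print(without_balancer, with_balancer)
```

With these invented rates the node fills in half the time once the balancer starts copying blocks onto it, which is the effect described above.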

This is fixed in CDH3 via HDFS-630: https://issues.apache.org/jira/browse/HDFS-630

This is fixed in HDFS 0.21+ via HADOOP-5124: https://issues.apache.org/jira/browse/HADOOP-5124

It might be a good idea to apply one of these fixes to the ASF 0.20-append branch.

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)

--- On Mon, 1/24/11, Martin Fiala <fialama@gmail.com> wrote:

> From: Martin Fiala <fialama@gmail.com>
> Subject: DFS rebalancing with running HBase
> To: user@hbase.apache.org
> Date: Monday, January 24, 2011, 4:21 AM
> Hello,
> in an old thread regarding hadoop/hbase 0.19.x, Andrew
> Purtell wrote that running the DFS balancer while HBase is
> running is not recommended. I didn't find any remarks about
> this in the Hadoop or HBase documentation.
> http://mail-archives.apache.org/mod_mbox/hbase-user/200905.mbox/%3c812604.43615.qm@web65510.mail.ac4.yahoo.com%3e
> Is it still the case? What bad things can happen?
> It is quite clear that when writing heavily to HBase and
> running the balancer simultaneously, the cluster is not going to
> be balanced. It can become even more unbalanced.
> What about running the balancer when we are only reading from
> HBase or writing small numbers of records?
> Regards,
> Martin Fiala

