hadoop-mapreduce-user mailing list archives

From John Meagher <john.meag...@gmail.com>
Subject Re: re-replication after data node failure
Date Wed, 26 Mar 2014 18:44:58 GMT
The balancer does not handle adding extra replicas after a node
failure, but it looks like the balancer bandwidth setting is the way
to throttle re-replication.  See:
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201301.mbox/%3C50F870C1.5010208@getjar.com%3E

On Wed, Mar 26, 2014 at 10:51 AM, feedly team <feedlydev@gmail.com> wrote:
> We recently had a node die in our HBase cluster. Afterwards, we saw a huge
> increase in traffic and I/O as HDFS re-replicated data from the dead node.
> This negatively affected our application and we are trying to see if there
> is a way to slow down this process so the app can still run (if a bit
> slower).
>
> Is the balancer job responsible for re-replication? This was our first
> thought but the docs mostly mention balancing disk utilization rather than
> restoring the replication factor, so we aren't sure if it's responsible or
> if it's some other process.
>
> If it is indeed the balancer, we saw there is a dfs.balance.bandwidthPerSec
> setting that we could change. The default is 1MB/s. Does this mean that each
> node sends and receives at most 1MB/s during balancing? We saw much, much
> higher sustained traffic than this. The levels we saw would be roughly
> correct if this is the in + out limit per data node pair, i.e. if you have a
> 5 node cluster, node1 would be limited to 1MB/s to each of the other 4 nodes,
> meaning the node would experience 4MB/s of traffic.
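
To make the two readings of the limit in the question concrete, here is a
minimal arithmetic sketch. It only restates the example from the quoted
message (a 5 node cluster and a 1MB/s limit); the function names are
hypothetical, and it does not say which reading HDFS actually implements.

```python
MB = 1024 * 1024  # bytes per megabyte, as used in this sketch


def per_node_cap(limit_bytes_per_sec: int, nodes: int) -> int:
    """Reading A (assumption): the limit is a total cap for each data node."""
    return limit_bytes_per_sec


def per_pair_cap(limit_bytes_per_sec: int, nodes: int) -> int:
    """Reading B (assumption): the limit applies to each pair of data nodes,
    so one node can exchange data with every other node at the full limit."""
    return limit_bytes_per_sec * (nodes - 1)


if __name__ == "__main__":
    limit = 1 * MB  # the 1MB/s default mentioned in the question
    nodes = 5       # cluster size from the question's example
    print("per-node reading:", per_node_cap(limit, nodes) // MB, "MB/s")
    print("per-pair reading:", per_pair_cap(limit, nodes) // MB, "MB/s")
```

Under the per-pair reading the traffic a single node sees grows with cluster
size, which would match seeing sustained traffic well above 1MB/s.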
