hadoop-hdfs-user mailing list archives

From "Connor, Greg" <gcon...@createspace.com>
Subject Replication/recovery is slow--can threads be tuned?
Date Fri, 12 Mar 2010 19:02:09 GMT
I'm testing some scenarios where a very small cluster needs to replicate a lot of data due
to a node going down.  I'm observing pretty slow performance, and it seems like each node
is sending one or two blocks at a time, with lots of downtime where the network interface
is idle for 1-2 seconds before starting another block copy.

I've looked around for how to tune this, but setting the following properties doesn't seem
to increase the number of replication threads or the throughput:

  dfs.replication.interval  (tried 3, 1)

  dfs.namenode.handler.count  (tried 10, 30)

  dfs.datanode.handler.count  (tried 25)

  dfs.replication.considerLoad  (tried true and false)
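
For reference, here is roughly how I have these set -- a minimal hdfs-site.xml sketch
(the rest of my configuration is elided, and the values shown are just the ones from my tests):

```xml
<!-- hdfs-site.xml (fragment): replication-related properties I've experimented with -->
<configuration>
  <!-- How often, in seconds, the NameNode computes replication work -->
  <property>
    <name>dfs.replication.interval</name>
    <value>1</value>
  </property>
  <!-- Number of NameNode server threads handling RPCs -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>30</value>
  </property>
  <!-- Number of DataNode server threads -->
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>25</value>
  </property>
  <!-- Whether the NameNode considers DataNode load when choosing replication targets -->
  <property>
    <name>dfs.replication.considerLoad</name>
    <value>false</value>
  </property>
</configuration>
```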

Nothing seems to change the behavior.  It looks like each node with data to send opens connections
to each of the other nodes capable of receiving.  That is probably fine for normal-sized clusters,
but when there are only three nodes and one goes down, the two remaining nodes end up transferring
to each other at roughly half the rate they are capable of.

If there's something I'm missing, please let me know.  If not, and this behavior is hard-coded,
could I work around it by running multiple "datanode" instances on each machine?  Has anyone
done this successfully?


