hadoop-common-dev mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: HDFS load(traffic) balancing
Date Wed, 18 Feb 2009 00:54:35 GMT
Sangmin Lee wrote:
> Hi folks,
> I have a question regarding HDFS's load balancing when it chooses target
> datanodes for a block.
> From the code, it seems it makes a decision based on the information from
> previous heartbeats.
> Since heartbeats come every 3 seconds, within that window we may end up
> putting more load on some datanodes than others.
> I noticed that for disk space balancing, the namenode maintains scheduled block
> information for each datanode, which is updated whenever a new block is
> assigned to the datanode.
> Shouldn't we do a similar thing for traffic?

We should. HADOOP-3707 was meant for a dot release, so we did not want
to depend too much on the new stat at that time. The comments in the JIRA
and in the code say so.

Unless you have a large heartbeat interval, do you really think it makes
much difference in the normal case? We would like to know if you have seen
any such cases.

It could help if there are a large number of clients simultaneously
writing from a small set of nodes.
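
To make the idea concrete, here is a minimal sketch of counting the
assignments made between heartbeats as part of a datanode's load. It is not
the actual namenode code: the class, field, and method names are all
hypothetical, and resetting the counter on every heartbeat is a simplifying
assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual HDFS namenode code. */
public class ScheduledLoadSketch {

    /** Per-datanode load as seen by the namenode (illustrative names). */
    static final class NodeLoad {
        final String name;
        private int reportedTransfers;   // active transfers from the last heartbeat
        private int scheduledSinceBeat;  // blocks assigned since that heartbeat

        NodeLoad(String name) { this.name = name; }

        /** Assumption: a heartbeat carries fresh counts, so the estimate is reset. */
        void onHeartbeat(int activeTransfers) {
            reportedTransfers = activeTransfers;
            scheduledSinceBeat = 0;
        }

        void onBlockScheduled() { scheduledSinceBeat++; }

        /** Reported load plus what was scheduled after the report. */
        int effectiveLoad() { return reportedTransfers + scheduledSinceBeat; }
    }

    /** Picks the least-loaded node and records the assignment immediately. */
    static NodeLoad chooseTarget(List<NodeLoad> nodes) {
        NodeLoad best = null;
        for (NodeLoad n : nodes) {
            if (best == null || n.effectiveLoad() < best.effectiveLoad()) {
                best = n;
            }
        }
        if (best != null) {
            best.onBlockScheduled();
        }
        return best;
    }

    public static void main(String[] args) {
        List<NodeLoad> nodes = new ArrayList<>();
        for (String s : new String[] {"dn1", "dn2", "dn3"}) {
            NodeLoad n = new NodeLoad(s);
            n.onHeartbeat(0);
            nodes.add(n);
        }
        // Six assignments inside a single heartbeat window.
        for (int i = 0; i < 6; i++) {
            System.out.println(chooseTarget(nodes).name);
        }
    }
}
```

With only the heartbeat value, all six assignments in this window would see
the same load on every node; with the extra counter they spread evenly
across the three nodes.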

Based on discussions here at Yahoo, this area of NN scheduling will
undergo some improvements in the near future, especially to handle clusters
with heterogeneous datanodes.


> Thanks,
> Sangmin Lee
