hadoop-hdfs-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Is there any Namenode block scheduler which considers the block load
Date Tue, 11 Oct 2011 12:43:50 GMT

Moving this to hdfs-dev@.

The transfer thread load is definitely considered when building the DN
replication pipeline. See the method
ReplicationTargetChooser#isGoodTarget(…), which is called for each
type of choice (local node, local rack, remote rack, or random). One
part of its check is the node's load, which is determined by the
transfer thread (xceiver) count. This load check is enabled by default.
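
To make the load check concrete, here is a minimal, self-contained sketch
of how such a filter might work. It is not the actual Hadoop source: the
class and method names are hypothetical, and the rule that a node is
rejected when its thread count exceeds twice the cluster average is an
assumption about the threshold, so check isGoodTarget(…) in your version
for the real condition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the load part of a target check; not Hadoop code.
public class LoadAwareTargetSketch {

    // Simplified datanode view: a name and its active transfer thread count.
    static final class Node {
        final String name;
        final int xceiverCount;

        Node(String name, int xceiverCount) {
            this.name = name;
            this.xceiverCount = xceiverCount;
        }
    }

    // Assumed threshold: reject nodes busier than 2x the cluster average.
    static final double LOAD_FACTOR = 2.0;

    // Average transfer thread count across the cluster (0 if empty).
    static double averageLoad(List<Node> cluster) {
        if (cluster.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (Node n : cluster) {
            total += n.xceiverCount;
        }
        return (double) total / cluster.size();
    }

    // Returns false only when load is considered and the candidate is too busy.
    static boolean isGoodTarget(Node candidate, List<Node> cluster, boolean considerLoad) {
        if (!considerLoad) {
            return true;
        }
        return candidate.xceiverCount <= LOAD_FACTOR * averageLoad(cluster);
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
                new Node("dn1", 2), new Node("dn2", 4), new Node("dn3", 30));
        for (Node n : cluster) {
            System.out.println(n.name + " good target: " + isGoodTarget(n, cluster, true));
        }
    }
}
```

In this sketch the comparison is relative to the cluster average rather
than an absolute limit, so a uniformly busy cluster still yields valid
targets and only the outliers are skipped.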


On Tue, Oct 11, 2011 at 4:43 PM, AnilKumar B <akumarb2010@gmail.com> wrote:
> Scenario: If I run a huge number of jobs (all of them using the same
> resources, i.e. input files) on a mini cluster (say 10-15 nodes), then every
> time the namenode returns the first block of the nearest datanode. So in this
> case all the clients are trying to do read/write operations on the same block.
> So is there any other namenode scheduling which considers the block-level
> overhead?
> I am trying to form the pipeline based on datanode xceiver count, instead of
> the travelling salesman problem. Is this the correct idea?

Harsh J
