hadoop-mapreduce-user mailing list archives

From Koji Noguchi <knogu...@yahoo-inc.com>
Subject Re: Can you disable the rule forcing replication to go outside rack?
Date Tue, 29 Sep 2009 21:48:45 GMT
Stuart,

Can you disable the topology (rack awareness) in HDFS?
That way, all 17 nodes should receive an equal amount of data
(assuming you have enough tasks to run on all the nodes).

Koji


On 9/29/09 10:19 AM, "Stuart White" <stuart.white1@gmail.com> wrote:

> I have a hadoop cluster across 2 racks.  One rack contains 12 nodes,
> the other rack contains 5 nodes.
> 
> When I run a really large job, the disks on the 5 nodes fill up much
> sooner than the disks on the 12 nodes, and I believe it's because the
> 12 nodes are sending their replicated blocks to the 5-node rack.  In
> fact, my job won't finish successfully, due to full disks on the 5
> nodes, even though the overall usage of the cluster is ~75%.
> 
> Is there a way I can tell hadoop not to enforce the "send replicated
> blocks outside the current rack" rule?
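
To put rough numbers on the imbalance described above, here is a minimal
back-of-the-envelope sketch in Python. It rests on assumptions that are not
stated in the thread: a replication factor of 3, writers spread uniformly
over all 17 nodes, and a placement policy that keeps one replica in the
writer's rack and puts the remaining replicas in the other rack. The function
name and parameters are hypothetical and not part of any Hadoop API.

```python
from fractions import Fraction


def replicas_per_node(rack_sizes, rack_aware=True, replication=3):
    """Expected replicas stored per node, per block written, for each rack.

    Assumptions (illustrative only): exactly two racks, writers spread
    uniformly over all nodes, one replica kept in the writer's rack and
    the remaining replicas placed in the other rack.
    """
    total = sum(rack_sizes)
    if not rack_aware:
        # Without topology every node looks like it is in one flat rack,
        # so on average the replicas spread evenly over all nodes.
        return [Fraction(replication, total) for _ in rack_sizes]

    per_node = []
    for size in rack_sizes:
        p_local = Fraction(size, total)           # writer is in this rack
        p_remote = Fraction(total - size, total)  # writer is in the other rack
        # Local writer leaves 1 replica here; remote writer sends the rest here.
        expected_in_rack = p_local * 1 + p_remote * (replication - 1)
        per_node.append(expected_in_rack / size)
    return per_node


big, small = replicas_per_node([12, 5])
print(big, small)                  # expected replicas per node, per block
print(round(float(small / big), 2))  # how much faster the small rack fills

flat = replicas_per_node([12, 5], rack_aware=False)
print(flat)                        # even load once rack awareness is off
```

Under these assumptions, each node in the 5-node rack receives roughly 3.16
times as much data as a node in the 12-node rack, which is consistent with
the small rack's disks filling first. With rack awareness disabled, the
expected load per node is the same everywhere, at the cost of losing
protection against a whole-rack failure.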

