hadoop-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: knowing the nodes on which reduce tasks will run
Date Tue, 04 Sep 2012 11:33:31 GMT
On 3 September 2012 15:19, Abhay Ratnaparkhi <abhay.ratnaparkhi@gmail.com> wrote:

> Hello,
> How can one find out which nodes the reduce tasks will run on?
> One of my jobs is running and it's completing all the map tasks.
> My map tasks write lots of intermediate data. The intermediate directory
> is getting full on all the nodes.
> If a reduce task picks any node in the cluster, it'll try to copy the
> data to the same disk and will eventually fail with disk-space-related
> exceptions.

You could always set up specific partitions for intermediate data, though
you get better bandwidth by striping the data across all disks, and better
flexibility by sharing the same partition.
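
Before moving anything around, it can help to see how much room each intermediate directory actually has. A minimal Python sketch, assuming the directories are given as a comma-separated list (the form Hadoop 1.x uses for mapred.local.dir); the function name is hypothetical and the temp directory only stands in for real paths:

```python
import shutil
import tempfile


def free_space_by_dir(local_dirs: str) -> dict:
    """Map each directory in a comma-separated list to its free bytes."""
    report = {}
    for path in (p.strip() for p in local_dirs.split(",")):
        if path:  # skip empty entries such as a trailing comma
            report[path] = shutil.disk_usage(path).free
    return report


if __name__ == "__main__":
    # Placeholder: substitute the node's real intermediate directories.
    demo_dirs = tempfile.gettempdir()
    for path, free in free_space_by_dir(demo_dirs).items():
        print(f"{path}: {free / 1024**3:.1f} GiB free")
```

Run on each node, this shows whether one disk is filling up ahead of the others or whether all of them are short.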

There's also a property that controls how much space is kept back from DFS
storage: dfs.datanode.du.reserved sets the number of bytes per volume
reserved for non-DFS use. Raise it and the datanodes will leave more free
space around.
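
The value of dfs.datanode.du.reserved is given in bytes and applies per volume, so it is easy to get the number wrong by a factor of 1024. A small Python sketch that converts a figure in GiB and prints the property element for hdfs-site.xml; the helper names and the 20 GiB figure are only illustrative, not a recommendation:

```python
GIB = 1024 ** 3


def reserved_bytes(gib_per_volume: int) -> int:
    """Convert a per-volume reservation in GiB to bytes."""
    return gib_per_volume * GIB


def hdfs_site_property(gib_per_volume: int) -> str:
    """Render the <property> element to paste into hdfs-site.xml."""
    value = reserved_bytes(gib_per_volume)
    return (
        "<property>\n"
        "  <name>dfs.datanode.du.reserved</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Illustrative figure: keep 20 GiB per volume out of DFS use.
    print(hdfs_site_property(20))
```

The datanodes need a restart to pick up the new value.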

see: http://wiki.apache.org/hadoop/DiskSetup
