hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: knowing the nodes on which reduce tasks will run
Date Mon, 03 Sep 2012 15:59:07 GMT
The short answer is no. 
The longer answer is that you can attempt to force data locality; however, even then, if an
open slot becomes available, it's used regardless of what you want to do...
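
To make that concrete, here is a rough sketch of slot-based assignment. It is a toy model under stated assumptions, not Hadoop's actual scheduler code: the class ReduceSlotSketch, the method assign, and the node names are all hypothetical. It only illustrates the idea that a pending reduce task goes to whichever tasktracker reports a free reduce slot first, with no preference for a particular node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names and structure are illustrative assumptions,
// not taken from the Hadoop source.
public class ReduceSlotSketch {

    // freeSlots: node name -> number of free reduce slots (hypothetical input).
    // heartbeatOrder: the order in which nodes happen to report in.
    // Returns the node chosen for each pending reduce task, in assignment order.
    static List<String> assign(Map<String, Integer> freeSlots,
                               List<String> heartbeatOrder,
                               int pendingReduces) {
        Map<String, Integer> slots = new LinkedHashMap<>(freeSlots);
        List<String> placements = new ArrayList<>();
        for (String node : heartbeatOrder) {
            if (pendingReduces == 0) {
                break; // nothing left to place
            }
            int free = slots.getOrDefault(node, 0);
            if (free > 0) {
                // The first node with an open slot gets the task,
                // whether or not it is the node you would have preferred.
                slots.put(node, free - 1);
                placements.add(node);
                pendingReduces--;
            }
        }
        return placements;
    }

    public static void main(String[] args) {
        Map<String, Integer> free = new LinkedHashMap<>();
        free.put("old-node-1", 1); // nearly full disk, but a slot is open
        free.put("new-node-1", 2); // freshly added node
        List<String> order = Arrays.asList("old-node-1", "new-node-1", "new-node-1");
        System.out.println(assign(free, order, 2));
    }
}
```

In this model the old node still receives a reducer simply because it reported an open slot first. The only way to keep reducers off a node here is to give it no free reduce slots at all.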

On Sep 3, 2012, at 9:19 AM, Abhay Ratnaparkhi <abhay.ratnaparkhi@gmail.com> wrote:

> Hello,
> How can one get to know the nodes on which reduce tasks will run?
> One of my jobs is running and it's completing all the map tasks.
> My map tasks write lots of intermediate data. The intermediate directory is getting full
> on all the nodes.
> If a reduce task takes any node from the cluster, it'll try to copy the data to the same
> disk and will eventually fail due to disk-space-related exceptions.
> I have added a few more tasktracker nodes to the cluster and now want to run the reducers
> on the new nodes only.
> Is it possible to choose a node on which the reducer will run? What algorithm does Hadoop
> use to pick a new node to run a reducer?
> Thanks in advance.
> Bye
> Abhay
