hadoop-user mailing list archives

From "David Parks" <davidpark...@yahoo.com>
Subject RE: How can I limit reducers to one-per-node?
Date Sat, 09 Feb 2013 04:46:23 GMT
Looking at the job file for my job, I see that this property is set to 1; however, I have 3 reducers
per node (I’m not clear what configuration is causing this behavior).

My problem is that, on a 15-node cluster, I set 15 reduce tasks on my job in the hope that each
would be assigned to a different node, but in the last run 3 nodes had nothing to do and
3 other nodes had 2 reduce tasks assigned.

From: Nan Zhu [mailto:zhunansjtu@gmail.com] 
Sent: Saturday, February 09, 2013 11:31 AM
To: user@hadoop.apache.org
Subject: Re: How can I limit reducers to one-per-node?

 

I haven't used AWS MR before. If your instances are configured with 3 reducer slots, it
means that 3 reducers can run at the same time on each node.

What do you mean by "this property is already set to 1 on my cluster"?

Actually, this value can be node-specific. If the AWS MR instance allows you to do that, you can
modify mapred-site.xml to change it from 3 to 1.
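
To illustrate why the slot count matters for placement, here is a minimal toy sketch. It is not Hadoop's actual scheduling logic. It assumes that reduce tasks are handed out one at a time to whichever node reports in next, in random order, as long as that node has a free reduce slot. The class name, method names, and seed are all hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Toy model only: this is NOT the JobTracker's real scheduler.
class ReduceSlotSketch {

    // Hands out reduce tasks one at a time to a randomly chosen node
    // (standing in for "whichever node heartbeats next"), provided that
    // node still has a free reduce slot. Returns tasks placed per node.
    static int[] assign(int nodes, int slotsPerNode, int reduceTasks, long seed) {
        Random heartbeat = new Random(seed);
        int[] running = new int[nodes];
        // Never try to place more tasks than the cluster has slots for.
        int remaining = Math.min(reduceTasks, nodes * slotsPerNode);
        while (remaining > 0) {
            int node = heartbeat.nextInt(nodes);
            if (running[node] < slotsPerNode) {
                running[node]++;
                remaining--;
            }
        }
        return running;
    }

    // Largest number of reduce tasks placed on any single node.
    static int maxPerNode(int[] counts) {
        return Arrays.stream(counts).max().getAsInt();
    }

    // Number of nodes that received no reduce task at all.
    static long idleNodes(int[] counts) {
        return Arrays.stream(counts).filter(c -> c == 0).count();
    }

    public static void main(String[] args) {
        int[] threeSlots = assign(15, 3, 15, 42L);
        int[] oneSlot = assign(15, 1, 15, 42L);
        System.out.println("3 slots per node: " + Arrays.toString(threeSlots)
                + " idle=" + idleNodes(threeSlots));
        System.out.println("1 slot per node:  " + Arrays.toString(oneSlot)
                + " idle=" + idleNodes(oneSlot));
    }
}
```

In this model, 15 tasks on 15 nodes can only end up one per node in every run if each node has a single slot. With 3 slots per node, nothing stops a node from taking a second task while another node sits idle.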

 

Best,

-- 
Nan Zhu
School of Computer Science,
McGill University

 

On Friday, 8 February, 2013 at 11:24 PM, David Parks wrote:

Hmm, odd: I’m using AWS MapReduce, and this property is already set to 1 on my cluster by
default (using 15 m1.xlarge boxes, which come with 3 reducer slots configured by default).

 

 

 

From: Nan Zhu [mailto:zhunansjtu@gmail.com] 
Sent: Saturday, February 09, 2013 10:59 AM
To: user@hadoop.apache.org
Subject: Re: How can I limit reducers to one-per-node?

 

I think setting mapred.tasktracker.reduce.tasks.maximum to 1 may meet your requirement.

 

 

Best,

-- 
Nan Zhu
School of Computer Science,
McGill University

 

 

On Friday, 8 February, 2013 at 10:54 PM, David Parks wrote:

I have a cluster of boxes with 3 reducers per node. I want to limit a particular job to run
only 1 reducer per node.

This job is network-IO bound, gathering images from a set of web servers.

My job has certain parameters set to meet “web politeness” standards (e.g. limits on the number
of connections and on connection frequency).

If this job runs from multiple reducers on the same node, those per-host limits will be violated.
Also, this is a shared environment, and I don’t want long-running, network-bound jobs uselessly
taking up all reduce slots.

 

 

