hadoop-mapreduce-user mailing list archives

From "David Parks" <davidpark...@yahoo.com>
Subject RE: What does mapred.map.tasksperslot do?
Date Thu, 27 Dec 2012 09:42:27 GMT
Ah, this is on AWS EMR, Hadoop 1.0.3. Based on my reading of the AWS docs
this could be an AWS feature, but I thought it was part of Hadoop.

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com] 
Sent: Thursday, December 27, 2012 3:43 PM
To: user@hadoop.apache.org
Subject: Re: What does mapred.map.tasksperslot do?

 

David,

 

Could you please tell me which version of Hadoop you are using? I don't see
this parameter in the stable (1.x) or current branch. I only see references
to it in connection with EMR and with Hadoop 0.18 or so.

 

On Thu, Dec 27, 2012 at 1:51 PM, David Parks <davidparks21@yahoo.com> wrote:

I didn't come up with much in a Google search.

 

In particular, what are the side effects of changing this setting? Memory?
Sort process?

 

I'm guessing it means that it'll feed 2 input splits to each map task; a map
task in turn is a self-contained JVM which consumes one map slot.

 

Thus 4 map slots and 2 tasksperslot means 4 map task JVMs, each of which
processes 2 input splits at a time.
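
To make that arithmetic concrete, here is a minimal sketch of the guess
above. It only models the presumption in this message (one JVM per map slot,
tasksperslot input splits per JVM); it is not confirmed Hadoop or EMR
behaviour, and the function name and dictionary keys are made up for
illustration.

```python
def presumed_layout(map_slots: int, tasks_per_slot: int) -> dict:
    """Model the guess from this message; NOT confirmed Hadoop/EMR behaviour."""
    if map_slots < 1 or tasks_per_slot < 1:
        raise ValueError("both values must be positive integers")
    return {
        # Assumption: each map slot hosts exactly one map task JVM.
        "map_task_jvms": map_slots,
        # Assumption: each JVM works on tasksperslot input splits at a time.
        "splits_per_jvm": tasks_per_slot,
        # Total input splits being processed at once under this guess.
        "splits_in_flight": map_slots * tasks_per_slot,
    }


layout = presumed_layout(map_slots=4, tasks_per_slot=2)
print(layout)
```

With 4 slots and 2 tasksperslot, this guess would put 8 input splits in
flight across 4 JVMs.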

 

By increasing tasksperslot I presume we reduce the overhead needed to start
a new task (even though we're re-using the JVM in a typical configuration,
ours included), but we have more map output to sort and shuffle (I presume
the results of both map splits go into the same output).

 

Can someone verify those presumptions?

 

