hadoop-hdfs-user mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject Re: Increase number of map slots
Date Wed, 06 Jun 2012 17:08:37 GMT

On Jun 6, 2012, at 03:42 , Harsh J wrote:

>> I think mapred.tasktracker.map.tasks.maximum sets the number of map
>> tasks and not slots.
> This is incorrect. The property does configure slots. Please also see
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces and
> http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F
> for more.

But Harsh, wouldn't you agree that the first reference you provided above is talking about
the number of tasks spawned for a given job at job-runtime and not the number of slots hard-configured
into the cluster at cluster-spinup time?

Incidentally, the second reference above is partially broken.  It attempts to offer links
to dig into further detail about mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum,
but the links are broken.  For example, one of the two broken links is:


It's still broken even if you remove the anchor from the end of the URL, which is to say the
hadoop-default.html webpage doesn't even exist.

In fact, it is difficult to find any official documentation on those properties (Google searches
for the terms do not provide links to any proper documentation within apache, but rather just
lots of back and forth forum discussions about the properties).  One thing I did find was
a claim that those properties are deprecated in 2.0.0:


That page indicates that they were replaced with equivalents in which the first component
is now 'mapreduce', not 'mapred'.  Even with the new terms however, Google still doesn't link
to any formal documentation describing those properties.  In fact, I have yet to find a webpage
anywhere which officially states the purpose/effect of mapred(uce).tasktracker.map.tasks.maximum.

That said, I agree that the consensus of discussion and description seems to imply that these
properties have a cluster-level (not job-level) effect on the number of map/reduce slots on
the cluster, not the number of tasks spawned for a given job.  That obviously undermines
the intuition that slots correspond to cores, as I suggested in an earlier post, and I apologize
for that.
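
For reference, here is a minimal sketch of how these per-node limits are typically set on a 1.x-era cluster, assuming the properties live in mapred-site.xml on each TaskTracker node.  The values 8 and 4 are illustrative assumptions only, not recommendations from this thread.  The TaskTracker reads these settings at startup, so a change takes effect only after the TaskTracker is restarted.

```xml
<!-- mapred-site.xml on each TaskTracker node (values are illustrative assumptions) -->
<configuration>
  <property>
    <!-- Maximum number of map tasks this TaskTracker runs concurrently, i.e. its map slots -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <!-- Maximum number of reduce tasks this TaskTracker runs concurrently, i.e. its reduce slots -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

Under these settings the cluster's total map slots would be the sum of the per-node values across all TaskTrackers, while the number of map tasks a given job spawns is still determined separately at job-submission time (largely by its input splits).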

Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
