hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Increase number of map slots
Date Thu, 14 Jun 2012 15:11:12 GMT
Hey Keith,

Sorry for the late response here (I had meant to reply but I believe I
got distracted and forgot all about it):

I agree with you on all counts. The config is indeed for service-level
slots. My reply was to only correct Kartheek's assumptions.
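To make the slots-versus-tasks distinction concrete, here is a minimal Python sketch under simplifying assumptions. Cluster map capacity is taken as the sum of each TaskTracker's mapred.tasktracker.map.tasks.maximum, and a job's map task count is taken as its number of input splits. The helper names and the "waves" estimate are hypothetical. They ignore scheduler policy, other running jobs, speculative execution and retried attempts.

```python
import math


def cluster_map_slots(slots_per_tracker: list[int]) -> int:
    """Cluster-wide map capacity: the sum of every TaskTracker's
    mapred.tasktracker.map.tasks.maximum (a per-node, service-level setting)."""
    return sum(slots_per_tracker)


def map_waves(num_map_tasks: int, total_map_slots: int) -> int:
    """Hypothetical estimate of how many rounds of map tasks one job needs
    if it has every slot to itself (no other jobs, no retries)."""
    if total_map_slots <= 0:
        raise ValueError("the cluster must offer at least one map slot")
    return math.ceil(num_map_tasks / total_map_slots)


if __name__ == "__main__":
    slots = cluster_map_slots([4] * 10)  # 10 TaskTrackers, 4 map slots each
    print(slots, map_waves(100, slots))  # a job with 100 input splits -> "40 3"
```

Raising the per-TaskTracker maximum changes only the capacity side of this calculation; it does not change how many map tasks a given job has.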

Regarding documentation: I'd love to correct it myself, but I lack
the focused time to do so right away. I am willing to review and
commit a patch for you, though, if you're willing to contribute!
Please just let me know the JIRA after you've filed one and I will
track it.

Note that if you use YARN to run the new MR2 code (the MR API is the same;
only the platform/submission-execution model has changed), the concept
of hard slots has gone away, and presently the slots are determined
by the job's memory request (mapreduce.{map/reduce}.memory.mb)
against a NodeManager's total offered memory for service. There is no
longer a single hard config that controls the maximum number of tasks
that may run simultaneously per node (though this can be approximated
via some NodeManager/scheduler memory resource config hacks, which end
up being brittle).
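As a rough illustration of the memory-based model, the Python sketch below estimates how many map containers fit on one NodeManager. It assumes the NodeManager's offered memory is the value set by yarn.nodemanager.resource.memory-mb, and that the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb. The function name is hypothetical, and the sketch ignores the ApplicationMaster's own container and any other applications on the node.

```python
import math


def max_concurrent_tasks(nodemanager_memory_mb: int,
                         task_memory_mb: int,
                         min_allocation_mb: int = 1024) -> int:
    """Estimate how many task containers of one size fit on a single node.

    Assumption: the scheduler normalizes each request up to a multiple of
    min_allocation_mb before placing it on the node.
    """
    if task_memory_mb <= 0 or min_allocation_mb <= 0:
        raise ValueError("memory sizes must be positive")
    normalized_mb = math.ceil(task_memory_mb / min_allocation_mb) * min_allocation_mb
    return nodemanager_memory_mb // normalized_mb


if __name__ == "__main__":
    # 8192 MB offered per node, mapreduce.map.memory.mb = 1536
    print(max_concurrent_tasks(8192, 1536))       # 4: each request rounded up to 2048 MB
    print(max_concurrent_tasks(8192, 1536, 512))  # 5: 1536 is already a multiple of 512
```

Under these assumptions it also shows why capping tasks per node is indirect: the only knobs are memory sizes. For example, requesting task memory equal to the node's whole offer yields one task per node, but that cap silently changes as soon as either value is retuned.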

CPU-specific requests are coming soon for YARN:

On Wed, Jun 6, 2012 at 10:38 PM, Keith Wiley <kwiley@keithwiley.com> wrote:
> On Jun 6, 2012, at 03:42 , Harsh J wrote:
>>> I think mapred.tasktracker.map.tasks.maximum sets the number of map
>>> tasks and not slots.
>> This is incorrect. The property does configure slots. Please also see
>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces and
>> http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F
>> for more.
> But Harsh, wouldn't you agree that the first reference you provided above is talking
> about the number of tasks spawned for a given job at job-runtime and not the number of slots
> hard-configured into the cluster at cluster-spinup time?
> Incidentally, the second reference above is partially broken.  It attempts to offer
> links to dig into further detail about mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum,
> but the links are broken.  For example, one of the two broken links is:
> http://hadoop.apache.org/common/docs/current/hadoop-default.html#mapred.tasktracker.map.tasks.maximum
> It's still broken even if you remove the anchor from the end of the URL, which is to
> say the hadoop-default.html webpage doesn't even exist.
> In fact, it is difficult to find any official documentation on those properties (Google
> searches for the terms do not turn up any proper documentation within Apache, but
> rather just lots of back-and-forth forum discussions about the properties).  One thing I
> did find was a claim that those properties are deprecated in 2.0.0:
> http://hadoop.apache.org/common/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
> That page indicates that they were replaced with equivalents in which the first component
> is now 'mapreduce', not 'mapred'.  Even with the new terms, however, Google still doesn't
> link to any formal documentation describing those properties.  In fact, I have yet to find
> a webpage anywhere which officially states the purpose/effect of mapred(uce).tasktracker.map.tasks.maximum.
> That said, I agree that the consensus of discussion and description seems to imply that
> these properties have a cluster-level (not job-level) effect on the number of map/reduce slots
> on the cluster, not the number of tasks spawned for a given job.  Such a concept obviously
> complicates the intuition that slots correspond to cores, as I suggested in an earlier post,
> and I apologize for that.
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com
> "Yet mark his perfect self-contentment, and hence learn his lesson, that to be
> self-contented is to be vile and ignorant, and that to aspire is better than to
> be blindly and impotently happy."
>                                           --  Edwin A. Abbott,
> ________________________________________________________________________________

Harsh J
