hadoop-mapreduce-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too high by default
Date Thu, 05 Nov 2009 16:58:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773969#action_12773969 ]

Allen Wittenauer commented on MAPREDUCE-1184:
---------------------------------------------

>Why not let it be and change site-specific, job-specific configuration?

In my experience, users don't set this until they've been around the Hadoop block for a while,
and even then, this one is easy to miss. 

The other reality is that few users run only one job.  It is much more typical to run a
series of jobs as part of a workflow.  Doing specific, low-level tuning of every knob for
every job is asking too much.  Users who do want that level of control will eventually
hit this setting and tune it appropriately.  But that doesn't mean we shouldn't ship a
'reasonable' default until they get around to setting it themselves.
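
For readers who have not set this knob before, here is a minimal sketch of what a site-level
override might look like. It only builds the standard Hadoop <property> fragment; the helper
name slowstart_property is hypothetical, the 0.50 value is simply the rough figure proposed in
the issue (not a tested recommendation), and which site file the fragment belongs in depends on
the Hadoop version in use.

```python
import xml.etree.ElementTree as ET

# Property name taken from the issue title.
SLOWSTART_KEY = "mapred.reduce.slowstart.completed.maps"


def slowstart_property(fraction: float) -> str:
    """Return a Hadoop-style <property> element for the slowstart setting.

    `fraction` is the share of a job's maps (0.0 to 1.0) that should
    complete before its reduces are scheduled.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = SLOWSTART_KEY
    ET.SubElement(prop, "value").text = f"{fraction:.2f}"
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # 0.50 is illustrative only, echoing the "around the 50% mark" suggestion.
    print(slowstart_property(0.50))
```

The printed element would go inside the <configuration> block of the site configuration file.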

>I think Allen's point is that the default 5% may be too low from the utilization perspective.


... and that's exactly my point.  Inexperienced users wonder why all their reduce slots are
not being utilized to get the max throughput of the grid.  One big job holds all the reduce
slots, sometimes for hours at a time, while a smaller job has finished all of its maps and
just needs a handful of reduces to run.  By setting this to a reasonable default, chances
are this very common case will disappear out of the box.
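
To make the difference concrete, here is a rough sketch of the arithmetic behind the setting,
comparing the 5% default with the roughly 50% proposed in the issue. It assumes the threshold
is the fraction multiplied by the job's map count, rounded up; that rounding, the function name
maps_before_reduces, and the 1000-map job are all assumptions for illustration, not a copy of
the scheduler code.

```python
import math


def maps_before_reduces(total_maps: int, slowstart_fraction: float) -> int:
    """Estimate how many maps must finish before reduces may be scheduled.

    Assumption: threshold = ceil(fraction * total_maps). The real
    scheduler may round differently.
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart_fraction must be between 0.0 and 1.0")
    return math.ceil(slowstart_fraction * total_maps)


if __name__ == "__main__":
    total_maps = 1000  # hypothetical large job
    for fraction in (0.05, 0.50):
        needed = maps_before_reduces(total_maps, fraction)
        print(f"slowstart={fraction:.2f}: reduces may start after {needed} maps")
```

With a low fraction, a large job can claim reduce slots long before most of its maps are done,
which is the situation described above; a higher fraction keeps those slots free for longer.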

While I think it would be great to see this tunable go away, that's not where we are today.
So let's just set this to something reasonable and look at the bigger problem at some
later date.  There are bigger fish to fry. :)

> mapred.reduce.slowstart.completed.maps is too high by default
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-1184
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1184
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Allen Wittenauer
>
> By default, this value is set to 5%.  I believe for most real world situations the code
> isn't efficient enough to be set this low.  This should be higher, probably around the 50%
> mark, especially given the predominance of non-FIFO schedulers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

