hadoop-mapreduce-issues mailing list archives

From "Harsh J (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2905) CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job)
Date Wed, 12 Oct 2011 13:29:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125843#comment-13125843 ]

Harsh J commented on MAPREDUCE-2905:
------------------------------------

Jeff,

I'll leave the final review to people better suited to reviewing FairScheduler patches, but
here are some notes on getting this patch to an acceptable state.

A few nits:

- The patch mixes spaces and tabs. Please follow the coding guidelines and use only spaces:
2 spaces per indent level, instead of the hard tab characters that seem to be present right now.
- If you'd like to get this included upstream, you'll have to re-upload the patch and grant
the ASF permission to include it. You can do this when you attach a file (look for an option
at the bottom; perhaps you missed it accidentally).

If possible, could we also have a test for this?
                
> CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job)
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2905
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2905
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>    Affects Versions: 0.20.2
>            Reporter: Jeff Bean
>         Attachments: MR-2905.patch
>
>
> We encountered a situation where in the same cluster, large jobs benefit from mapred.fairscheduler.assignmultiple,
> but small jobs with small numbers of mappers do not: the mappers all clump to fully occupy
> just a few nodes, which causes those nodes to saturate and bottleneck. The desired behavior
> is to spread the job across more nodes so that a relatively small job doesn't saturate any
> node in the cluster.
> Testing has shown that setting mapred.fairscheduler.assignmultiple to false gives the
> desired behavior for small jobs, but is unnecessary for large jobs. However, since this is
> a cluster-wide setting, we can't properly tune.
> It'd be nice if jobs can set a param similar to mapred.fairscheduler.assignmultiple on
> submission to better control the task distribution of a particular job.
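
For illustration only, the per-job override requested in the description above could look roughly like the sketch below. It is not the attached patch and not FairScheduler code: the per-job key name is hypothetical, plain maps stand in for the job and cluster configurations, and the cluster-wide default of false is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class AssignMultipleSketch {
    // Cluster-wide FairScheduler property named in the issue description.
    static final String CLUSTER_KEY = "mapred.fairscheduler.assignmultiple";
    // HYPOTHETICAL per-job key, invented for this sketch; not an existing Hadoop property.
    static final String JOB_KEY = "mapred.fairscheduler.assignmultiple.job";

    // Returns the effective setting for one job: the per-job value if the job
    // set one at submission, otherwise the cluster-wide value.
    static boolean assignMultipleFor(Map<String, String> jobConf,
                                     Map<String, String> clusterConf) {
        String perJob = jobConf.get(JOB_KEY);
        if (perJob != null) {
            return Boolean.parseBoolean(perJob);
        }
        // Assumption for this sketch: treat an unset cluster value as false.
        return Boolean.parseBoolean(clusterConf.getOrDefault(CLUSTER_KEY, "false"));
    }

    public static void main(String[] args) {
        Map<String, String> cluster = new HashMap<>();
        cluster.put(CLUSTER_KEY, "true");      // large jobs benefit from this

        Map<String, String> largeJob = new HashMap<>();   // no override
        Map<String, String> smallJob = new HashMap<>();
        smallJob.put(JOB_KEY, "false");        // small job opts out to spread its mappers

        System.out.println("large job: " + assignMultipleFor(largeJob, cluster));
        System.out.println("small job: " + assignMultipleFor(smallJob, cluster));
    }
}
```

With a lookup of this shape, a small job could opt out of multiple assignment while large jobs keep the cluster-wide behavior.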

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
