hadoop-mapreduce-issues mailing list archives

From "Francis Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5583) Ability to limit running map and reduce tasks
Date Fri, 18 Oct 2013 18:30:45 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799382#comment-13799382 ]

Francis Liu commented on MAPREDUCE-5583:

Cluster with 100,000 containers and 1,000 jobs, each with 100,000 tasks, and each job specifies
that it can only run 5 tasks. So, you are now only using 5% of the cluster and no one makes
progress, leading to very poor utilization and a peanut-buttering effect.
Given that YARN is supposed to engender a diverse set of AMs, this seems to be a problem that
should be solved by the RM anyway. I'm not that familiar with the scheduler, but if we were
to use queues to limit the number of tasks, the outcome would be the same, wouldn't it, since
we're bound by the upper-limit config of the max jobs?
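
To make the 5% figure in the quoted scenario concrete, here is a rough back-of-the-envelope
sketch. It assumes every task occupies exactly one container and that every job is capped at
the same limit; the class and method names are made up for illustration and are not part of
Hadoop.

```java
// Hypothetical helper, not part of Hadoop: estimates the fraction of a
// cluster in use when every job caps its own concurrently running tasks.
// Assumes one container per task and the same cap for every job.
final class CappedUtilization {

    static double fraction(long containers, long jobs, long perJobLimit) {
        // Jobs cannot run more tasks than the cluster has containers.
        long running = Math.min(containers, jobs * perJobLimit);
        return (double) running / containers;
    }

    public static void main(String[] args) {
        // Numbers from the scenario above: 100,000 containers,
        // 1,000 jobs, 5 running tasks allowed per job.
        System.out.println(fraction(100_000, 1_000, 5));
    }
}
```

With those inputs only 5,000 of the 100,000 containers are busy at any time, which is where
the 5% comes from.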

Some form of admin control (e.g. queue with a max-cap) for a small number of use-cases where
you actually need this feature is much safer.
We have a number of use cases and the number is growing. I'm hoping we can come up with a
solution that does not require users to hack the MRv2 AM. This would be useful not only as a
manual MR config; I can also see it being something an InputFormat/OutputFormat sets
automatically, or maybe even something that DSLs can leverage. Apart from queues, some users
control this by limiting the number of reducers or by controlling the number of map tasks. The
latter is done by merging split files, which is undesirable as it makes a task failure costly.
So it'd be great if we could have a clean way of doing this.

> Ability to limit running map and reduce tasks
> ---------------------------------------------
>                 Key: MAPREDUCE-5583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5583
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.9, 2.1.1-beta
>            Reporter: Jason Lowe
> It would be nice if users could specify a limit to the number of map or reduce tasks
> that are running simultaneously.  Occasionally users are performing operations in tasks that
> can lead to DDoS scenarios if too many tasks run simultaneously (e.g., accessing a database,
> web service, etc.).  Having the ability to throttle the number of tasks simultaneously running
> would provide users a way to mitigate issues with too many tasks on a large cluster attempting
> to access a service at any one time.
> This is similar to the functionality requested by MAPREDUCE-224 and implemented by HADOOP-3412
> but was dropped in mrv2.
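
As a minimal sketch of the kind of throttle the description asks for, the bookkeeping could
look roughly like the following. This is not the MR AM's code: the class name, the method
names, and the convention that a limit of zero or less means "unlimited" are all assumptions
made for illustration.

```java
// Hypothetical sketch, not taken from the MapReduce AM: tracks how many
// tasks of one type (map or reduce) are running and how many more may be
// launched without exceeding a user-specified cap.
final class RunningTaskLimiter {

    // Assumed convention: a limit <= 0 means "no limit".
    private final int limit;
    private int running;

    RunningTaskLimiter(int limit) {
        this.limit = limit;
    }

    // Number of pending tasks that may be launched right now.
    synchronized int launchable(int pending) {
        if (pending <= 0) {
            return 0;
        }
        if (limit <= 0) {
            return pending;
        }
        return Math.max(0, Math.min(pending, limit - running));
    }

    // Call when a task attempt is assigned a container.
    synchronized void taskStarted() {
        running++;
    }

    // Call when a task attempt completes, fails, or is killed.
    synchronized void taskFinished() {
        if (running > 0) {
            running--;
        }
    }

    synchronized int running() {
        return running;
    }
}
```

With a limit of 5, a job with 100 pending tasks would be allowed 5 launches, then none until
a running task finishes and frees a slot.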

This message was sent by Atlassian JIRA
