hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
Date Tue, 05 Jan 2016 22:41:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083974#comment-15083974 ]

Jason Lowe commented on YARN-1011:

bq. Tasks are incorrectly over-allocated. Will never use the resources they ask for and hence
we can safely run additional opportunistic containers. So this feature is used to compensate
for poorly configured applications. Probably a valid scenario but is it common?

In my experience this is fairly common.  Users tend to fiddle with config values until something
works, and then they don't bother to revisit them until there's a problem.  It's also easier to
over-allocate than to spend the time to carefully tune the task size.  Even a user who is
interested in tuning can't always tune optimally.  Examples include data skew and other
task-specific issues where a few tasks need a lot of memory but the vast majority of the others
do not.  Many frameworks only allow task sizes to be configured as a group, so the user
has to run all the tasks in the group with the worst-case container size even though most
of them don't need it.  Pig on MapReduce is another example: a script can spawn multiple
jobs, but the user can only configure the memory settings once in the script, and they apply
to all jobs launched by the script.  Therefore the user has to set them to the worst-case size
across all the script's jobs, and all but one of the jobs run with oversized map containers.
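
To put rough numbers on the group-sizing problem, here is a minimal sketch in Python.  The
function name and the task peaks are hypothetical, picked only for illustration; they are not
measurements from any real cluster or job.

```python
def wasted_memory_mb(task_peak_mb):
    """Memory allocated but never used when every task in a group
    gets the worst-case container size (hypothetical helper)."""
    container_mb = max(task_peak_mb)            # one size for the whole group
    allocated = container_mb * len(task_peak_mb)
    used = sum(task_peak_mb)                    # what the tasks actually peak at
    return allocated - used


if __name__ == "__main__":
    # Assumed skewed job: 98 tasks peak at 1 GB, 2 skewed tasks peak at 8 GB.
    peaks = [1024] * 98 + [8192] * 2
    allocated = max(peaks) * len(peaks)
    waste = wasted_memory_mb(peaks)
    print(f"allocated={allocated} MB, unused={waste} MB ({waste / allocated:.0%})")
```

Under these assumed numbers most of the allocated memory is never touched, which is the
headroom that opportunistic containers could use.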

> [Umbrella] Schedule containers based on utilization of currently allocated containers
> -------------------------------------------------------------------------------------
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
> Currently RM allocates containers and assumes resources allocated are utilized.
> RM can, and should, get to a point where it measures utilization of allocated containers
> and, if appropriate, allocate more (speculative?) containers.

This message was sent by Atlassian JIRA
