hadoop-yarn-issues mailing list archives

From "Tsuyoshi Ozawa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4039) New AM instances waste resource by waiting only for resource availability when all available resources are already used
Date Wed, 12 Aug 2015 19:25:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694042#comment-14694042 ]

Tsuyoshi Ozawa commented on YARN-4039:

[~frsyuki] thank you for taking on this issue. The problem should be fixed, since resource
allocations for Application Masters can cause unexpected starvation. There are two ways
to address it:

1. Supporting gang scheduling natively at the YARN level.
As [~tucu00] [mentioned|https://issues.apache.org/jira/browse/YARN-624?focusedCommentId=13658637&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13658637],
this scheduling method can solve the resource starvation that occurs because lots of AMs consume containers.
As a workaround that approximates gang scheduling for small jobs, one possible
way is to fall back to uber mode when the job is small enough. This prevents deadlock caused by
AM allocations. However, some problems are still under discussion.
2. Changing the priority of AMs before launching containers for *tasks*, as in the patch you uploaded. This
can solve the problem, but there is a tradeoff, as you mentioned.

IMHO, approach 2 is easier to implement. Do you have any benchmark results that make
the impact of the patch clear?

Additionally, minor nits at code level:
+      if (weight.getWeight(ResourceType.MEMORY) < targetWeight.getWeight(ResourceType.MEMORY))

Could you use ResourceCalculator#compare via FSQueue#SchedulingPolicy instead of
getWeight(), because comparing with getWeight() can break the semantics of DRF? Also, could you update the patch on trunk
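
To illustrate the concern about DRF, here is a minimal stand-alone sketch. It does not use the Hadoop API: the class DominantShareSketch, its methods, and the cluster sizes are all hypothetical, chosen only to show that comparing memory alone and comparing by dominant share can order two usages differently.

```java
// Hypothetical stand-alone sketch; NOT the Hadoop ResourceCalculator API.
public class DominantShareSketch {

    /** Dominant share: the larger of the memory fraction and the vcore fraction. */
    static double dominantShare(long memMb, long vcores,
                                long clusterMemMb, long clusterVcores) {
        double memShare = (double) memMb / clusterMemMb;
        double cpuShare = (double) vcores / clusterVcores;
        return Math.max(memShare, cpuShare);
    }

    /** Memory-only comparison, similar in spirit to comparing a single weight. */
    static int compareMemoryOnly(long memA, long memB) {
        return Long.compare(memA, memB);
    }

    /** Comparison by dominant share, which is the quantity DRF orders by. */
    static int compareDominant(long memA, long cpuA, long memB, long cpuB,
                               long clusterMemMb, long clusterVcores) {
        return Double.compare(
                dominantShare(memA, cpuA, clusterMemMb, clusterVcores),
                dominantShare(memB, cpuB, clusterMemMb, clusterVcores));
    }

    public static void main(String[] args) {
        // Assumed cluster: 100000 MB of memory and 100 vcores.
        long clusterMem = 100_000, clusterCpu = 100;
        long memA = 10_000, cpuA = 50; // A is CPU-dominant (share 0.5)
        long memB = 20_000, cpuB = 10; // B is memory-dominant (share 0.2)

        // Memory only: A looks smaller than B (negative result).
        System.out.println(compareMemoryOnly(memA, memB));
        // Dominant share: A is larger than B (positive result).
        System.out.println(compareDominant(memA, cpuA, memB, cpuB, clusterMem, clusterCpu));
    }
}
```

The two comparisons disagree for the same pair of usages, which is the kind of mismatch a memory-only check could introduce under DRF.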

> New AM instances waste resource by waiting only for resource availability when all available
resources are already used
> -----------------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4039
>                 URL: https://issues.apache.org/jira/browse/YARN-4039
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0
>            Reporter: Sadayuki Furuhashi
>            Assignee: Sadayuki Furuhashi
>         Attachments: YARN-4039.1.patch
> Problem:
> In FairScheduler, maxRunningApps doesn't work well if we can't predict the size of the applications
in a queue: a small maxRunningApps can't use all resources when many small applications
are submitted, whereas a large maxRunningApps wastes resources when large applications run.
> Background:
> We're using FairScheduler. In the following scenario, AM instances waste resources significantly:
> * A queue has X MB of capacity.
> * An application requests 32 containers, where each container requires (X / 32) MB of memory.
> ** In this case, a single application occupies the entire resource of the queue.
> * Many such applications are submitted (10 applications).
> * The ideal behavior is that applications run one by one to maximize throughput.
> * However, all applications run simultaneously. As a result, AM instances occupy resources
and prevent other tasks from starting. In the worst case, most resources are occupied by waiting
AMs and applications progress very slowly.
> A solution is to set maxRunningApps to 1 or 2. However, that doesn't work well if the following
workload exists in the same queue:
> * An application requests 2 containers, where each container requires (X / 32) MB of memory.
> * Many such applications are submitted (say, 10 applications).
> * The ideal behavior is that all applications run simultaneously to maximize concurrency
and throughput.
> * However, the number of running applications is limited by maxRunningApps. In the worst case, most
resources are idle.
> This problem happens especially with Hive because we can't estimate size of a MapReduce
> Solution:
> An AM doesn't have to start if there are waiting resource requests, because the AM's own resource
requests can't be granted even if it starts.
> Patch:
> I attached a patch that implements this behavior. But this implementation has this trade-off:
> * When an AM is registered to FairScheduler, its demand is 0 because even the AM attempt has
not been created. Starting this AM doesn't change the resource demand of the queue. So, if many AMs are
submitted to a queue at the same time, all AMs will be RUNNING. But we want to prevent that.
> * When an AM starts, its demand is only the AM attempt. The AM then requests more resources.
Until it does, the demand of the queue is low, and starting AMs during this window
will start unnecessary AMs.
> * So, this patch doesn't start an AM immediately when it is registered. Instead, it starts
AMs only every continuous-scheduling-sleep-ms.
> * Setting a large continuous-scheduling-sleep-ms will prevent wasting AMs. But increases
> Therefore, this patch is enabled only if the new option "demand-block-am-enabled" is true.
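
The idea in the description can be sketched roughly as follows. This is not the code from YARN-4039.1.patch: the class DemandBlockSketch and its methods are hypothetical, and the slot arithmetic assumes each AM occupies one (X / 32) MB slot, which the description does not state.

```java
// Hypothetical stand-alone sketch; NOT the code from YARN-4039.1.patch.
public class DemandBlockSketch {

    /**
     * Decide whether a new AM may start. When the option is on, a new AM is
     * held back while the queue still has resource requests waiting.
     */
    static boolean mayStartAm(boolean demandBlockAmEnabled, long pendingDemandMb) {
        return !demandBlockAmEnabled || pendingDemandMb == 0;
    }

    /**
     * Slots left for tasks when each running AM holds one slot
     * (assumption: an AM container is the same size as a task container).
     */
    static int slotsLeftForTasks(int queueSlots, int runningAms) {
        return Math.max(0, queueSlots - runningAms);
    }

    public static void main(String[] args) {
        int queueSlots = 32; // the queue holds 32 containers of (X / 32) MB

        // All 10 applications start their AMs at once: 22 slots remain for tasks.
        System.out.println(slotsLeftForTasks(queueSlots, 10));
        // Only one AM runs: 31 slots remain for its tasks.
        System.out.println(slotsLeftForTasks(queueSlots, 1));

        // With the option on and requests waiting, a new AM is not started.
        System.out.println(mayStartAm(true, 1024));
        // With the option off, the AM starts regardless of pending demand.
        System.out.println(mayStartAm(false, 1024));
    }
}
```

The check only helps once waiting requests are visible in the queue's demand, which is why the description ties AM starts to continuous-scheduling-sleep-ms.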

This message was sent by Atlassian JIRA
