hadoop-yarn-issues mailing list archives

From "Mayank Bansal (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2055) Preemtion: Jobs are failing due to AMs are getting launched and killed multiple times
Date Wed, 14 May 2014 06:14:17 GMT

     [ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated YARN-2055:
--------------------------------

    Description: If Queue A does not have enough capacity to run an AM, the AM borrows capacity
from Queue B. When Queue B reclaims its capacity, the AM is killed. The AM is then relaunched
and killed again, and after repeated kills the job fails.  (was:
Cluster Size = 16GB [2 NMs]
Queue A Capacity = 50%
Queue B Capacity = 50%
Consider 3 applications running in Queue A that together have taken the full cluster capacity.

J1 = 2GB AM + 1GB * 4 Maps
J2 = 2GB AM + 1GB * 4 Maps
J3 = 2GB AM + 1GB * 2 Maps

Another job, J4, is submitted to Queue B [J4 needs a 2GB AM + 1GB * 2 Maps].
Currently, in this scenario, job J3 is killed, including its AM.

It would be better if AMs were given the lowest priority for preemption across applications.
In this same scenario, map tasks from J3 and J2 could be preempted instead.
Later, when the cluster has free capacity, maps can be allocated to these jobs again.)
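
The scenario in the earlier description can be checked with a short sketch. It is an illustration only, not the CapacityScheduler's actual preemption code: the `Container` type, the `submit_order` field, and the "tasks before AMs, newest job first" ordering are assumptions made for this example. It confirms that Queue A's containers add up to 16GB and that the 4GB needed by J4 (2GB AM + 2 * 1GB maps) can be freed from map tasks of J3 and J2 without killing any AM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    job: str           # job name, e.g. "J3"
    submit_order: int  # hypothetical field: higher means submitted later
    kind: str          # "AM" or "MAP"
    size_gb: int


def build_queue_a():
    """Containers running in Queue A: J1 and J2 with 4 maps, J3 with 2 maps."""
    containers = []
    for order, (job, maps) in enumerate((("J1", 4), ("J2", 4), ("J3", 2)), start=1):
        containers.append(Container(job, order, "AM", 2))
        containers.extend(Container(job, order, "MAP", 1) for _ in range(maps))
    return containers


def select_for_preemption(containers, needed_gb):
    """Pick containers to preempt until needed_gb is covered.

    Assumed policy: task containers are taken before AM containers,
    and containers of the most recently submitted job are taken first.
    """
    ordered = sorted(containers, key=lambda c: (c.kind == "AM", -c.submit_order))
    victims, freed = [], 0
    for c in ordered:
        if freed >= needed_gb:
            break
        victims.append(c)
        freed += c.size_gb
    return victims


if __name__ == "__main__":
    queue_a = build_queue_a()
    j4_demand_gb = 2 + 1 * 2  # 2GB AM + two 1GB maps
    for victim in select_for_preemption(queue_a, j4_demand_gb):
        print(victim.job, victim.kind, victim.size_gb)
```

Under these assumptions the selected containers are the two maps of J3 and two maps of J2, which matches the outcome proposed in the description.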

> Preemtion: Jobs are failing due to AMs are getting launched and killed multiple times
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-2055
>                 URL: https://issues.apache.org/jira/browse/YARN-2055
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Mayank Bansal
>             Fix For: 2.1.0-beta
>
>
> If Queue A does not have enough capacity to run an AM, the AM borrows capacity from
> Queue B. When Queue B reclaims its capacity, the AM is killed. The AM is then relaunched
> and killed again, and after repeated kills the job fails.
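
The launch-and-kill cycle described above can be sketched as a small simulation. It is a simplified model, not ResourceManager code: the function `run_job`, its parameters, and the rule that every preemption kill counts as a failed AM attempt are assumptions for illustration. The attempt limit stands in for the one YARN reads from `yarn.resourcemanager.am.max-attempts`; the value 2 used below is an assumed setting.

```python
def run_job(max_am_attempts, reclaim_events):
    """Simulate AM attempts for a job whose AM runs on borrowed capacity.

    reclaim_events: one bool per AM attempt; True means the lending queue
    reclaimed its capacity during that attempt and the AM was killed.
    Assumption: each such kill counts against max_am_attempts.
    Returns (final_state, attempts_used).
    """
    attempts = 0
    for reclaimed in reclaim_events:
        attempts += 1
        if not reclaimed:
            # The AM kept its container for this attempt, so the job finishes.
            return "SUCCEEDED", attempts
        if attempts >= max_am_attempts:
            # Every allowed attempt was killed by preemption.
            return "FAILED", attempts
    return "RUNNING", attempts


if __name__ == "__main__":
    # Queue B reclaims capacity during both attempts: the job fails.
    print(run_job(2, [True, True]))
    # Queue B reclaims capacity only during the first attempt: the retry succeeds.
    print(run_job(2, [True, False]))
```

In this model the job fails purely because of preemption, even though the application itself never misbehaved, which is the problem the issue reports.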



--
This message was sent by Atlassian JIRA
(v6.2#6252)
