Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 12 May 2016 17:05:13 +0000 (UTC)
From: "Wangda Tan (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12850219.1438256123000.177729.1463072713062@Atlassian.JIRA>
In-Reply-To: <JIRA.12850219.1438256123000@Atlassian.JIRA>
References: <JIRA.12850219.1438256123000@Atlassian.JIRA> <JIRA.12850219.1438256123667@arcas>
Subject: [jira] [Updated] (YARN-3997) An Application requesting multiple
 core containers can't preempt running application made of single core
 containers
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Thu, 12 May 2016 17:05:15 -0000


     [ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-3997:
-----------------------------
    Target Version/s: 2.9.0  (was: 2.8.0)

> An Application requesting multiple core containers can't preempt running application made of single core containers
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3997
>                 URL: https://issues.apache.org/jira/browse/YARN-3997
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>         Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines
>            Reporter: Dan Shechter
>            Assignee: Arun Suresh
>            Priority: Critical
>
> When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting containers larger than 1 core, for example 4-core containers.
> When the "second" application attempts to use 1-core containers as well, preemption proceeds as planned and everything works properly.
> My assumption is that the fair scheduler, while recognizing that it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't "smart" enough to realize that it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed.
> The observed effect is that the new application hangs indefinitely and never gets the resources it requires.
> This can easily be replicated with any YARN application.
> Our "go-to" scenario in this case is running pyspark with 1-core executors (containers) while trying to launch the H2O.ai framework, which INSISTS on having at least 4 cores per container.
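
For illustration only, the difference described above can be sketched in a few lines of Python. This is not the FairScheduler code and makes no claim about how it is actually implemented; the Container class and both function names are hypothetical. It assumes a fully loaded cluster (no free cores on any node) and looks only at core counts. The first function mirrors the behaviour the reporter suspects (search for one container that is large enough); the second accumulates several small containers on the same node until the request fits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Container:
    """Hypothetical stand-in for a running container (not a YARN class)."""
    container_id: str
    node: str
    cores: int


def find_single_victim(containers: List[Container],
                       needed_cores: int) -> Optional[List[Container]]:
    """Suspected behaviour: succeed only if ONE container covers the request."""
    for c in containers:
        if c.cores >= needed_cores:
            return [c]
    return None


def find_victims_on_one_node(containers: List[Container],
                             needed_cores: int) -> Optional[List[Container]]:
    """Alternative: free enough cores by preempting several containers,
    all of them on the same node."""
    by_node: Dict[str, List[Container]] = {}
    for c in containers:
        by_node.setdefault(c.node, []).append(c)

    for node in sorted(by_node):
        victims: List[Container] = []
        freed = 0
        # Take the largest containers first so that fewer are killed.
        for c in sorted(by_node[node], key=lambda x: x.cores, reverse=True):
            victims.append(c)
            freed += c.cores
            if freed >= needed_cores:
                return victims
    return None


if __name__ == "__main__":
    # Two nodes, each fully occupied by four 1-core containers.
    running = [Container(f"c{n}_{i}", f"node{n}", 1)
               for n in range(2) for i in range(4)]
    print(find_single_victim(running, 4))
    print([c.container_id for c in find_victims_on_one_node(running, 4)])
```

With eight 1-core containers spread over two nodes, the single-victim search finds nothing for a 4-core request, while the per-node search returns four containers from one node. A real scheduler would also have to weigh memory, queue fair shares, and which applications are over their share, all of which this sketch ignores.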

