hadoop-yarn-issues mailing list archives

From "Dan Shechter (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3997) An application requesting multi-core containers can't preempt a running application made of single-core containers
Date Sun, 16 Aug 2015 12:24:45 GMT

     [ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Shechter updated YARN-3997:
-------------------------------
    Description: 
When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting containers with a size > 1, for example 4-core containers.
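
For reference, preemption on our cluster is enabled along these lines. This is a minimal sketch, assuming the stock FairScheduler properties; the timeout values are illustrative, not our exact production settings:

{code:xml}
<!-- yarn-site.xml: enable the Fair Scheduler and its preemption -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

<!-- fair-scheduler.xml (allocation file): preempt once a queue has been
     below its share for 30s; timeouts here are illustrative values -->
<allocations>
  <defaultMinSharePreemptionTimeout>30</defaultMinSharePreemptionTimeout>
  <defaultFairSharePreemptionTimeout>30</defaultFairSharePreemptionTimeout>
</allocations>
{code}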

When the "second" application attempts to us 1-core containers as well, preemption proceeds
as planned and everything works properly.

It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't "smart" enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed.
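
To make the suspected gap concrete, here is a toy model. This is NOT the actual FairScheduler source; every class and method name below is hypothetical and exists only to illustrate the two strategies:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model of the suspected behaviour (hypothetical code, not Hadoop's).
public class PreemptionSketch {

    // A running container pinned to a node, with its vcore count.
    record Container(String node, int vcores) {}

    // Suspected (buggy) strategy: look for ONE container that alone
    // covers the request. On a cluster full of 1-core containers a
    // 4-core request matches nothing, so nothing is ever preempted.
    static List<Container> selectSingleVictim(List<Container> running, int requestedVcores) {
        for (Container c : running) {
            if (c.vcores() >= requestedVcores) {
                return List.of(c);
            }
        }
        return List.of(); // no single container is big enough
    }

    // What we would expect instead: accumulate small containers on one
    // node until their combined vcores cover the request.
    static List<Container> selectVictimsOnNode(List<Container> running, int requestedVcores, String node) {
        List<Container> victims = new ArrayList<>();
        int freed = 0;
        for (Container c : running) {
            if (c.node().equals(node)) {
                victims.add(c);
                freed += c.vcores();
                if (freed >= requestedVcores) {
                    return victims;
                }
            }
        }
        return List.of(); // even this node cannot cover the request
    }

    public static void main(String[] args) {
        List<Container> running = List.of(
                new Container("node1", 1), new Container("node1", 1),
                new Container("node1", 1), new Container("node1", 1));
        // Buggy strategy: prints [] -- the 4-core request hangs forever.
        System.out.println(selectSingleVictim(running, 4));
        // Expected strategy: prints four 1-core victims on node1.
        System.out.println(selectVictimsOnNode(running, 4, "node1"));
    }
}
{code}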

The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires.

This can easily be replicated with any YARN application.
Our "go-to" scenario in this case is running pyspark with 1-core executors (containers) while trying to launch the H2O.ai framework, which insists on having at least 4 cores per container.
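
A repro sketch along those lines; the executor counts, memory sizes, script name, and H2O driver arguments below are illustrative placeholders, not an exact recipe:

{code}
# Step 1: saturate every vcore in the cluster with 1-core executors.
spark-submit --master yarn --deploy-mode cluster \
  --num-executors 100 --executor-cores 1 --executor-memory 2g \
  our_pyspark_job.py

# Step 2: launch H2O, which wants multi-core containers; with the
# cluster full of 1-core containers, this request hangs indefinitely.
hadoop jar h2odriver.jar -nodes 3 -mapperXmx 8g -output /tmp/h2o_out
{code}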

  was:
When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting, for example, 4-core containers.

When the "second" application attempts to us 1-core containers as well, preemption proceeds
as planned and everything works properly.

It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't "smart" enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed.

The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires.

This can easily be replicated with any YARN application.
Our "go-to" scenario in this case is running pyspark with 1-core executors (containers) while trying to launch the H2O.ai framework, which insists on having at least 4 cores per container.


> An application requesting multi-core containers can't preempt a running application made of single-core containers
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3997
>                 URL: https://issues.apache.org/jira/browse/YARN-3997
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>         Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines
>            Reporter: Dan Shechter
>            Assignee: Karthik Kambatla
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
