hadoop-yarn-dev mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-4273) Containers can be leaked due to race between application being killed and NM registering back after recovery
Date Fri, 16 Oct 2015 11:51:05 GMT
Varun Saxena created YARN-4273:
----------------------------------

             Summary: Containers can be leaked due to race between application being killed and NM registering back after recovery
                 Key: YARN-4273
                 URL: https://issues.apache.org/jira/browse/YARN-4273
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.7.1
            Reporter: Varun Saxena
            Assignee: Varun Saxena


This issue is based on the discussion in YARN-4000.
Consider this scenario:
1) The application is recovered and added to the scheduler, but some slow NM has not yet
re-registered, so the containers on that NM are not yet recovered.
2) The user kills the application.
3) CapacityScheduler#doneApplicationAttempt is called, and the containers tracked by the RM so
far are killed. Note that CapacityScheduler#doneApplication is not called, so the scheduler
still has the SchedulerApplication in memory.
4) The slow NM now re-registers and tries to recover its containers. If the application is set
to keep containers across attempts, these containers are recovered even though the application
is in the process of being killed. Nothing kills them later on, so these containers are leaked.
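
The ordering in the scenario above can be illustrated with a small toy model. This is only a
sketch under stated assumptions: ToyScheduler, its fields, and the container names are
hypothetical and are not the real CapacityScheduler code. Only the order of events and the
keep-containers-across-attempts condition are taken from the description.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model only: these classes are hypothetical, not YARN's real scheduler types.
class ContainerLeakSketch {

    /** Hypothetical stand-in for the RM-side state kept for one application. */
    static class ToyScheduler {
        final boolean keepContainersAcrossAttempts;
        final Set<String> liveContainers = new TreeSet<>();
        boolean appInMemory = false;     // models the SchedulerApplication still being tracked
        boolean attemptFinished = false; // set once the attempt has been torn down

        ToyScheduler(boolean keepContainersAcrossAttempts) {
            this.keepContainersAcrossAttempts = keepContainersAcrossAttempts;
        }

        // Step 1: the application is recovered and added to the scheduler.
        void recoverApplication() {
            appInMemory = true;
        }

        // Step 3: kill only the containers known so far; the application entry stays in memory.
        void doneApplicationAttempt() {
            liveContainers.clear();
            attemptFinished = true;
        }

        // Steps 1 and 4: an NM re-registers and reports a container it is still running.
        // Returns true if the toy scheduler accepts (recovers) the container.
        boolean recoverContainer(String containerId) {
            if (!appInMemory) {
                return false;
            }
            if (attemptFinished && !keepContainersAcrossAttempts) {
                return false;
            }
            liveContainers.add(containerId);
            return true;
        }
    }

    /** Replays the scenario and returns the containers still alive at the end. */
    static Set<String> simulate(boolean keepContainersAcrossAttempts) {
        ToyScheduler scheduler = new ToyScheduler(keepContainersAcrossAttempts);
        scheduler.recoverApplication();                  // step 1
        scheduler.recoverContainer("container_fast_nm"); // a fast NM has already re-registered
        scheduler.doneApplicationAttempt();              // steps 2 and 3: the user kills the app
        scheduler.recoverContainer("container_slow_nm"); // step 4: the slow NM re-registers late
        return scheduler.liveContainers;                 // nothing kills these afterwards
    }

    public static void main(String[] args) {
        System.out.println("keep containers = true,  still alive: " + simulate(true));
        System.out.println("keep containers = false, still alive: " + simulate(false));
    }
}
```

In this model the late container survives only when keepContainersAcrossAttempts is true,
which matches the condition in step 4. The sketch does not propose a fix.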



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
