hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
Date Fri, 09 Oct 2015 19:02:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951016#comment-14951016 ]

Varun Saxena commented on YARN-2902:

Just to let you know, there is one case where this won't work once the flag is removed from the protocol:

1. NM recovery is disabled.
2. A container is killed. Its associated resources are stuck in the DOWNLOADING state, and a
deletion task is launched for them.
3. In the meantime, the localizer finishes downloading a resource and reports it to the NM on
its next heartbeat (HB). On the NM side, this resource is still in the DOWNLOADING state.
4. The NM tells the localizer to DIE. The localizer won't delete the resource it just downloaded.
5. The NM crashes.
6. The NM would also miss deleting the downloaded resource, because recovery is disabled.
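
To make the sequence above concrete, here is a toy model of it. This is only a sketch under simplifying assumptions: the class name, the path, and the idea that a restarted NM with recovery enabled cleans up from a persisted state store are all hypothetical stand-ins, not the actual NodeManager code.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Toy model of the race; none of these names are real NodeManager classes. */
class OrphanScenarioSketch {
    // Hypothetical local cache path, used only for illustration.
    static final String PATH = "/local/filecache/10/job.jar";

    /** Returns the files left on disk after the simulated NM restart. */
    static Set<String> run(boolean recoveryEnabled) {
        Set<String> disk = new HashSet<>();                 // files on local disk
        Set<String> stateStore = new HashSet<>();           // assumed to survive a crash
        Queue<String> pendingDeletes = new ArrayDeque<>();  // NM memory, lost on crash

        // Step 2: container killed; the resource is DOWNLOADING and a deletion is scheduled.
        pendingDeletes.add(PATH);
        if (recoveryEnabled) {
            stateStore.add(PATH); // assumption: recovery persists what must be cleaned up
        }

        // Step 3: the localizer finishes the download and reports it on its heartbeat.
        disk.add(PATH);

        // Step 4: the NM answers DIE; the localizer exits without deleting the file.

        // Step 5: the NM crashes before handling the file; in-memory state is gone.
        pendingDeletes.clear();

        // Step 6: restart. Only a recovery-enabled NM still knows about the file.
        if (recoveryEnabled) {
            disk.removeAll(stateStore);
        }
        return disk;
    }

    public static void main(String[] args) {
        System.out.println("recovery disabled, leftover: " + run(false));
        System.out.println("recovery enabled, leftover:  " + run(true));
    }
}
```

In this model the file is orphaned only when recovery is disabled, which matches the narrow window described in the steps.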

I agree, though, that this should be a very rare scenario, and we can skip it.

> Killing a container that is localizing can orphan resources in the DOWNLOADING state
> ------------------------------------------------------------------------------------
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch,
>                      YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.patch
>
> If a container is in the process of localizing when it is stopped/killed, then resources
> are left in the DOWNLOADING state. If no other container comes along and requests these
> resources, they linger around with no reference counts but aren't cleaned up during normal
> cache cleanup scans, since those scans never delete resources in the DOWNLOADING state even
> if their reference count is zero.
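
The cleanup behavior described in the issue can be sketched as follows. This is an illustrative model only: the `Resource` type, the `State` enum, and `evictable` are hypothetical names, and the real cache cleanup logic in the NodeManager is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a cache cleanup scan; not the real NodeManager implementation. */
class CacheCleanupSketch {
    enum State { DOWNLOADING, LOCALIZED }

    /** Hypothetical stand-in for a tracked local resource. */
    static final class Resource {
        final String name;
        final State state;
        final int refCount;

        Resource(String name, State state, int refCount) {
            this.name = name;
            this.state = state;
            this.refCount = refCount;
        }
    }

    /** Returns the names a scan would evict: zero references AND fully localized. */
    static List<String> evictable(List<Resource> cache) {
        List<String> out = new ArrayList<>();
        for (Resource r : cache) {
            // A DOWNLOADING resource is skipped even when nothing references it,
            // which is how it can linger after its container is killed.
            if (r.refCount == 0 && r.state == State.LOCALIZED) {
                out.add(r.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Resource> cache = Arrays.asList(
            new Resource("in-use.jar", State.LOCALIZED, 2),
            new Resource("idle.jar", State.LOCALIZED, 0),
            new Resource("orphan.jar", State.DOWNLOADING, 0));
        System.out.println(evictable(cache));
    }
}
```

Here `orphan.jar` has no references but is never selected for eviction, because the scan only considers resources that finished localizing.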
