hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
Date Wed, 25 Feb 2015 16:22:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336691#comment-14336691 ]

Varun Saxena commented on YARN-2902:
------------------------------------

[~jlowe], I looked into it and was able to simulate the issue as well for PRIVATE resources.
I think we only need to handle PRIVATE resources. APPLICATION resources will be cleaned
up when the application finishes, and PUBLIC resources should not remain orphaned because we do not
kill or stop PublicLocalizer partway through.

To download a resource, FSDownload appends _tmp to the name of the directory into which the
resource will be downloaded.
While processing a heartbeat (HB) from the ContainerLocalizer, the NM sends back a destination path
for the resource to be downloaded.
We also download only one resource at a time.

So we can store this destination path in a queue in LocalizerRunner whenever we send
a new path for download, and remove it when the fetch succeeds. When the container is killed (which
causes LocalizerRunner to be cleaned up), we can take the path at the front of the queue
and submit the associated temp path to DeletionService for deletion, if the ref count for the
resource is 0.
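
A minimal sketch of the bookkeeping proposed above, to make the idea concrete. This is not the actual LocalizerRunner code or the attached patch: the class PendingDownloadTracker, the Deleter interface (a stand-in for DeletionService), the method names, and the ref-count lookup are all hypothetical, and the temp path is assumed to be the destination path with _tmp appended.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.ToIntFunction;

/** Hypothetical helper; illustrates the proposal, not real NodeManager code. */
final class PendingDownloadTracker {
  /** Stand-in for the NM DeletionService (assumed interface). */
  public interface Deleter {
    void delete(String path);
  }

  // Destination paths sent to the localizer whose fetch has not yet succeeded.
  // Since one resource is downloaded at a time, this holds at most one entry.
  private final Deque<String> pending = new ArrayDeque<>();
  private final Deleter deleter;
  // Assumed lookup returning the current reference count for a destination path.
  private final ToIntFunction<String> refCount;

  PendingDownloadTracker(Deleter deleter, ToIntFunction<String> refCount) {
    this.deleter = deleter;
    this.refCount = refCount;
  }

  /** Called when a heartbeat response hands out a new destination path. */
  synchronized void onPathSent(String destPath) {
    pending.addLast(destPath);
  }

  /** Called when the localizer reports that the fetch succeeded. */
  synchronized void onFetchSuccess(String destPath) {
    pending.remove(destPath);
  }

  /** Called when the container is killed and the runner is cleaned up. */
  synchronized void onRunnerCleanup() {
    String head = pending.pollFirst();
    // Only delete the temp directory if nothing else references the resource.
    if (head != null && refCount.applyAsInt(head) == 0) {
      deleter.delete(head + "_tmp");
    }
  }
}
```

In this sketch a path whose fetch has already succeeded is never submitted for deletion, and a path still referenced by another container is left alone.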

We cannot do this cleanup in ContainerLocalizer because LCE (LinuxContainerExecutor) launches it as
a separate process and kills it when LocalizerRunner is interrupted.

> Killing a container that is localizing can orphan resources in the DOWNLOADING state
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>             Fix For: 2.7.0
>
>         Attachments: YARN-2902.002.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed, then resources
> are left in the DOWNLOADING state.  If no other container comes along and requests these resources,
> they linger with no reference counts but aren't cleaned up during normal cache cleanup
> scans, since a scan never deletes resources in the DOWNLOADING state even if their reference
> count is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
