hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
Date Fri, 09 Oct 2015 18:54:07 GMT

    [ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950989#comment-14950989 ]

Varun Saxena commented on YARN-2902:

[~jlowe], thanks for looking at the patch. 

The reason I added the delete-downloading flag to the protocol was to tell the localizer
that the resources it reported to the NM in its last heartbeat (HB) were not processed by the
NM, so the localizer needs to delete them. That is why the localizer maintained an extra list
of paths (the paths which have been reported to the NM for download).
I was primarily working on the principle that we should delete as much as we can in the
localizer, so that if the NM crashes and it's not work-preserving, the paths can still be
deleted. And vice versa. Having 2 points of deletion makes it almost certain that downloading
resources are deleted.

But yeah this does make it complex.

You are correct that the NM will know about these paths as well and can delete them. The extra
flag in the localizer protocol can thus be removed.
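
To illustrate the NM-side approach, here is a minimal sketch of the bookkeeping involved,
assuming the NM records each path it hands to the localizer and deletes whatever is still
pending when the container is killed. DownloadingPathTracker and its method names are
hypothetical and used only for illustration; they are not taken from the patch or from the
Hadoop code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not the actual NodeManager implementation. */
class DownloadingPathTracker {
    // containerId -> local paths handed to the localizer that are still downloading
    private final Map<String, List<String>> pending = new HashMap<>();

    /** Record a path when it is handed to the localizer for download. */
    void startDownload(String containerId, String path) {
        pending.computeIfAbsent(containerId, k -> new ArrayList<>()).add(path);
    }

    /** Forget a path once the localizer reports the download as finished. */
    void downloadFinished(String containerId, String path) {
        List<String> paths = pending.get(containerId);
        if (paths != null) {
            paths.remove(path);
            if (paths.isEmpty()) {
                pending.remove(containerId);
            }
        }
    }

    /** On container kill, return every path still downloading so the caller can delete it. */
    List<String> containerKilled(String containerId) {
        List<String> orphans = pending.remove(containerId);
        return orphans == null ? new ArrayList<>() : orphans;
    }
}
```

With this kind of tracking on the NM side, the localizer does not need to be told which paths
to delete, which is why the extra protocol flag becomes unnecessary.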

I will update the patch.

> Killing a container that is localizing can orphan resources in the DOWNLOADING state
> ------------------------------------------------------------------------------------
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch,
>                      YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.patch
>
> If a container is in the process of localizing when it is stopped/killed then resources
> are left in the DOWNLOADING state.  If no other container comes along and requests these resources
> they linger around with no reference counts but aren't cleaned up during normal cache cleanup
> scans since it will never delete resources in the DOWNLOADING state even if their reference
> count is zero.
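
The behaviour described in the issue can be sketched roughly as follows. This is an
illustrative model only: CacheCleanupSketch, Resource, and the two helper methods are
hypothetical names, and the two-state enum is a simplification assumed for the example
rather than the NodeManager's real resource state machine.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the cleanup scan; not the actual NodeManager code. */
class CacheCleanupSketch {
    enum State { DOWNLOADING, LOCALIZED }

    static final class Resource {
        final String path;
        final State state;
        final int refCount;

        Resource(String path, State state, int refCount) {
            this.path = path;
            this.state = state;
            this.refCount = refCount;
        }
    }

    /** Scan as described in the issue: DOWNLOADING resources are never eligible. */
    static List<String> eligibleForCleanup(List<Resource> cache) {
        List<String> result = new ArrayList<>();
        for (Resource r : cache) {
            if (r.state == State.LOCALIZED && r.refCount == 0) {
                result.add(r.path);
            }
        }
        return result;
    }

    /** Resources the scan above can never reclaim: DOWNLOADING with no references. */
    static List<String> orphaned(List<Resource> cache) {
        List<String> result = new ArrayList<>();
        for (Resource r : cache) {
            if (r.state == State.DOWNLOADING && r.refCount == 0) {
                result.add(r.path);
            }
        }
        return result;
    }
}
```

Anything returned by orphaned() stays on disk indefinitely unless another container requests
the same resource, which is the leak this issue is about.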

This message was sent by Atlassian JIRA