hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
Date Mon, 26 Oct 2015 22:02:27 GMT

    [ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975185#comment-14975185 ]

Jason Lowe commented on YARN-2902:

Forgot to respond to this comment:

bq. That is if NM recovery is not enabled and the deletion task is scheduled. But the deletion
task is put in the deletion service's executor queue because all 4 threads in the deletion
service's executor (NM delete threads) are occupied. If the NM goes down before this task is taken
up, the downloading resources won't be deleted.

If NM recovery is not enabled, then failing to delete when the NM crashes is already a known
issue. As for the normal termination scenario, we should be stopping the ResourceLocalizationService
(via the ContainerManager shutdown) before trying to stop the DeletionService, so I would
expect deletions to be queued up before we stop the DeletionService.
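
To illustrate the ordering argument, here is a minimal sketch using a plain java.util.concurrent executor rather than the actual NodeManager classes. It assumes the deletion executor is drained on stop with shutdown() followed by awaitTermination(); that drain step, the class and method names, and the 10-second timeout are assumptions for illustration only. The 4 threads match the scenario in the quoted comment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the shutdown ordering; not NodeManager code.
public class ShutdownOrderSketch {

    /** Queues more deletion tasks than there are threads, then stops the executor. */
    public static int runDeletions(int threads, int tasks) {
        ExecutorService deletionExecutor = Executors.newFixedThreadPool(threads);
        AtomicInteger deleted = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);

        // Step 1: the "localization service" stops first and schedules its deletions.
        // The latch keeps every worker thread busy, so the extra tasks wait in the queue.
        for (int i = 0; i < tasks; i++) {
            deletionExecutor.submit(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                deleted.incrementAndGet();
            });
        }

        // Step 2: only now is the "deletion service" stopped. shutdown() rejects new
        // tasks but still runs everything that was submitted before the call.
        deletionExecutor.shutdown();
        release.countDown();
        try {
            if (!deletionExecutor.awaitTermination(10, TimeUnit.SECONDS)) {
                deletionExecutor.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            deletionExecutor.shutdownNow();
        }
        return deleted.get();
    }

    public static void main(String[] args) {
        // 4 threads, 10 queued deletions: all of them run before the executor terminates.
        System.out.println(runDeletions(4, 10));
    }
}
```

Under these assumptions, tasks that are sitting in the queue when the stop begins are not lost during a normal shutdown. A crash gives no such guarantee, which is the case that NM recovery is meant to cover.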

> Killing a container that is localizing can orphan resources in the DOWNLOADING state
> ------------------------------------------------------------------------------------
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch,
>                      YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.patch
>
> If a container is in the process of localizing when it is stopped/killed, then resources
> are left in the DOWNLOADING state. If no other container comes along and requests these resources,
> they linger around with no reference counts but aren't cleaned up during normal cache cleanup
> scans, since those scans never delete resources in the DOWNLOADING state even if their reference
> count is zero.
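
The orphaning described in the issue summary can be sketched as follows. This is a simplified model, not the NodeManager's cache code: the class, the two-value state enum, and the method names are hypothetical, and only the rule "cleanup never deletes DOWNLOADING resources, even at reference count zero" is taken from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a localized-resource cache; names are illustrative only.
public class CacheCleanupSketch {

    public enum State { DOWNLOADING, LOCALIZED }

    public static final class Resource {
        final String name;
        final State state;
        final int refCount;

        public Resource(String name, State state, int refCount) {
            this.name = name;
            this.state = state;
            this.refCount = refCount;
        }
    }

    /** Resources a cleanup scan may delete: unreferenced and not DOWNLOADING. */
    public static List<String> evictable(List<Resource> cache) {
        List<String> names = new ArrayList<>();
        for (Resource r : cache) {
            if (r.refCount == 0 && r.state != State.DOWNLOADING) {
                names.add(r.name);
            }
        }
        return names;
    }

    /** Resources the scan skips forever: unreferenced but still DOWNLOADING. */
    public static List<String> orphaned(List<Resource> cache) {
        List<String> names = new ArrayList<>();
        for (Resource r : cache) {
            if (r.refCount == 0 && r.state == State.DOWNLOADING) {
                names.add(r.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Resource> cache = Arrays.asList(
            new Resource("a.jar", State.LOCALIZED, 0),    // normal eviction candidate
            new Resource("b.jar", State.DOWNLOADING, 0),  // container killed mid-download
            new Resource("c.jar", State.LOCALIZED, 2));   // still in use
        System.out.println("evictable: " + evictable(cache));
        System.out.println("orphaned:  " + orphaned(cache));
    }
}
```

In this model, b.jar is never returned by evictable(), so it stays on disk unless another container requests it and completes the download.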
