hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
Date Wed, 24 Jun 2015 12:04:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599307#comment-14599307 ]

Varun Saxena commented on YARN-2902:
------------------------------------

Thanks for the review [~jlowe].

bq. I think it would be better for the executor to let us know when a localizer has completed
rather than assuming 1 second will be enough time (or too much time). We can tackle this in
a followup JIRA since it's a more significant change, as I'm not sure executors are tracking
localizers today.

We do not track localizers from executors today. The question is how we would track them. We
could get the PID of the localizer process and check whether the localizer has died, but if the
localizer dies between checks, its PID may be reused by some other process.
Our primary goal is for the localizer to die so that it doesn't download anything after we do
the deletion.
One option would be to add a status to the heartbeat response asking the localizer to clean up
(stop its downloading threads). Once that is done, the localizer would indicate in another
heartbeat that the NM can do the deletion. On this heartbeat, the NM would do the deletion, and
the localizer would DIE on receiving the heartbeat response. Thoughts?
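
To make the proposed handshake concrete, here is a minimal sketch of the two-heartbeat exchange.
It is only an illustration of the idea above, not the existing localizer protocol: the class,
enum, and method names (CleanupHandshakeSketch, NmAction.CLEANUP, LocalizerStatus.CLEANUP_DONE,
and so on) are all hypothetical, and real deletion is replaced by recording paths in a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the proposed handshake; none of these types exist in YARN. */
final class CleanupHandshakeSketch {

    /** Status the localizer reports in its heartbeat (assumed names). */
    enum LocalizerStatus { DOWNLOADING, CLEANUP_DONE }

    /** Action the NM returns in the heartbeat response; CLEANUP is the proposed addition. */
    enum NmAction { LIVE, CLEANUP, DIE }

    /** NM-side bookkeeping for a single localizer. */
    static final class NodeManagerSide {
        private final List<String> pendingPaths;
        private final List<String> deletedPaths = new ArrayList<>();
        private boolean containerKilled;

        NodeManagerSide(List<String> pendingPaths) {
            this.pendingPaths = new ArrayList<>(pendingPaths);
        }

        void killContainer() {
            containerKilled = true;
        }

        List<String> deletedPaths() {
            return deletedPaths;
        }

        /** Decide the response for one heartbeat from the localizer. */
        NmAction onHeartbeat(LocalizerStatus status) {
            if (!containerKilled) {
                return NmAction.LIVE;
            }
            if (status == LocalizerStatus.CLEANUP_DONE) {
                // Download threads are stopped, so deleting now cannot race with a download.
                deletedPaths.addAll(pendingPaths);
                pendingPaths.clear();
                return NmAction.DIE;
            }
            // First heartbeat after the kill: ask the localizer to stop downloading.
            return NmAction.CLEANUP;
        }
    }

    /** Localizer-side reaction to heartbeat responses. */
    static final class LocalizerSide {
        private boolean downloadsStopped;
        private boolean alive = true;

        LocalizerStatus nextStatus() {
            return downloadsStopped ? LocalizerStatus.CLEANUP_DONE : LocalizerStatus.DOWNLOADING;
        }

        void onResponse(NmAction action) {
            if (action == NmAction.CLEANUP) {
                downloadsStopped = true; // stand-in for stopping the download threads
            } else if (action == NmAction.DIE) {
                alive = false;
            }
        }

        boolean isAlive() {
            return alive;
        }
    }
}
```

In this model the NM deletes only after the localizer has confirmed that its download threads
are stopped, so no fixed wait and no PID check is needed.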

> Killing a container that is localizing can orphan resources in the DOWNLOADING state
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then resources
> are left in the DOWNLOADING state.  If no other container comes along and requests these
> resources they linger around with no reference counts but aren't cleaned up during normal
> cache cleanup scans, since the scan never deletes resources in the DOWNLOADING state even
> if their reference count is zero.
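
The behavior in the description can be illustrated with a simplified model of the cleanup scan.
This is a sketch under stated assumptions, not the NodeManager code: CacheCleanupSketch, Resource,
and selectForDeletion are hypothetical names, and the only rule modeled is the one stated above
(a resource is eligible for deletion when its reference count is zero and it is not DOWNLOADING).

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the cache cleanup rule from the issue description. */
final class CacheCleanupSketch {

    /** Resource states assumed for this sketch. */
    enum State { INIT, DOWNLOADING, LOCALIZED, FAILED }

    static final class Resource {
        final String path;
        final State state;
        final int refCount;

        Resource(String path, State state, int refCount) {
            this.path = path;
            this.state = state;
            this.refCount = refCount;
        }
    }

    /** Returns the paths the modeled scan would delete. */
    static List<String> selectForDeletion(List<Resource> cache) {
        List<String> doomed = new ArrayList<>();
        for (Resource r : cache) {
            // DOWNLOADING resources are skipped even when nothing references them,
            // which is how an orphaned download lingers forever.
            if (r.refCount == 0 && r.state != State.DOWNLOADING) {
                doomed.add(r.path);
            }
        }
        return doomed;
    }
}
```

A resource left in DOWNLOADING by a killed container has a reference count of zero but is never
selected by this scan, which matches the leak reported here.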



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
