hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
Date Fri, 18 Sep 2015 09:28:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805279#comment-14805279 ]

Varun Saxena commented on YARN-2902:

bq. As far as properly handling DIE so we actually stop downloading and problems canceling
active transfers, can't we just have the localizer forcibly tear down the JVM? If we're being
told to DIE then I assume we really don't care about pending transfers completing and just
want to get out. If the NM is going to clean up after the localizer anyway, seems like we
can drastically simplify DIE handling and just exit the JVM. That seems like a change that's
targeted enough to be appropriate for 2.7 instead of adding localizer kill support, etc.

In the container localizer, when processing the heartbeat (HB) DIE response, we send another
localizer status to the NM. Is that really required? What do you think?
I think that as soon as we get DIE, we can follow the current code and cancel the pending tasks,
but without waiting for them to complete (as the newly added code in the patch does), and delete
the paths reported in the last status. Then we just return from the loop for a graceful shutdown
(after stopping the executors).
Or are you suggesting a System exit?
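
A minimal sketch of the DIE handling proposed above, assuming a heartbeat loop that receives one action per heartbeat. The class, enum, and method names are hypothetical and are not taken from the ContainerLocalizer code or from the patch; deleting the paths reported in the last status is left out for brevity.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch; not the actual ContainerLocalizer implementation.
class LocalizerDieSketch {
    // Assumed heartbeat responses: keep going, or shut down.
    enum Action { LIVE, DIE }

    // Cancels every pending download WITHOUT waiting for it to finish,
    // then stops the executor. Returns how many cancel() calls succeeded.
    static int handleDie(List<Future<?>> pending, ExecutorService downloads) {
        int cancelled = 0;
        for (Future<?> task : pending) {
            if (task.cancel(true)) { // true: interrupt the task if it is running
                cancelled++;
            }
        }
        pending.clear();
        downloads.shutdownNow(); // no awaitTermination: we do not wait
        return cancelled;
    }

    // Simulated heartbeat loop: returns the number of heartbeats processed.
    // On DIE it cleans up and returns from the loop right away.
    static int run(Iterable<Action> responses,
                   List<Future<?>> pending,
                   ExecutorService downloads) {
        int heartbeats = 0;
        for (Action action : responses) {
            heartbeats++;
            if (action == Action.DIE) {
                handleDie(pending, downloads);
                return heartbeats; // graceful return, no further status sent
            }
        }
        return heartbeats;
    }
}
```

The point of the sketch is that the loop returns immediately after cancelling, so how quickly the localizer exits depends only on how fast the running tasks react to interruption.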

From the NM side, we can have a deletion task that runs after some configured delay (same as
right now). Unlike the code in the current patch, though, we will never cancel this deletion task.

This way the localizer should quit quickly and the NM can clean up.
I will also change the behavior of the executor on deletion, i.e. it will ignore missing paths
by default. I won't add a flag.
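
The NM-side cleanup could look roughly like the sketch below: a deletion scheduled after a delay that treats a missing path as a non-error. The names are hypothetical, and plain `java.nio.file` calls stand in for the NM's own deletion service and container executor, which are not shown in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the NodeManager's actual deletion code.
class DelayedDeletionSketch {
    // Deletes each path. A path that is already gone (for example because
    // the localizer removed it first) is skipped silently.
    // Only handles files and empty directories.
    // Returns the number of paths that were actually removed.
    static int deleteIgnoringMissing(List<Path> paths) {
        int removed = 0;
        for (Path path : paths) {
            try {
                if (Files.deleteIfExists(path)) {
                    removed++;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return removed;
    }

    // Schedules the deletion after delayMs milliseconds. The caller gets the
    // future back only to observe the result; in the scheme described above
    // the task is never cancelled.
    static ScheduledFuture<Integer> scheduleDeletion(
            ScheduledExecutorService scheduler, List<Path> paths, long delayMs) {
        Callable<Integer> task = () -> deleteIgnoringMissing(paths);
        return scheduler.schedule(task, delayMs, TimeUnit.MILLISECONDS);
    }
}
```

Because missing paths are ignored, it does not matter whether the localizer or the NM gets to a path first; the delayed task is safe to run either way.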

> Killing a container that is localizing can orphan resources in the DOWNLOADING state
> ------------------------------------------------------------------------------------
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
> If a container is in the process of localizing when it is stopped/killed then resources
> are left in the DOWNLOADING state.  If no other container comes along and requests these resources
> they linger around with no reference counts but aren't cleaned up during normal cache cleanup
> scans since it will never delete resources in the DOWNLOADING state even if their reference
> count is zero.

This message was sent by Atlassian JIRA
