hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
Date Tue, 27 Oct 2015 18:37:28 GMT

    [ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976928#comment-14976928 ]

Varun Saxena commented on YARN-2902:
------------------------------------

Thanks a lot [~jlowe] for the review.
I was under the incorrect impression that a resource being downloaded would not be picked up
again by other containers. You are correct that we should not FAIL the resource here; it will be
picked up by an outstanding container when the next heartbeat (HB) comes. If we do not call
handleDownloadingRsrcsOnCleanup, we won't need to synchronize the scheduled map either.

Also, event.getResource().getLocalPath() can be used here too. This would remove the need
for the ScheduledResource class and hence the refactoring associated with it.

However, as the resource would not be explicitly FAILED in this case, we should probably do some
cleanup when the reference count of a downloading resource becomes 0. Otherwise the entry associated
with the downloading resource will remain in LocalResourcesTrackerImpl#localResourceMap, and
it may show up when cache cleanup is done.
We may then end up with the same log, {{LOG.error("Attempt to remove resource: " + rsrc + " with non-zero refcount");}},
even though the resource has been deleted on disk.
I think that in LocalResourcesTrackerImpl#handle, after handling a RELEASE event, we should check
whether the reference count is 0 and the state of the resource is DOWNLOADING. If so, we should
call LocalResourcesTrackerImpl#removeResource.
Thoughts?
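
To make the proposal concrete, here is a minimal, self-contained sketch of the check described above. It is not the NodeManager code: DownloadingCleanupSketch, Tracker, Resource, State and every method name below are hypothetical stand-ins, and a plain HashMap stands in for LocalResourcesTrackerImpl#localResourceMap. It only illustrates the idea of dropping an entry once a RELEASE leaves it with a zero reference count while it is still DOWNLOADING.

```java
import java.util.HashMap;
import java.util.Map;

public class DownloadingCleanupSketch {

    /** Simplified resource states; the real state machine has more detail. */
    public enum State { INIT, DOWNLOADING, LOCALIZED }

    /** Hypothetical stand-in for a tracked localized resource. */
    public static final class Resource {
        State state = State.INIT;
        int refCount = 0;
    }

    /** Hypothetical stand-in for the tracker that owns the resource map. */
    public static final class Tracker {
        private final Map<String, Resource> localResourceMap = new HashMap<>();

        /** A container requests the resource; the first request starts the download. */
        public void request(String key) {
            Resource r = localResourceMap.computeIfAbsent(key, k -> new Resource());
            r.refCount++;
            if (r.state == State.INIT) {
                r.state = State.DOWNLOADING;
            }
        }

        /** The download finished successfully. */
        public void markLocalized(String key) {
            Resource r = localResourceMap.get(key);
            if (r != null) {
                r.state = State.LOCALIZED;
            }
        }

        /** A container releases the resource, e.g. because it was killed. */
        public void release(String key) {
            Resource r = localResourceMap.get(key);
            if (r == null) {
                return;
            }
            if (r.refCount > 0) {
                r.refCount--;
            }
            // Proposed check: nobody references the resource any more and it
            // never finished downloading, so drop the entry instead of leaving
            // it for the cache cleanup scan, which skips DOWNLOADING resources.
            if (r.refCount == 0 && r.state == State.DOWNLOADING) {
                localResourceMap.remove(key);
            }
        }

        public boolean contains(String key) {
            return localResourceMap.containsKey(key);
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();

        // Container killed while localizing: the entry is removed on release.
        tracker.request("orphan.jar");
        tracker.release("orphan.jar");
        System.out.println("orphan.jar tracked: " + tracker.contains("orphan.jar"));

        // Fully localized resource: the entry stays for normal cache cleanup.
        tracker.request("cached.jar");
        tracker.markLocalized("cached.jar");
        tracker.release("cached.jar");
        System.out.println("cached.jar tracked: " + tracker.contains("cached.jar"));
    }
}
```

In this sketch a resource that is still referenced by another container, or that has already been localized, is left alone; only the orphaned DOWNLOADING entry is removed. Whether the real removeResource path also needs to delete partially downloaded files is outside what this sketch covers.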



> Killing a container that is localizing can orphan resources in the DOWNLOADING state
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then resources
> are left in the DOWNLOADING state.  If no other container comes along and requests these resources
> they linger around with no reference counts but aren't cleaned up during normal cache cleanup
> scans since it will never delete resources in the DOWNLOADING state even if their reference
> count is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
