hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4354) Public resource localization fails with NPE
Date Fri, 13 Nov 2015 15:52:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004157#comment-15004157 ]

Jason Lowe commented on YARN-4354:
----------------------------------

I believe this was caused by YARN-2902.  A resource had just been localized, but the resource
was missing.  That normally doesn't occur.  However, after YARN-2902 a resource can be yanked
out while it is still downloading if a container releases it and the refcount reaches zero.
So if a container requests a public resource but is killed before the localization completes,
we can get a localized event for a missing resource and hit the NPE.

We should not remove a resource if its localization will still complete; otherwise we risk
not only the NPE but also leaking the local files.
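
To make the sequence of events concrete, here is a toy model of the race and of the
guard suggested above.  It is only a sketch: ToyTracker, its methods, and the
removeWhileDownloading flag are hypothetical names invented for illustration, not the
NodeManager classes and not the actual fix.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy reference-counted resource tracker; hypothetical, not NodeManager code. */
class ToyTracker {
    enum State { DOWNLOADING, LOCALIZED }

    static final class Entry {
        State state = State.DOWNLOADING;
        int refCount = 0;
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // true  = drop an entry as soon as its refcount hits zero, even mid-download
    // false = keep an entry whose download is still in flight
    private final boolean removeWhileDownloading;

    ToyTracker(boolean removeWhileDownloading) {
        this.removeWhileDownloading = removeWhileDownloading;
    }

    /** A container requests the resource; the first request creates the entry. */
    void request(String key) {
        entries.computeIfAbsent(key, k -> new Entry()).refCount++;
    }

    /** A container releases the resource, for example because it was killed. */
    void release(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return;
        }
        e.refCount--;
        if (e.refCount == 0 && e.state == State.DOWNLOADING && removeWhileDownloading) {
            entries.remove(key);
        }
    }

    /**
     * The download finished.  Returns false when the entry is already gone;
     * code that dereferenced the lookup result without this check would
     * throw a NullPointerException at this point.
     */
    boolean localized(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return false;
        }
        e.state = State.LOCALIZED;
        return true;
    }
}
```

With removeWhileDownloading set to true, the sequence request, release, localized ends
with localized finding no entry, and nothing left in the model refers to the downloaded
files.  With it set to false the entry survives until the download completes, so the
localized event still finds it.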

> Public resource localization fails with NPE
> -------------------------------------------
>
>                 Key: YARN-4354
>                 URL: https://issues.apache.org/jira/browse/YARN-4354
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>            Priority: Blocker
>
> I saw public localization on nodemanagers get stuck because it was constantly rejecting
> requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
