hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4354) Public resource localization fails with NPE
Date Fri, 13 Nov 2015 18:15:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004421#comment-15004421 ]

Varun Saxena commented on YARN-4354:
------------------------------------

[~jlowe], I think you are correct. The code below, added in YARN-2902, causes the problem.
The public localizer continues downloading the resource, unlike the localizer for private
resources, which exits because a DIE is issued.
As you said, because of this addition the resource is removed if its reference count is 0, but
for a PUBLIC resource a LOCALIZED event may arrive even after the container has been killed. This
won't happen for private resources, though.

{code}
    // Remove the resource if it's downloading and its reference count has
    // become 0 after RELEASE. This may be because a container was killed while
    // localizing and no other container is referring to the resource.
    if (event.getType() == ResourceEventType.RELEASE) {
      if (rsrc.getState() == ResourceState.DOWNLOADING &&
          rsrc.getRefCount() <= 0) {
        removeResource(req);
      }
    }
{code}

I think adding a check for resource visibility here should suffice. What do you think?
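
For illustration only, the visibility check could look roughly like the sketch below. This is not the NodeManager code and not a committed fix: `Tracker`, `Resource`, and the enums are simplified, hypothetical stand-ins, and the only assumption carried over from the snippet above is the RELEASE / DOWNLOADING / reference-count condition. The idea is that a PUBLIC entry is left in place on RELEASE, so a late LOCALIZED event still finds it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a resource tracker; names are
// illustrative and do not match the real NodeManager classes.
class VisibilityCheckSketch {
  enum Visibility { PUBLIC, PRIVATE, APPLICATION }
  enum State { DOWNLOADING, LOCALIZED }
  enum EventType { RELEASE, LOCALIZED }

  static final class Resource {
    final Visibility visibility;
    State state = State.DOWNLOADING;
    int refCount = 1; // one container has requested the resource

    Resource(Visibility visibility) {
      this.visibility = visibility;
    }
  }

  static final class Tracker {
    private final Map<String, Resource> cache = new HashMap<>();

    void add(String key, Visibility visibility) {
      cache.put(key, new Resource(visibility));
    }

    boolean contains(String key) {
      return cache.containsKey(key);
    }

    State stateOf(String key) {
      Resource rsrc = cache.get(key);
      return rsrc == null ? null : rsrc.state;
    }

    void handle(String key, EventType type) {
      Resource rsrc = cache.get(key);
      if (rsrc == null) {
        return; // nothing tracked under this key; ignore the event
      }
      if (type == EventType.RELEASE) {
        rsrc.refCount--;
        // Same condition as the quoted snippet, plus the visibility check:
        // a PUBLIC download keeps running after the container is killed,
        // so its entry is kept for the LOCALIZED event that may follow.
        if (rsrc.state == State.DOWNLOADING
            && rsrc.refCount <= 0
            && rsrc.visibility != Visibility.PUBLIC) {
          cache.remove(key);
        }
      } else if (type == EventType.LOCALIZED) {
        rsrc.state = State.LOCALIZED;
      }
    }
  }
}
```

In this model, releasing the last reference to a PRIVATE resource that is still downloading removes it, while a PUBLIC resource in the same situation stays tracked and can still move to LOCALIZED afterwards.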

> Public resource localization fails with NPE
> -------------------------------------------
>
>                 Key: YARN-4354
>                 URL: https://issues.apache.org/jira/browse/YARN-4354
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>            Priority: Blocker
>         Attachments: YARN-4354-unittest.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly rejecting
> requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
