hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3464) Race condition in LocalizerRunner causes container localization timeout.
Date Wed, 08 Apr 2015 18:45:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485775#comment-14485775 ]

zhihai xu commented on YARN-3464:
---------------------------------

[~kasha], thanks for the information. I just looked at YARN-3024. Yes, it will make this issue
happen more frequently.
Before YARN-3024, private resources were localized one at a time: the next one would not
start until the current one had finished localizing,
so private resource localization took longer.
With YARN-3024, localization is done in parallel, so multiple files can be localized
at the same time.
This increases the chance that the ContainerLocalizer is killed just as the last two PRIVATE
LocalizerResourceRequestEvents are added.
Yes, your suggestion is also what I had in mind.

> Race condition in LocalizerRunner causes container localization timeout.
> ------------------------------------------------------------------------
>
>                 Key: YARN-3464
>                 URL: https://issues.apache.org/jira/browse/YARN-3464
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>
> Race condition in LocalizerRunner causes container localization timeout.
> Currently, LocalizerRunner kills the ContainerLocalizer when the pending list of LocalizerResourceRequestEvents is empty.
> {code}
>       } else if (pending.isEmpty()) {
>         action = LocalizerAction.DIE;
>       }
> {code}
> If a LocalizerResourceRequestEvent is added after LocalizerRunner kills the ContainerLocalizer because the pending list was empty, that LocalizerResourceRequestEvent will never be handled.
> Without a ContainerLocalizer, LocalizerRunner#update will never be called.
> The container will stay in the LOCALIZING state until it is killed by the AM due to TASK_TIMEOUT.
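
The interleaving described above can be replayed with a small, single-threaded model. This is only a sketch under stated assumptions, not the NodeManager code: the class LocalizerRaceSketch, the Runner class, and the methods addResource, isStranded, and pendingCount are hypothetical names invented for illustration, and the real LocalizerRunner#update checks more conditions than the single pending.isEmpty() test modelled here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Simplified, single-threaded model of the race. Not the actual NodeManager code. */
public class LocalizerRaceSketch {

    enum LocalizerAction { LIVE, DIE }

    /** Hypothetical stand-in for LocalizerRunner. */
    static class Runner {
        private final Queue<String> pending = new ArrayDeque<>();
        private boolean localizerAlive = true;

        /** Models a LocalizerResourceRequestEvent being added to the pending list. */
        void addResource(String resource) {
            pending.add(resource);
        }

        /** Models a heartbeat from the ContainerLocalizer; only a live localizer can call this. */
        LocalizerAction update() {
            if (!localizerAlive) {
                throw new IllegalStateException("no ContainerLocalizer left to send a heartbeat");
            }
            if (pending.isEmpty()) {
                // Same decision as the snippet above: an empty pending list means DIE.
                localizerAlive = false;
                return LocalizerAction.DIE;
            }
            pending.poll(); // hand one resource to the localizer
            return LocalizerAction.LIVE;
        }

        /** True when a request is queued but no localizer remains to pick it up. */
        boolean isStranded() {
            return !localizerAlive && !pending.isEmpty();
        }

        int pendingCount() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        Runner runner = new Runner();
        runner.addResource("first-resource");
        System.out.println(runner.update());   // LIVE: one resource was pending
        System.out.println(runner.update());   // DIE: pending list is empty
        runner.addResource("late-resource");   // event arrives after the DIE decision
        System.out.println("stranded = " + runner.isStranded()); // stranded = true
    }
}
```

In this model the late request stays in the pending list forever, because update() can no longer be called once the localizer has been told to die. That corresponds to the container staying in LOCALIZING until the timeout.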



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
