hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3464) Race condition in LocalizerRunner causes container localization timeout.
Date Wed, 08 Apr 2015 08:01:12 GMT
zhihai xu created YARN-3464:
-------------------------------

             Summary: Race condition in LocalizerRunner causes container localization timeout.
                 Key: YARN-3464
                 URL: https://issues.apache.org/jira/browse/YARN-3464
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: zhihai xu
            Assignee: zhihai xu
            Priority: Critical


Race condition in LocalizerRunner causes container localization timeout.
Currently, LocalizerRunner kills the ContainerLocalizer when the pending list of
LocalizerResourceRequestEvents is empty.
{code}
      } else if (pending.isEmpty()) {
        action = LocalizerAction.DIE;
      }
{code}
If a LocalizerResourceRequestEvent is added after LocalizerRunner kills the ContainerLocalizer
because the pending list was empty, that LocalizerResourceRequestEvent will never be handled.
The container will stay in the LOCALIZING state until the AM kills it due to TASK_TIMEOUT.
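
The interleaving described above can be sketched with a small, deterministic model. This is only an illustration under stated assumptions, not the NodeManager code: the class LocalizerRaceSketch, its Action enum, and the methods addResource, heartbeat, and strandedRequests are hypothetical names, and a list of strings stands in for the pending LocalizerResourceRequestEvents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the race; not the actual NodeManager classes.
class LocalizerRaceSketch {
    enum Action { LIVE, DIE }

    // Stand-in for the pending list of resource request events.
    private final List<String> pending = new ArrayList<>();
    // Set once the runner has told the localizer to die.
    private boolean localizerKilled = false;

    // Models a resource request event being added to the pending list.
    synchronized void addResource(String resource) {
        pending.add(resource);
    }

    // Models one heartbeat: an empty pending list yields DIE, as in the snippet above.
    synchronized Action heartbeat() {
        if (localizerKilled || pending.isEmpty()) {
            localizerKilled = true;
            return Action.DIE;
        }
        pending.remove(0); // hand one resource to the localizer
        return Action.LIVE;
    }

    // Requests that arrived after DIE and therefore have nobody left to serve them.
    synchronized int strandedRequests() {
        return localizerKilled ? pending.size() : 0;
    }

    public static void main(String[] args) {
        LocalizerRaceSketch runner = new LocalizerRaceSketch();
        // 1. The heartbeat sees an empty pending list and the localizer is told to die.
        Action action = runner.heartbeat();
        // 2. A request arrives afterwards; no localizer remains to download it.
        runner.addResource("late-resource");
        System.out.println("action=" + action + " stranded=" + runner.strandedRequests());
    }
}
```

In this model the late request stays in the pending list indefinitely, which corresponds to the container remaining in LOCALIZING until the timeout. Each method is synchronized, yet the problem persists, because the issue is the ordering of the two calls rather than unsynchronized access to the list.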



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
