hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2152) Recover missing container information
Date Mon, 14 Jul 2014 22:28:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061354#comment-14061354 ]

Jason Lowe commented on YARN-2152:
----------------------------------

I think this may have broken backward compatibility for ContainerTokenIdentifier.  We're
running tests with the NM restart functionality (see YARN-1336), and during an upgrade the NM
failed to parse a stored container token with this error:

{noformat}
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:159)
        at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:142)
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:262)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:292)
[...]
{noformat}

It looks like readFields is trying to parse the new priority and creationTime fields that were
added to the token identifier, but tokens stored before the upgrade don't have them, so the
read runs off the end of the stream.
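
The failure mode can be reproduced outside Hadoop. The sketch below is only an illustration and is not the actual ContainerTokenIdentifier code or the fix adopted for this issue: the class TokenCompatSketch, its field layout (memoryMb, expiryTime), and the default values are all hypothetical. It shows one possible way a reader could tolerate old-format bytes, by treating an EOFException on the appended fields as "field absent". A known weakness of this approach is that it cannot tell an old token from a truncated new one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a token identifier; NOT the real ContainerTokenIdentifier.
class TokenCompatSketch {
    // Fields assumed to exist in both the old and the new serialized layout.
    int memoryMb;
    long expiryTime;
    // Fields appended by the newer layout; old tokens fall back to these defaults.
    int priority = 0;
    long creationTime = 0L;

    // Serialize using the old layout (no priority, no creationTime).
    static byte[] writeOld(int memoryMb, long expiryTime) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(memoryMb);
            out.writeLong(expiryTime);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Serialize using the new layout, with the two fields appended at the end.
    static byte[] writeNew(int memoryMb, long expiryTime, int priority, long creationTime) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(memoryMb);
            out.writeLong(expiryTime);
            out.writeInt(priority);
            out.writeLong(creationTime);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Read either layout. A reader that required the appended fields would
    // throw EOFException from readInt() on old bytes, as in the trace above.
    static TokenCompatSketch read(byte[] bytes) {
        TokenCompatSketch t = new TokenCompatSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            t.memoryMb = in.readInt();
            t.expiryTime = in.readLong();
            try {
                t.priority = in.readInt();
                t.creationTime = in.readLong();
            } catch (EOFException e) {
                // Old-format token: the appended fields are absent, so use defaults.
                t.priority = 0;
                t.creationTime = 0L;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return t;
    }

    public static void main(String[] args) {
        TokenCompatSketch oldTok = read(writeOld(1024, 99L));
        TokenCompatSketch newTok = read(writeNew(1024, 99L, 5, 1234L));
        System.out.println("old: priority=" + oldTok.priority
                + " creationTime=" + oldTok.creationTime);
        System.out.println("new: priority=" + newTok.priority
                + " creationTime=" + newTok.creationTime);
    }
}
```

Appending new fields at the end of the layout is what makes this kind of fallback possible at all; an explicit version marker or a self-describing encoding would be a more robust alternative.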

> Recover missing container information
> -------------------------------------
>
>                 Key: YARN-2152
>                 URL: https://issues.apache.org/jira/browse/YARN-2152
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>             Fix For: 2.5.0
>
>         Attachments: YARN-2152.1.patch, YARN-2152.1.patch, YARN-2152.2.patch, YARN-2152.3.patch
>
>
> Container information such as container priority and container start time cannot be recovered
> because NM container today lacks such container information to send across on NM registration
> when RM recovery happens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
