hadoop-yarn-issues mailing list archives

From "Yingda Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2433) Stale token used by restarted AM (with previous containers retained) to request new container
Date Thu, 21 Aug 2014 00:07:26 GMT

     [ https://issues.apache.org/jira/browse/YARN-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingda Chen updated YARN-2433:
------------------------------

    Description: 
With Hadoop 2.4, container retention is supported across AM crash-and-restart. However, after
an AM is restarted with its containers retained, it appears to use the stale token (the NMToken
issued for the previous attempt) to start new containers. This leads to the error below. To truly
support container retention, the AM should be able to communicate with the previous containers
using the old token and request new containers using the new token.
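
The mismatch described above can be illustrated with a small, self-contained Java sketch. It is
only a simplified model of the check that the error message implies, not the actual NodeManager
code: the class NmTokenCheckSketch, the AttemptToken type and the methods authorizeStart and
isAuthorized are all hypothetical names introduced for illustration.

```java
/**
 * Simplified model of the check implied by the error message in this report.
 * Every class and method name here is hypothetical; this is not YARN's real API.
 */
final class NmTokenCheckSketch {

    /** Stand-in for a token that records the application attempt it was issued for. */
    static final class AttemptToken {
        final String appAttemptId;

        AttemptToken(String appAttemptId) {
            this.appAttemptId = appAttemptId;
        }
    }

    /** Rejects a start request whose two tokens were issued for different attempts. */
    static void authorizeStart(AttemptToken nmToken, AttemptToken containerToken) {
        if (!nmToken.appAttemptId.equals(containerToken.appAttemptId)) {
            throw new SecurityException(
                "NMToken for application attempt : " + nmToken.appAttemptId
                + " was used for starting container with container token issued"
                + " for application attempt : " + containerToken.appAttemptId);
        }
    }

    /** Returns true if the start request would be accepted by authorizeStart. */
    static boolean isAuthorized(AttemptToken nmToken, AttemptToken containerToken) {
        try {
            authorizeStart(nmToken, containerToken);
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        AttemptToken staleNmToken = new AttemptToken("appattempt_1408130608672_0065_000001");
        AttemptToken freshNmToken = new AttemptToken("appattempt_1408130608672_0065_000002");
        AttemptToken containerToken = new AttemptToken("appattempt_1408130608672_0065_000002");

        // NMToken left over from attempt 1, container token from attempt 2: rejected.
        System.out.println(isAuthorized(staleNmToken, containerToken)); // prints false
        // Both tokens issued for attempt 2: accepted.
        System.out.println(isAuthorized(freshNmToken, containerToken)); // prints true
    }
}
```

In this model, the restarted AM (attempt 2) succeeds only when the NMToken it presents was issued
for the same attempt as the container token, which is the behaviour the report asks for.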

This may be related to YARN-1321, which was reported and fixed earlier.

ERROR: 
Unauthorized request to start container. \nNMToken for application attempt : appattempt_1408130608672_0065_000001
was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_000002
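
When reading this error, it can help to split the two attempt IDs into their parts. The sketch
below assumes the usual appattempt_<cluster timestamp>_<application id>_<attempt number> layout,
which matches the cluster_timestamp, id and attemptId fields in the response dump in this report.
The class AttemptIdSketch and its methods are hypothetical helpers, not part of YARN.

```java
/** Hypothetical helper for reading application attempt IDs in logs; not part of YARN. */
final class AttemptIdSketch {
    final long clusterTimestamp;
    final int appId;
    final int attempt;

    private AttemptIdSketch(long clusterTimestamp, int appId, int attempt) {
        this.clusterTimestamp = clusterTimestamp;
        this.appId = appId;
        this.attempt = attempt;
    }

    /** Parses strings of the form appattempt_<timestamp>_<appId>_<attempt>. */
    static AttemptIdSketch parse(String s) {
        String[] parts = s.split("_");
        if (parts.length != 4 || !parts[0].equals("appattempt")) {
            throw new IllegalArgumentException("Not an application attempt ID: " + s);
        }
        return new AttemptIdSketch(
            Long.parseLong(parts[1]),
            Integer.parseInt(parts[2]),
            Integer.parseInt(parts[3]));
    }

    /** True when both IDs belong to the same application, regardless of attempt number. */
    static boolean sameApplication(AttemptIdSketch a, AttemptIdSketch b) {
        return a.clusterTimestamp == b.clusterTimestamp && a.appId == b.appId;
    }

    public static void main(String[] args) {
        AttemptIdSketch nmTokenAttempt = parse("appattempt_1408130608672_0065_000001");
        AttemptIdSketch containerTokenAttempt = parse("appattempt_1408130608672_0065_000002");

        // Same application (timestamp 1408130608672, id 65), but attempts 1 and 2 differ.
        System.out.println(sameApplication(nmTokenAttempt, containerTokenAttempt)); // prints true
        System.out.println(nmTokenAttempt.attempt + " vs " + containerTokenAttempt.attempt); // prints 1 vs 2
    }
}
```

Parsed this way, the two IDs in the error differ only in the attempt number: the NMToken belongs
to attempt 1 and the container token to attempt 2 of the same application.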

STACK trace:
{code}
hadoop.ipc.ProtobufRpcEngine$Invoker.invoke org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl
#0 | 103: Response <- YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: startContainers
{services_meta_data { key: "mapreduce_shuffle" value: "\000\0004\372" } failed_requests {
container_id { app_attempt_id { application_id { id: 65 cluster_timestamp: 1408130608672 }
attemptId: 2 } id: 2 } exception { message: "Unauthorized request to start container. \nNMToken
for application attempt : appattempt_1408130608672_0065_000001 was used for starting container
with container token issued for application attempt : appattempt_1408130608672_0065_000002"
trace: "org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
\nNMToken for application attempt : appattempt_1408130608672_0065_000001 was used for starting
container with container token issued for application attempt : appattempt_1408130608672_0065_000002\r\n\tat
org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)\r\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)\r\n\tat
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)\r\n\tat
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)\r\n\tat
org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)\r\n\tat
org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)\r\n\tat
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)\r\n\tat
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)\r\n\tat java.security.AccessController.doPrivileged(Native
Method)\r\n\tat javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)\r\n\tat
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)\r\n" class_name: "org.apache.hadoop.yarn.exceptions.YarnException"
} }}
{code}






> Stale token used by restarted AM (with previous containers retained) to request new container
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-2433
>                 URL: https://issues.apache.org/jira/browse/YARN-2433
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0, 2.4.1
>            Reporter: Yingda Chen
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
