hadoop-common-user mailing list archives

From ed <edor...@gmail.com>
Subject YARN: "Unauthorized request to start container, Expired Token" causes job failure
Date Fri, 16 Oct 2015 19:41:36 GMT
Hello,

We just kicked off a large MR job that uses all the containers on our
cluster.  The job ran for 24 hours and then failed with the following error
in the map phase (no reducers had started yet):

2015-10-16 12:38:17,781 ERROR [ContainerLauncher #2] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1444916180373_0003_01_089692 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1445013467749 found 1445013416633
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
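
For reference, the two timestamps in the "This token is expired" line can be
compared directly to see how late the launch request was.  This is only a
minimal sketch: it assumes both values are epoch milliseconds and that the
"found" value is the token's expiry time, and the helper name
token_expiry_lag_ms is hypothetical, not part of any Hadoop tooling.

```python
import re

# Message text copied verbatim from the log line above.
MESSAGE = "This token is expired. current time is 1445013467749 found 1445013416633"


def token_expiry_lag_ms(message: str) -> int:
    """Return how many milliseconds past its expiry the token was when rejected.

    Assumption: "current time" and "found" are both epoch milliseconds, and
    "found" is the expiry timestamp carried by the container token.
    """
    match = re.search(r"current time is (\d+) found (\d+)", message)
    if match is None:
        raise ValueError("not a token-expiry message")
    current_ms, expiry_ms = (int(group) for group in match.groups())
    return current_ms - expiry_ms


lag_ms = token_expiry_lag_ms(MESSAGE)
print(f"token was {lag_ms} ms ({lag_ms / 1000:.1f} s) past expiry at launch time")
```

Under those assumptions the token was about 51 seconds past its expiry when
the launch request was checked.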


The job has not had issues in the past, although this time it was running on
a particularly large dataset.  I checked all of our nodes, and their clocks
are all properly synced with NTP.  I found the JIRA issue YARN-1417, which
seems to describe the problem we're having
(https://issues.apache.org/jira/browse/YARN-1417), but that issue is marked
resolved and the patch was included in CDH5.0.0 (we are running 5.0.2), so
we should not be having that particular problem.


Could this be another bug in YARN related to expired tokens being
assigned?  I searched through JIRA but did not see any open issues that
might relate to the error we're seeing.  Is there a workaround for this,
or has anyone seen this happen before?  Please let me know if there is any
other information I can provide.


Best Regards,


Ed Dorsey
