hadoop-mapreduce-user mailing list archives

From Henry Hung <YTHu...@winbond.com>
Subject RE: [hadoop 2.2.0] map tasks failed with error "This token is expired"
Date Thu, 05 Jun 2014 05:46:10 GMT
After looking further into this problem, I also found another JIRA:
https://issues.apache.org/jira/browse/YARN-1417

It explains that the fair scheduler has the same problem when map tasks wait in the queue
for too long.

The bad news is that the fix is in Hadoop 2.4.0, and I am not able to upgrade right now because
there are no resources to do it.

So, there are 2 questions I am hoping somebody can answer:

1.       One thing I can do is set "yarn.resourcemanager.rm.container-allocation.expiry-interval-ms"
to a value larger than the default of 600000 ms, but I don't know what will happen to the
overall system if I set it to 7 days.
I ask because there is an urgent job right now that will analyze 1 TB of data and could take
3 to 5 days to complete. I use the fair scheduler to constrain the number of containers this
huge job can run, so that it does not impact the small jobs that need to be executed on a
daily basis.

2.       Is there a way to port the fix to Hadoop 2.2.0? Could you give me some direction
on which Java files I need to look at?
I have already tried comparing the 2.2.0 and 2.4.0 sources, but a lot has changed and I am
going around in circles right now.
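
For reference, the value in question 1 has to be given in milliseconds. A minimal sketch of the
conversion follows; the helper name to_millis is hypothetical, and the 600000 ms default is the
figure quoted above, not one read from a running cluster.

```python
from datetime import timedelta

# Default quoted in the message above (10 minutes), in milliseconds.
DEFAULT_EXPIRY_MS = 600_000


def to_millis(delta: timedelta) -> int:
    """Convert a timedelta to whole milliseconds (hypothetical helper)."""
    return int(delta.total_seconds() * 1000)


seven_days_ms = to_millis(timedelta(days=7))

print(seven_days_ms)                       # 604800000
print(seven_days_ms // DEFAULT_EXPIRY_MS)  # 1008 times the default
```

So a 7-day setting would be 604800000 ms, roughly a thousand times the default. What that does to
the rest of the system is exactly the open question above; the sketch only covers the arithmetic.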

Best regards,
Henry Hung

From: MA33 YTHung1
Sent: Thursday, June 05, 2014 10:35 AM
To: user@hadoop.apache.org
Subject: [hadoop 2.2.0] map tasks failed with error "This token is expired"

Hi All,

A strange thing has been happening since I started using the Fair Scheduler: when executing a
large MR job (around 660 maps and 1 reduce), some of the map tasks fail with this error:

2014-06-05 10:13:47,379 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Unauthorized request to start container.
This token is expired. current time is 1401934427379 found 1401933840832
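
The two numbers in that message look like epoch timestamps in milliseconds. A small sketch of how
far apart they are, assuming the first is the NodeManager's current time and the second ("found")
is the expiry time carried by the container token; that reading is inferred from the wording of
the message, and parse_expiry is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

LOG_LINE = "This token is expired. current time is 1401934427379 found 1401933840832"


def parse_expiry(line: str) -> tuple[int, int]:
    """Return (current_time_ms, found_ms) extracted from the error message."""
    match = re.search(r"current time is (\d+) found (\d+)", line)
    if match is None:
        raise ValueError("line does not look like a token-expiry message")
    return int(match.group(1)), int(match.group(2))


def as_utc(epoch_ms: int) -> datetime:
    """Interpret an epoch-millisecond value as a UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


now_ms, found_ms = parse_expiry(LOG_LINE)
overdue_ms = now_ms - found_ms

print(as_utc(now_ms), as_utc(found_ms))
print(overdue_ms)  # 586547 ms, i.e. a little under 10 minutes
```

Under that assumption, the container start request reached the NodeManager about 586 seconds
after the token's expiry time, which is why the start was rejected even with clocks in sync.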

I have already double-checked the clocks on all YARN servers and I am sure that they are in
sync. Then I found this JIRA, which discusses a capacity scheduler problem with using a
timestamp when reserving a container:
https://issues.apache.org/jira/browse/YARN-180

I want to know whether this problem also exists in the fair scheduler, and whether there is a fix for it.

Best regards,
Henry

________________________________
The privileged confidential information contained in this email is intended for use only by
the addressees as indicated by the original sender of this email. If you are not the addressee
indicated in this email or are not responsible for delivery of the email to such a person,
please kindly reply to the sender indicating this fact and delete all copies of it from your
computer and network server immediately. Your cooperation is highly appreciated. It is advised
that any unauthorized use of confidential information of Winbond is strictly prohibited; and
any information in this email irrelevant to the official business of Winbond shall be deemed
as neither given nor endorsed by Winbond.

