hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
Date Wed, 11 Jun 2014 16:33:02 GMT

    [ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027979#comment-14027979 ]

Jason Lowe commented on YARN-2147:

For example, here's a sample log from a client submitting a job that failed:

2014-05-14 10:36:16,111 [JobControl] INFO org.apache.hadoop.mapred.ResourceMgrDelegate  - Submitted application application_1394826486018_9924515 to ResourceManager at xx/xx:xx
2014-05-14 10:36:16,116 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter  - Cleaning up the staging area /user/xx/.staging/job_1394826486018_9924515
2014-05-14 10:36:16,117 [JobControl] ERROR org.apache.hadoop.security.UserGroupInformation  - PriviledgedActionException as:xx (auth:SIMPLE) cause:java.io.IOException: Failed to run job : Read timed out
2014-05-14 10:36:16,118 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob  - xx got an error while submitting
java.io.IOException: Failed to run job : Read timed out
                at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
                at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:410)
                at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
                at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
                at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
                at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:601)
                at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
                at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)

All the user sees is a read timeout, with no details about what it was connecting to or what
service was involved.  Was this a timeout connecting to the RM?  A timeout on the RM side?
Something else entirely?  It's hard to tell from just "Read timed out".  The full stack trace
of the exception logged on the RM side shows that it was timing out trying to grab a delegation
token from a remote server for webhdfs.  Those kinds of details need to be conveyed back to
the client, either via the full stack trace from the RM exception or via a more informative
exception message when delegation token renewal fails during app submission.
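
As a rough illustration of the second option, here is a minimal sketch in plain Java of wrapping a low-level failure so that the message names the service and remote endpoint while the original exception is kept as the cause. This is not the actual RM code or a proposed patch: the class TokenErrorContext, the methods wrapWithContext and describeCauseChain, and the host nn.example.com:50070 are all hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class TokenErrorContext {

    // Hypothetical helper: wraps a low-level failure with the service and
    // remote address involved, keeping the original exception as the cause.
    static IOException wrapWithContext(String service, String remote, Exception cause) {
        String msg = "Failed to obtain delegation token for " + service
                + " from " + remote + ": " + cause.getMessage();
        return new IOException(msg, cause);
    }

    // Flattens the cause chain into one line, so that a client that only
    // receives a message string still sees every layer of the failure.
    static String describeCauseChain(Throwable t) {
        StringBuilder sb = new StringBuilder();
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (sb.length() > 0) {
                sb.append(" <- caused by: ");
            }
            sb.append(cur.getClass().getSimpleName())
              .append(": ")
              .append(cur.getMessage());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulated low-level failure; the host and port are made up.
        Exception lowLevel = new SocketTimeoutException("Read timed out");
        IOException wrapped = wrapWithContext("webhdfs", "nn.example.com:50070", lowLevel);
        System.out.println(describeCauseChain(wrapped));
    }
}
```

With something along these lines, the string that reaches the client would mention webhdfs and the remote address rather than only "Read timed out", even if the full stack trace is not transmitted.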

> client lacks delegation token exception details when application submit fails
> -----------------------------------------------------------------------------
>                 Key: YARN-2147
>                 URL: https://issues.apache.org/jira/browse/YARN-2147
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>            Priority: Minor
> When a client submits an application and the delegation token process fails, the client
> can lack critical details needed to understand the nature of the error.  Only the exception
> message is conveyed to the client, which sometimes isn't enough to debug the failure.
