hadoop-mapreduce-issues mailing list archives

From "Rahul Jain (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-4442) Accessing hadoop counters from a job is unreliable in yarn during the AM process cleanup window
Date Fri, 13 Jul 2012 20:24:34 GMT
Rahul Jain created MAPREDUCE-4442:
-------------------------------------

             Summary: Accessing hadoop counters from a job is unreliable in yarn during the AM process cleanup window
                 Key: MAPREDUCE-4442
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4442
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.0.0-alpha
            Reporter: Rahul Jain


We found this issue during our tests moving from MapReduceV1 to MapReduceV2. A few of our
applications access job counters multiple times:

a) After job submission, while the job is executing (works fine)

b) Right after the job-complete notification is received (works fine)

c) A few seconds after the job-complete notification (fails most of the time).

The error snippet is as follows:

{code}
2012-07-12 19:12:29,039 WARN  [Client] Unexpected error reading responses on connection Thread[IPC Client (1252749669) connection to sjc1-ciq-ibm-grid07.carrieriq.com/10.202.50.187:47944 from hadoop,5,main]
java.lang.NullPointerException
	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781)
2012-07-12 19:12:29,044 INFO  [ClientServiceDelegate] Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2012-07-12 19:12:29,132 INFO  [ClientServiceDelegate] Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2012-07-12 19:12:29,216 ERROR [UserGroupInformation] PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException
2012-07-12 19:12:29,216 WARN  [BaseOutputStageJob] getJobCounters: Unable to retrieve counters.
null
java.io.IOException
	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:315)
	at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:335)
	at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:470)
	at org.apache.hadoop.mapreduce.Job$8.run(Job.java:719)
	at org.apache.hadoop.mapreduce.Job$8.run(Job.java:716)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
	at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:716)
	at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:396)
{code}

 
I'll attach the AM and ResourceManager logs separately; however, no unusual activity is
seen in them.

This makes me suspect a race condition in the code path that fetches job counters while the
AM is shutting down and the job has not yet been moved to the history server.
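
Until the root cause is confirmed, a client could work around the failure by retrying the counter lookup. The following is only a minimal sketch under the assumption that the IOException is transient and clears once the history server can serve the job. CounterRetry, maxAttempts and delayMillis are hypothetical names, not part of Hadoop.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Hadoop): retries a call that fails with IOException. */
final class CounterRetry {
    private CounterRetry() {}

    /**
     * Invokes call up to maxAttempts times, sleeping delayMillis between attempts.
     * Only IOException is retried; any other exception propagates immediately.
     */
    static <T> T retry(Callable<T> call, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                // Assumption: the failure is transient (AM exiting, job not yet in history).
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        // Every attempt failed with IOException; rethrow the most recent one.
        throw last;
    }
}
```

In application code the callable would wrap the counter lookup, for example CounterRetry.retry(() -> job.getCounters(), 5, 2000L), where job is an org.apache.hadoop.mapreduce.Job and the attempt count and delay are illustrative values. This only masks the symptom on the client side; it does not address the suspected race itself.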

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
