hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rahul Jain (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4442) Accessing hadoop counters from a job is unreliable in yarn during in AM process cleanup window
Date Fri, 13 Jul 2012 20:36:33 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Rahul Jain updated MAPREDUCE-4442:
----------------------------------

    Attachment: am_logs_counter_failure.html

Attached AM logs for the full job; the timestamps should correlate to the application timestamps.
                
> Accessing hadoop counters from a job is unreliable in yarn during in AM process cleanup
 window
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4442
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4442
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Rahul Jain
>         Attachments: am_logs_counter_failure.html
>
>
> We found this issue during our tests moving from MapReduceV1 to MapReduceV2. A few of
our applications access job counters multiple times:
> a) After submission of job, while job is execution (works fine)
> b) Right after job complete notification is received (works fine)
> c) Few seconds after job complete notification (fails most of the times).
> The error snippet is as follows:
> {code}
> 2012-07-12 19:12:29,039 WARN  [Client] Unexpected error reading responses on connection
Thread[IPC Client (1252749669) connection to sjc1-ciq-ibm-grid07.carrieriq.com/10.202.50.187:47944
from hadoop,5,main]
> java.lang.NullPointerException
> 	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852)
> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781)
> 2012-07-12 19:12:29,044 INFO  [ClientServiceDelegate] Application state is completed.
FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> 2012-07-12 19:12:29,132 INFO  [ClientServiceDelegate] Application state is completed.
FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> 2012-07-12 19:12:29,216 ERROR [UserGroupInformation] PriviledgedActionException as:hadoop
(auth:SIMPLE) cause:java.io.IOException
> 2012-07-12 19:12:29,216 WARN  [BaseOutputStageJob] getJobCounters: Unable to retrieve
counters. null
> java.io.IOException
> 	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:315)
> 	at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:335)
> 	at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:470)
> 	at org.apache.hadoop.mapreduce.Job$8.run(Job.java:719)
> 	at org.apache.hadoop.mapreduce.Job$8.run(Job.java:716)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> 	at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:716)
> 	at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:396)
> {code}
> The connection to 10.202.50.187:47944 is actually the connection to AM; appears that
we are connecting to AM to get the counters for the successful job and not the history server.
>  
> I'll attach the logs for AM and resource mgr separately, however no unusual activity
is seen in those.
> This makes me suspect that we have a race condition in the code trying to access job
counters when AM is finishing up and the job hasn't moved to history server yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message