hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error
Date Mon, 19 Mar 2012 06:37:44 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232479#comment-13232479 ]

Arun C Murthy commented on MAPREDUCE-4030:
------------------------------------------

Nishan, could you please check the reducer log and the AM log to verify whether the AM was
notified of the map and whether it was re-run? That should happen before the reduce bails out. Thanks.
                
> If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4030
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Nishan Shetty
>
> My cluster has 2 NMs.
> The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1.
> While the job was running and the mappers were about 99% complete, one of the NMs went down.
> The job failed with the following trace:
> "Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)
> "

