hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nishan Shetty (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error
Date Tue, 20 Mar 2012 04:45:53 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233201#comment-13233201
] 

Nishan Shetty commented on MAPREDUCE-4030:
------------------------------------------

I dont't see any log which is notifying to the AM about map output copy fail in the reducer
and relaunching the map task by the AM in AM log
                
> If the nodemanager on which the maptask is executed is going down before the mapoutput
is consumed by the reducer,then the job is failing with shuffle error
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4030
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Nishan Shetty
>
> My cluster has 2 NM's.
> The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1.
> When the job execution is in progress and Mappers has finished about 99% completion,one
of the NM has gone down.
> The job has failed with the following trace
> "Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle
in fetcher#1 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148) at java.security.AccessController.doPrivileged(Native
Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143) Caused by: java.io.IOException:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)
"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

Mime
View raw message