hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-969) NullPointerException during reduce freezes job
Date Thu, 17 Sep 2009 00:31:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756299#action_12756299 ]

Todd Lipcon commented on MAPREDUCE-969:
---------------------------------------

Hi Jothi,

This occurred again on the same cluster - same symptoms/traces/etc, so not uploading anything
new. I looked through the tasktracker logs and was unable to find anything suspicious. The
patch in HADOOP-4744 (r772846) is not in 0.20.0, so we don't have the port checks or info
printouts.

It does appear to be very similar to HADOOP, though - the TT with the -1 port (xx28 in this
case) ran several tasks before it was eventually shut down by ops. All of the jobs that had
map tasks run on xx28 eventually went into this state.

We'll apply the second patch from HADOOP-4744 on this cluster and report back whether it solves
the problem. Feeling pretty good that it will.

> NullPointerException during reduce freezes job
> ----------------------------------------------
>
>                 Key: MAPREDUCE-969
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-969
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker, task, tasktracker
>    Affects Versions: 0.20.2
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: bad_job_events, bad_job_jt_logs, reduce_task_logs
>
>
> We experienced several jobs stuck in Reduce on a cluster. All of the stuck reduce tasks
> were stuck at "Need another 2 map output(s) where 0 is already in progress" despite all
> of the mappers having completed, and 0 scheduled. The stuck reducers had experienced
> the following exception early in the shuffle:
> java.lang.NullPointerException
> 	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)
> Will attach more information and logs momentarily.
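
For readers unfamiliar with the top frame of the trace: java.util.concurrent.ConcurrentHashMap does not permit null keys, and its get() throws NullPointerException when passed one. The sketch below only demonstrates that documented behavior. The class name, the map, and the keys are hypothetical, and it does not claim to reproduce what ReduceTask actually looked up.

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyLookupDemo {
    // Hypothetical map; stands in for any ConcurrentHashMap keyed by a String.
    public static final ConcurrentHashMap<String, Integer> table = new ConcurrentHashMap<>();

    /** Returns true if looking up the given key throws NullPointerException. */
    public static boolean lookupThrows(String key) {
        try {
            table.get(key);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        table.put("host-a", 1);
        System.out.println(lookupThrows("host-a")); // prints false: present key
        System.out.println(lookupThrows("absent")); // prints false: a missing key just yields null
        System.out.println(lookupThrows(null));     // prints true: null key is rejected
    }
}
```

An exception like this escaping an event-fetching thread would be consistent with a reducer that stops learning about new map outputs, but that link is an inference from the trace, not something the logs here confirm.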

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

