hadoop-mapreduce-dev mailing list archives

From "Gianmarco De Francisci Morales (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-2187) map tasks timeout during sorting
Date Mon, 15 Nov 2010 17:30:13 GMT
map tasks timeout during sorting

                 Key: MAPREDUCE-2187
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2187
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: Gianmarco De Francisci Morales

During the execution of a large job, the map tasks time out:

INFO mapred.JobClient: Task Id : attempt_201010290414_60974_m_000057_1, Status : FAILED
Task attempt_201010290414_60974_m_000057_1 failed to report status for 609 seconds. Killing!

The problem is that the mapper itself has already finished; according to the logs, the
timeout occurs during the merge-sort phase.
The intermediate data generated by the map task is quite large, and I think that is the cause.

The logs show that the merge-sort had been running for about 10 minutes when the task was killed.
I think mapred.Merger should call Reporter.progress() somewhere during the merge.
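
To illustrate the suggestion, here is a minimal, self-contained sketch of a k-way merge that signals liveness every fixed number of records. It is not the actual mapred.Merger code: ProgressSink is a hypothetical stand-in for Hadoop's Progressable/Reporter, and MergeProgressSketch, merge() and the reporting interval are names and values assumed purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for Hadoop's Progressable/Reporter (assumption:
// the real merger would be handed the task's reporter instead).
interface ProgressSink {
    void progress();
}

final class MergeProgressSketch {

    // Merges already-sorted runs and calls sink.progress() once every
    // recordsPerReport emitted records, so a long merge keeps reporting.
    static List<Integer> merge(List<List<Integer>> runs,
                               ProgressSink sink,
                               int recordsPerReport) {
        // Heap entries are {value, runIndex, offsetInRun}, ordered by value.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {runs.get(i).get(0), i, 0});
            }
        }

        List<Integer> out = new ArrayList<>();
        int sinceLastReport = 0;
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            out.add(head[0]);

            // Refill the heap from the run the smallest record came from.
            List<Integer> run = runs.get(head[1]);
            int next = head[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), head[1], next});
            }

            // The point of the sketch: report periodically inside the loop.
            if (++sinceLastReport >= recordsPerReport) {
                sink.progress();
                sinceLastReport = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<Integer> merged = merge(
            Arrays.asList(
                Arrays.asList(1, 4, 7),
                Arrays.asList(2, 5, 8),
                Arrays.asList(3, 6, 9)),
            () -> calls[0]++,
            4); // tiny interval so the demo triggers reports
        System.out.println(merged + " progress calls: " + calls[0]);
    }
}
```

Reporting once every N records rather than on every record keeps the overhead of the call negligible; a time-based check would work equally well.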

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
