Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-dev@hadoop.apache.org
Message-ID: <551846859.1206945624707.JavaMail.jira@brutus>
Date: Sun, 30 Mar 2008 23:40:24 -0700 (PDT)
From: "Amar Kamat (JIRA)" <jira@apache.org>
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3130) Shuffling takes too long to get the
 last map output.
In-Reply-To: <988740632.1206751104225.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583568#action_12583568 ] 

Amar Kamat commented on HADOOP-3130:
------------------------------------

The log message seems to be the main cause of confusion. Based on the logs, this is what we think happened:
1) The reducer gets the task completion events for a bunch of maps and schedules them for fetching.
2) All the map outputs are copied successfully except one.
3) Assume that the Jetty server that was supposed to serve the remaining map's output is busy.
4) After 3 minutes the fetch attempt fails, is retried, and succeeds. 3 minutes is the timeout for a fetch attempt.
This also explains the 2-minute wait mentioned above. During the first minute the other map outputs are fetched (i.e., overlapped with the stuck fetch). For the remaining 2 minutes (before the timeout) the reducer is just waiting for the last map's output. The '*need 1 map output*' message in the reducer's logs should also mention how many fetches are in progress.
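
To make the timing and the logging suggestion concrete, here is a minimal sketch. It is not Hadoop's actual code: the function names and the exact wording of the status line are hypothetical, and only the 3-minute timeout and the 1-minute overlap are taken from the explanation above.

```python
# Hypothetical sketch; names and message format are illustrative, not Hadoop's.
FETCH_TIMEOUT_SECS = 3 * 60  # per-attempt fetch timeout described above


def idle_wait_secs(overlapped_secs, timeout_secs=FETCH_TIMEOUT_SECS):
    """Seconds the reducer waits only on the stuck fetch before it times out."""
    return max(0, timeout_secs - overlapped_secs)


def shuffle_status(needed, in_progress):
    """Status line reporting outstanding outputs and fetches already running."""
    noun = "map output" if needed == 1 else "map outputs"
    return f"need {needed} {noun} ({in_progress} fetch(es) in progress)"


# One minute of the timeout overlaps with the other copies,
# so two minutes are spent waiting on the last output alone.
print(idle_wait_secs(60))
print(shuffle_status(1, 1))
```

With a message of this kind, a reader of the log could tell that the reducer is waiting on a fetch that is already in flight rather than backing off.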

> Shuffling takes too long to get the last map output.
> ----------------------------------------------------
>
>                 Key: HADOOP-3130
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3130
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>         Attachments: shuffling.log
>
>
> I noticed that towards the end of shuffling, the map output fetcher of the reducer backs off too aggressively.
> I have attached a fragment of one reduce log from my job.
> Note that the last map output was not fetched for 2 minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.