Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-dev@hadoop.apache.org
Message-ID: <263868416.1207056925560.JavaMail.jira@brutus>
Date: Tue, 1 Apr 2008 06:35:25 -0700 (PDT)
From: "Devaraj Das (JIRA)" <jira@apache.org>
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3130) Shuffling takes too long to get the
 last map output.
In-Reply-To: <988740632.1206751104225.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584135#action_12584135 ] 

Devaraj Das commented on HADOOP-3130:
-------------------------------------

I think it makes sense from the utilization point of view to have a smaller timeout. We free up a thread sooner and it can potentially successfully fetch from some other host. This needs to be benchmarked. But it also means that we need to keep an eye on the self-healing aspect - we kill reducers after they fail to fetch for a certain number of times (and connection establishment failure is a sign of failure currently). We might end up killing reducers sooner than we do it today. 
[For killing reducers, we probably should move to a model where we look at the global picture and use all information before killing a reducer (move this logic entirely to the JobTracker). So in the case of map output fetch failures the JT can decide whether to kill a reducer or not based on which map outputs the reducer is failing to fetch, and, whether those map nodes are healthy, etc.]

> Shuffling takes too long to get the last map output.
> ----------------------------------------------------
>
>                 Key: HADOOP-3130
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3130
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>         Attachments: HADOOP-3130.patch, shuffling.log
>
>
> I noticed that towards the end of shufflling, the map output fetcher of the reducer backs off too aggressively.
> I attach a fraction of one reduce log of my job.
> Noticed that the last map output was not fetched in 2 minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.