From: "Amar Kamat (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Sun, 30 Mar 2008 23:40:24 -0700 (PDT)
Message-ID: <551846859.1206945624707.JavaMail.jira@brutus>
Subject: [jira] Commented: (HADOOP-3130) Shuffling takes too long to get the last map output.
In-Reply-To: <988740632.1206751104225.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583568#action_12583568 ]

Amar Kamat commented on HADOOP-3130:
------------------------------------

The log output seems to be the main cause of confusion. Based on the logs, this is what we think happened:
1) The reducer gets the task completion events for a bunch of maps and schedules fetches for them.
2) All the map outputs are copied successfully except one.
3) Assume that the Jetty server that was supposed to serve the remaining map's output is busy.
4) After 3 minutes the fetch attempt fails, is retried, and succeeds.

3 minutes is the timeout for a fetch attempt. This also explains the 2-minute wait mentioned above. During the first minute, other map outputs are being fetched (i.e., the fetches overlap). During the remaining 2 minutes (before the timeout), the reducer is just waiting for the last map's output. The '*need 1 map output*' message in the reducer's logs should also mention how many of those outputs are in progress.

> Shuffling takes too long to get the last map output.
> ----------------------------------------------------
>
>                 Key: HADOOP-3130
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3130
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>         Attachments: shuffling.log
>
>
> I noticed that towards the end of shuffling, the map output fetcher of the reducer backs off too aggressively.
> I attach a fraction of one reduce log of my job.
> Note that the last map output was not fetched for 2 minutes.
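
The timing in the comment above can be checked with a small sketch. This is only an illustration in Python, not Hadoop code: the function names and the exact wording of the extended log line are assumptions made up for this example. The only figures taken from the comment are the 3-minute fetch timeout and the 1 minute during which other fetches overlapped.

```python
# Illustrative sketch only; none of these names exist in Hadoop.

FETCH_TIMEOUT_SECS = 3 * 60  # per-attempt fetch timeout stated in the comment


def idle_wait_secs(overlapped_secs, timeout_secs=FETCH_TIMEOUT_SECS):
    """Seconds the reducer spends waiting only on the stalled fetch.

    overlapped_secs is how long other map outputs were still being
    fetched after the stalled attempt started.
    """
    return max(0, timeout_secs - overlapped_secs)


def shuffle_status(needed, in_progress):
    """Hypothetical log line: outputs still needed plus fetches in flight."""
    return f"need {needed} map output(s), {in_progress} in progress"


if __name__ == "__main__":
    # 3-minute timeout, first minute overlapped with other fetches.
    print(idle_wait_secs(60))
    print(shuffle_status(1, 1))
```

With a message of this shape, "need 1 map output(s), 1 in progress" would show that the reducer is blocked on a pending attempt rather than backing off.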