Message-ID: <3751433.34991274540359047.JavaMail.jira@thor>
Date: Sat, 22 May 2010 10:59:19 -0400 (EDT)
From: "Joydeep Sen Sarma (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic
In-Reply-To: <26464038.18861274292537891.JavaMail.jira@thor>

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870303#action_12870303 ]

Joydeep Sen Sarma commented on MAPREDUCE-1800:
----------------------------------------------

Btw, we don't need distributed failure detection to cover the case of application errors while fetching map outputs. If the map-side task tracker encounters enough failures while retrieving map outputs, it can either commit suicide or report this fact directly to the JT (instead of relying on the reducer to do so). In that sense, such application errors are no different from errors while trying to execute map/reduce tasks.

It seems that the only non-trivial cases the reducer needs to report are network errors, which are inherently symmetric in nature. The onus then shifts to the JT to infer which party is to blame (if any) by looking at the collective set of errors being reported in the system.

> using map output fetch failures to blacklist nodes is problematic
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1800
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>
> If a mapper and a reducer cannot communicate, then either party could be at fault. The current Hadoop protocol allows reducers to declare nodes running the mapper as being at fault. When a sufficient number of reducers do so, the map node can be blacklisted.
> In cases where networking problems cause substantial degradation in communication across sets of nodes, a large number of nodes can become blacklisted as a result of this protocol.
> The blacklisting is often wrong (reducers on the smaller side of a network partition can collectively cause nodes on the larger side to be blacklisted) and counterproductive (rerunning maps puts further load on the already maxed-out network links).
> We should revisit how we can better identify nodes with genuine network problems (and what role, if any, map-output fetch failures have in this).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
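
To make the partition scenario from the issue description concrete, here is a minimal sketch, not Hadoop code. Everything in it is an assumption made for illustration: the function names, the `(reducer_node, map_node)` report format, the threshold of 2, and in particular the greedy heuristic in `infer_suspects`, which is just one possible way a JT could "look at the collective set of errors". It treats each fetch failure as symmetric evidence against both endpoints and repeatedly blames the node that explains the most unexplained failures.

```python
from collections import defaultdict


def naive_blacklist(reports, threshold):
    """Count-based rule as described in the issue: blacklist a map node
    once `threshold` distinct reducer nodes have accused it."""
    accusers = defaultdict(set)
    for reducer_node, map_node in reports:
        accusers[map_node].add(reducer_node)
    return {node for node, who in accusers.items() if len(who) >= threshold}


def infer_suspects(reports, threshold):
    """Hypothetical JT-side inference (an assumption, not the JIRA proposal):
    treat every failed fetch as an undirected edge, then greedily blame the
    node with the most unexplained failures until no remaining node has at
    least `threshold` of them."""
    peers = defaultdict(set)
    for reducer_node, map_node in reports:
        if reducer_node == map_node:
            continue  # a local fetch says nothing about the network
        peers[reducer_node].add(map_node)
        peers[map_node].add(reducer_node)

    suspects = set()
    while True:
        # sorted() only makes tie-breaking deterministic
        worst = max(sorted(peers), key=lambda n: len(peers[n]), default=None)
        if worst is None or len(peers[worst]) < threshold:
            return suspects
        suspects.add(worst)
        # Failures involving the blamed node are now explained; drop them.
        for peer in peers.pop(worst):
            peers[peer].discard(worst)


# Partition: reducers on the small side cannot reach maps on the large side.
small = ["s1", "s2"]
large = ["l1", "l2", "l3", "l4"]
reports = [(r, m) for r in small for m in large]

print(sorted(naive_blacklist(reports, 2)))  # ['l1', 'l2', 'l3', 'l4']
print(sorted(infer_suspects(reports, 2)))   # ['s1', 's2']
```

With the count-based rule, two reducers on the small side are enough to blacklist all four nodes on the large side. The symmetric view blames the two small-side nodes instead, because they account for every reported failure. When a single map node really is bad and several reducers report it, both functions return that one node. A real implementation would also need successful fetches, time windows, and rack topology, none of which are modelled here.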