Message-ID: <1585775.68461283911294658.JavaMail.jira@thor>
Date: Tue, 7 Sep 2010 22:01:34 -0400 (EDT)
From: "Thanh Do (JIRA)"
To: hdfs-dev@hadoop.apache.org
Subject: [jira] Created: (HDFS-1384) NameNode should give client the first node in the pipeline from different rack other than that of excludedNodes list in the same rack.

NameNode should give client the first node in the pipeline from different rack other than that of excludedNodes list in the same rack.
---------------------------------------------------------------------------------------------------------------------------------------

                 Key: HDFS-1384
                 URL: https://issues.apache.org/jira/browse/HDFS-1384
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 0.20.1
            Reporter: Thanh Do


We saw a case where the NN keeps giving the client nodes from the same rack, which causes an exception on the client when it tries to set up the pipeline. The client retries 5 times and then fails.

Here are more details. Suppose we have 2 racks:
- Rack 0: dn1 to dn7
- Rack 1: dn8 to dn14

The client asks for 3 DNs and the NN replies with dn1, dn8 and dn9, for example. Because of a network partition, the client cannot see any node in Rack 0. Hence, the client adds dn1 to the excludedNodes list and asks the NN again. Interestingly, the NN picks a different node in Rack 0 (one that is not in excludedNodes) and gives it back to the client, and so on. The client keeps retrying, and after 5 retries the write fails.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and Haryadi Gunawi (haryadi@eecs.berkeley.edu)
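
To make the scenario above concrete, here is a small Python sketch of the retry loop. It is not HDFS code: the names RACK_OF, pick_first_node and write_attempts are hypothetical, the first node is chosen deterministically (the real NameNode placement policy is more involved), and the "prefer a rack with no excluded nodes" rule is only one possible reading of the behaviour requested in the issue title.

```python
# Hypothetical topology from the report: dn1..dn7 in rack 0, dn8..dn14 in rack 1.
RACK_OF = {f"dn{i}": (0 if i <= 7 else 1) for i in range(1, 15)}


def pick_first_node(excluded, avoid_excluded_racks):
    """Return the first pipeline node, or None if no candidate is left.

    With avoid_excluded_racks=False only the listed nodes are skipped
    (the behaviour described in the report). With True, nodes in racks
    that contain no excluded node are preferred; if there are none, the
    function falls back to any non-excluded node.
    """
    candidates = [dn for dn in RACK_OF if dn not in excluded]
    if avoid_excluded_racks:
        bad_racks = {RACK_OF[dn] for dn in excluded}
        preferred = [dn for dn in candidates if RACK_OF[dn] not in bad_racks]
        if preferred:
            candidates = preferred
    # Deterministic choice keeps the sketch reproducible (an assumption).
    return candidates[0] if candidates else None


def write_attempts(avoid_excluded_racks, reachable_rack=1, max_attempts=5):
    """Simulate the client: exclude each unreachable first node and ask again.

    Returns the attempt number on which pipeline setup succeeds, or None
    if every attempt fails.
    """
    excluded = set()
    for attempt in range(1, max_attempts + 1):
        first = pick_first_node(excluded, avoid_excluded_racks)
        if first is None:
            break
        if RACK_OF[first] == reachable_rack:
            return attempt
        excluded.add(first)
    return None


if __name__ == "__main__":
    print("node-only exclusion:", write_attempts(avoid_excluded_racks=False))
    print("rack-aware exclusion:", write_attempts(avoid_excluded_racks=True))
```

Under these assumptions, node-only exclusion walks through dn1 to dn5 in the partitioned rack and gives up, while the rack-aware variant moves to Rack 1 on the second request.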