From: "Thanh Do (JIRA)"
To: hdfs-dev@hadoop.apache.org
Reply-To: hdfs-dev@hadoop.apache.org
Date: Thu, 17 Jun 2010 01:32:23 -0400 (EDT)
Subject: [jira] Created: (HDFS-1233) Bad retry logic at DFSClient

Bad retry logic at DFSClient
----------------------------

                 Key: HDFS-1233
                 URL: https://issues.apache.org/jira/browse/HDFS-1233
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs client
    Affects Versions: 0.20.1
            Reporter: Thanh Do

- Summary: failover bug; bad retry logic at DFSClient prevents failing over to the second disk.

- Setup:
  + # available datanodes = 1
  + # disks per datanode = 2
  + # failures = 1
  + failure type = bad disk
  + when/where the failure happens = (see below)

- Details:

The setup is 1 datanode, 1 replica, and each datanode has 2 disks (Disk1 and Disk2). We injected a single disk failure to see whether the write can fail over to the second disk.

If a persistent disk failure happens during createBlockOutputStream (the first phase of pipeline creation), say DN1-Disk1 is bad, then createBlockOutputStream (cbos) gets an exception and retries. On the retry it gets the same DN1 from the namenode; DN1 then calls DN.writeBlock(), FSVolume.createTmpFile(), and finally getNextVolume(), which advances the round-robin volume index. Thus, on the second try, the write successfully goes to the second disk. Essentially, createBlockOutputStream is wrapped in a do/while (retry && --count >= 0): in this particular scenario the first cbos call fails and the second succeeds.

Now suppose cbos is successful, but the disk failure is persistent and strikes afterwards. Then the "retry" happens in a different loop. First, hasError is set to true in RP.run (the response processor). DataStreamer.run() then goes back to its loop, while (!closed && clientRunning && !lastPacketInBlock), and the next iteration calls processDatanodeError because hasError is true. In processDatanodeError (pde), the client sees that this is the only datanode in the pipeline and therefore concludes that the whole node is bad, although in fact only one disk is bad (both retry layers are sketched in the code below).
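To make the two retry layers easier to follow, here is a minimal, self-contained sketch in Java. The names and control flow are hypothetical and greatly simplified; this is not the actual 0.20.1 DFSClient/DataNode code. It only illustrates the behavior described above: a disk failure during pipeline creation is masked because the do/while retry reaches the round-robin volume choice and lands on the second disk, while a failure after pipeline creation goes through processDatanodeError, which gives up as soon as the pipeline contains only one datanode.

// RetryLogicSketch.java: a hypothetical, simplified model of the behavior
// described above; it is NOT the real DFSClient/DataNode code.
import java.io.IOException;

public class RetryLogicSketch {

    // Simulated datanode with two disks, one of which is persistently bad.
    static final boolean[] DISK_BAD = { true, false };   // Disk1 bad, Disk2 good
    static int curVolume = 0;                            // round-robin volume index

    // Models getNextVolume(): return the current volume and advance the index.
    static int getNextVolume() {
        int v = curVolume;
        curVolume = (curVolume + 1) % DISK_BAD.length;
        return v;
    }

    // Models the client's createBlockOutputStream (cbos) reaching DN1, which
    // picks a volume via getNextVolume(); a bad disk surfaces as an IOException.
    static void createBlockOutputStream() throws IOException {
        int disk = getNextVolume();
        if (DISK_BAD[disk]) {
            throw new IOException("cannot create tmp file on Disk" + (disk + 1));
        }
        System.out.println("pipeline set up on Disk" + (disk + 1));
    }

    // Models processDatanodeError (pde) for a one-node pipeline: with no other
    // datanode to fall back to, the whole node is declared bad and the error
    // is surfaced to the client instead of being retried against the second disk.
    static void processDatanodeError(int nodesInPipeline) throws IOException {
        if (nodesInPipeline <= 1) {
            throw new IOException("All datanodes in the pipeline are bad.");
        }
        // With more than one node, the bad one would be dropped and streaming resumed.
    }

    public static void main(String[] args) {
        // Scenario 1: failure during pipeline creation. The outer
        // do/while (retry && --count >= 0) masks it, because the retry lands
        // on Disk2 via the round-robin volume index.
        int count = 3;
        boolean retry = true;
        do {
            try {
                createBlockOutputStream();
                retry = false;
            } catch (IOException e) {
                System.out.println("cbos failed: " + e.getMessage() + "; retrying");
            }
        } while (retry && --count >= 0);

        // Scenario 2: cbos succeeded, but the disk fails during streaming.
        // hasError is set (by the response processor), DataStreamer calls pde,
        // and pde, seeing a single-datanode pipeline, throws to the client
        // rather than trying Disk2.
        boolean hasError = true;
        if (hasError) {
            try {
                processDatanodeError(1);   // only DN1 in the pipeline
            } catch (IOException e) {
                System.out.println("pde gives up: " + e.getMessage());
            }
        }
    }
}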
Because the node has been declared bad, pde throws an IOException suggesting that all the datanodes in the pipeline (in this case, only DN1) are bad, and that exception is propagated to the client. If the exception happens to be caught by the outermost do/while (retry && --count >= 0) loop, the outer retry will then succeed (as described earlier for the cbos retry).

In summary, in a deployment with only one datanode that has multiple disks, if one disk goes bad, the current retry logic on the DFSClient side is not robust enough to mask the failure from the client.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
Haryadi Gunawi (haryadi@eecs.berkeley.edu)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.