From: "Thanh Do (JIRA)"
To: hdfs-dev@hadoop.apache.org
Reply-To: hdfs-dev@hadoop.apache.org
Date: Thu, 17 Jun 2010 01:32:23 -0400 (EDT)
Subject: [jira] Created: (HDFS-1233) Bad retry logic at DFSClient

Bad retry logic at DFSClient
----------------------------

                 Key: HDFS-1233
                 URL: https://issues.apache.org/jira/browse/HDFS-1233
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs client
    Affects Versions: 0.20.1
            Reporter: Thanh Do

- Summary: failover bug; bad retry logic at DFSClient prevents failing over to the second disk.

- Setup:
  + # available datanodes = 1
  + # disks per datanode = 2
  + # failures = 1
  + failure type = bad disk
  + when/where the failure happens = (see below)

- Details:

The setup is 1 datanode, 1 replica, and each datanode has 2 disks (Disk1 and Disk2). We injected a single disk failure to see whether the write can fail over to the second disk.

If a persistent disk failure happens during createBlockOutputStream (the first phase of pipeline creation), say DN1-Disk1 is bad, then createBlockOutputStream (cbos) gets an exception and retries. On the retry it gets the same DN1 from the namenode; DN1 then calls DN.writeBlock(), FSVolume.createTmpFile(), and finally getNextVolume(), which advances the round-robin volume index. Thus, on the second try, the write successfully goes to the second disk. Essentially, createBlockOutputStream is wrapped in a do/while (retry && --count >= 0): in this particular scenario the first cbos call fails and the second succeeds.

Now suppose cbos is successful, but the disk failure is persistent and strikes afterwards. Then the "retry" happens in a different loop. First, hasError is set to true in RP.run (the response processor). DataStreamer.run() then goes back to its loop, while (!closed && clientRunning && !lastPacketInBlock), and the next iteration calls processDatanodeError because hasError is true. In processDatanodeError (pde), the client sees that this is the only datanode in the pipeline and therefore concludes that the whole node is bad, although in fact only one disk is bad (both retry layers are sketched in the code below).
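To make the two retry layers easier to follow, here is a minimal, self-contained sketch in Java. The names and control flow are hypothetical and greatly simplified; this is not the actual 0.20.1 DFSClient/DataNode code. It only illustrates the behavior described above: a disk failure during pipeline creation is masked because the do/while retry reaches the round-robin volume choice and lands on the second disk, while a failure after pipeline creation goes through processDatanodeError, which gives up as soon as the pipeline contains only one datanode.

// RetryLogicSketch.java: a hypothetical, simplified model of the behavior
// described above; it is NOT the real DFSClient/DataNode code.
import java.io.IOException;

public class RetryLogicSketch {

    // Simulated datanode with two disks, one of which is persistently bad.
    static final boolean[] DISK_BAD = { true, false };   // Disk1 bad, Disk2 good
    static int curVolume = 0;                            // round-robin volume index

    // Models getNextVolume(): return the current volume and advance the index.
    static int getNextVolume() {
        int v = curVolume;
        curVolume = (curVolume + 1) % DISK_BAD.length;
        return v;
    }

    // Models the client's createBlockOutputStream (cbos) reaching DN1, which
    // picks a volume via getNextVolume(); a bad disk surfaces as an IOException.
    static void createBlockOutputStream() throws IOException {
        int disk = getNextVolume();
        if (DISK_BAD[disk]) {
            throw new IOException("cannot create tmp file on Disk" + (disk + 1));
        }
        System.out.println("pipeline set up on Disk" + (disk + 1));
    }

    // Models processDatanodeError (pde) for a one-node pipeline: with no other
    // datanode to fall back to, the whole node is declared bad and the error
    // is surfaced to the client instead of being retried against the second disk.
    static void processDatanodeError(int nodesInPipeline) throws IOException {
        if (nodesInPipeline <= 1) {
            throw new IOException("All datanodes in the pipeline are bad.");
        }
        // With more than one node, the bad one would be dropped and streaming resumed.
    }

    public static void main(String[] args) {
        // Scenario 1: failure during pipeline creation. The outer
        // do/while (retry && --count >= 0) masks it, because the retry lands
        // on Disk2 via the round-robin volume index.
        int count = 3;
        boolean retry = true;
        do {
            try {
                createBlockOutputStream();
                retry = false;
            } catch (IOException e) {
                System.out.println("cbos failed: " + e.getMessage() + "; retrying");
            }
        } while (retry && --count >= 0);

        // Scenario 2: cbos succeeded, but the disk fails during streaming.
        // hasError is set (by the response processor), DataStreamer calls pde,
        // and pde, seeing a single-datanode pipeline, throws to the client
        // rather than trying Disk2.
        boolean hasError = true;
        if (hasError) {
            try {
                processDatanodeError(1);   // only DN1 in the pipeline
            } catch (IOException e) {
                System.out.println("pde gives up: " + e.getMessage());
            }
        }
    }
}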
Because the node has been declared bad, pde throws an IOException suggesting that all the datanodes in the pipeline (in this case, only DN1) are bad, and that exception is propagated to the client. If the exception happens to be caught by the outermost do/while (retry && --count >= 0) loop, the outer retry will then succeed (as described earlier for the cbos retry).

In summary, in a deployment with only one datanode that has multiple disks, if one disk goes bad, the current retry logic on the DFSClient side is not robust enough to mask the failure from the client.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
Haryadi Gunawi (haryadi@eecs.berkeley.edu)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.