hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thanh Do (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1238) A block is stuck in ongoingRecovery due to exception not propagated
Date Thu, 17 Jun 2010 05:58:23 GMT
A block is stuck in ongoingRecovery due to exception not propagated 

                 Key: HDFS-1238
                 URL: https://issues.apache.org/jira/browse/HDFS-1238
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs client
    Affects Versions: 0.20.1
            Reporter: Thanh Do

- Setup:
+  # datanodes = 2
+ replication factor = 2
+ failure type = transient (i.e. a java I/O call that throws I/O Exception or returns false)
+ # failures = 2
+ When/where failures happen: (This is a subtle bug) The first failure is a transient failure
at a datanode during the second phase. Due to the first failure, the DFSClient will call recoverBlock.
 The second failure is injected during this recover block process (i.e. another failure during
the recovery process).
- Details:
The expectation here is that since the DFSClient performs lots of retries,
two transient failures should be masked properly by the retries.
We found one case, where the failures are not transparent to the users.
Here's the stack trace of when/where the two failures happen (please ignore the line number).
1. The first failure:
Exception is thrown at
    call(void java.io.DataOutputStream.flush())
    SourceLoc: org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java(252)
Stack Trace:
      [0] datanode.BlockReceiver (flush:252)
      [1] datanode.BlockReceiver (receivePacket:660)
      [2] datanode.BlockReceiver (receiveBlock:743)
      [3] datanode.DataXceiver (writeBlock:468)
      [4] datanode.DataXceiver (run:119)
2. The second failure:
False is returned at
   call(boolean java.io.File.renameTo(File))
   SourceLoc: org/apache/hadoop/hdfs/server/datanode/FSDataset.java(105)
Stack Trace:
      [0] datanode.FSDataset (tryUpdateBlock:1008)
      [1] datanode.FSDataset (updateBlock:859)
      [2] datanode.DataNode (updateBlock:1780)
      [3] datanode.DataNode (syncBlock:2032)
      [4] datanode.DataNode (recoverBlock:1962)
      [5] datanode.DataNode (recoverBlock:2101)
This is what we found out:
The first failure causes the DFSClient to somehow calls recoverBlock,
which will force us to see the 2nd failure. The 2nd failure makes
renameTo returns false, which then causes an IOException to be thrown
from the function that calls renameTo.
But this IOException is not propagated properly!
It is dropped inside DN.syncBlock(). Specifically DN.syncBlock
calls DN.updateBlock() which gets the exception. But syncBlock
only catches that and prints a warning without propagating the exception
properly.  Thus syncBlock returns without any exception,
and thus recoverBlock returns without executing the finally{} block
(see below).
Now, the client will retry recoverBlock for 3-5 more times,
but this retries always see exceptions! The reason is that the first
time we call recoverBlock(blk), this blk is being put into
an ongoingRecovery list inside DN.recoverBlock().
Normally, blk is only removed (ongoingRecovery.remove(blk)) inside the finally{} block.
But since the exception is not propagated properly, this finally{}
block is never called, thus the blk is stuck
forever inside the ongoingRecovery list, and hence the next time
client performs the retry, it gets this error message
"Block ... is already being recovered" and recoverBlock() throws
IOException.  As a result, the client which calls this whole
process in the context of processDatanodeError will return
from the pde function with closed = true, and hence it never
retries the whole thing again from the beginning, and instead
just returns error.

This bug was found by our Failure Testing Service framework:
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
Haryadi Gunawi (haryadi@eecs.berkeley.edu)

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message