hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2136) 1073: Fault injection for StorageDirectory failures during read/write of FSImage/Edits files
Date Thu, 07 Jul 2011 17:11:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061443#comment-13061443 ]

Matt Foley commented on HDFS-2136:

I'm not at all an expert on this stuff, but here are my thoughts. If it were sufficient to do
the usual mock/spy thing after the relevant object stack has been created, I would want to do
#3. The simplest implementation would be pairs of protected methods that fetch a critical
object (perhaps the SD itself) and set it, i.e. replace it with a mocked or spied version.
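
A minimal sketch of that getter/setter pair, written in plain Java so it runs without Mockito. StorageDir, FailingStorageDir, ImageSaver and the *ForTests method names are hypothetical stand-ins, not the real HDFS classes; a hand-written failing subclass takes the place of a Mockito spy.

```java
import java.io.IOException;

// Hypothetical stand-in for a storage directory (NOT the real HDFS StorageDirectory API).
class StorageDir {
    private final String root;
    StorageDir(String root) { this.root = root; }
    String getRoot() { return root; }
    // Pretends to write the image file; a no-op in this sketch.
    void writeImage(byte[] data) throws IOException { }
}

// Test-only subclass that always fails, standing in for a spied object.
// It is built from the original's root so it looks like the same directory.
class FailingStorageDir extends StorageDir {
    FailingStorageDir(StorageDir original) { super(original.getRoot()); }
    @Override
    void writeImage(byte[] data) throws IOException {
        throw new IOException("injected fault writing image in " + getRoot());
    }
}

// Hypothetical owner of the directory (an FSImage-like class).
class ImageSaver {
    private StorageDir sd;
    private boolean sdFailed = false;
    ImageSaver(StorageDir sd) { this.sd = sd; }

    // The proposed protected fetch/replace pair (names are hypothetical).
    protected StorageDir getStorageDirForTests() { return sd; }
    protected void setStorageDirForTests(StorageDir replacement) { sd = replacement; }

    // Production path whose error handling the test wants to exercise.
    boolean saveImage(byte[] data) {
        try {
            sd.writeImage(data);
            return true;
        } catch (IOException e) {
            sdFailed = true; // real code would mark the SD as failed and carry on
            return false;
        }
    }
    boolean isSdFailed() { return sdFailed; }
}

public class SwapDemo {
    public static void main(String[] args) {
        ImageSaver saver = new ImageSaver(new StorageDir("/tmp/name1"));
        // Fetch the live object, replace it with a fault injector, then run the real code path.
        saver.setStorageDirForTests(new FailingStorageDir(saver.getStorageDirForTests()));
        boolean ok = saver.saveImage(new byte[] {1, 2, 3});
        System.out.println("saved=" + ok + " sdFailed=" + saver.isSdFailed());
    }
}
```

This only works once the object stack exists, which is exactly the limitation raised next for startup-time faults.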

However, many of the most interesting cases to test occur during startup, when the relevant
objects are still being created on the fly. AspectJ may be the right way to handle FI here,
but I haven't used it. Another way would be to pass a callback class through the conf (yuck!
but it would work). Such a callback, if non-null, could be called at key points in the read
and write methods to achieve "in vivo" FI. I do suspect AspectJ would do this well, so I'm
doing some reading. What do you think? Does this fit within your understanding of what the
AOP FI framework can do?
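
The conf-based callback could look roughly like the sketch below. StorageFaultHook, FailOnLoadEditsHook, StartupSequence and the key test.storage.fault.hook.class are all hypothetical, and a plain Map stands in for Hadoop's Configuration. The hook class is instantiated by reflection from a name in the configuration and invoked at named points; when the key is unset the hook is null and each call site is a no-op.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hook interface; HDFS does not define this.
interface StorageFaultHook {
    // Called at a named point in the load/save path; may throw to simulate a fault.
    void beforePoint(String point) throws IOException;
}

// Test hook that fails only when edits are being loaded.
class FailOnLoadEditsHook implements StorageFaultHook {
    public void beforePoint(String point) throws IOException {
        if (point.equals("loadEdits")) throw new IOException("injected fault at " + point);
    }
}

class StartupSequence {
    // Hypothetical configuration key.
    static final String HOOK_KEY = "test.storage.fault.hook.class";
    private final StorageFaultHook hook; // null in production

    StartupSequence(Map<String, String> conf) {
        String cls = conf.get(HOOK_KEY);
        try {
            hook = (cls == null) ? null
                : (StorageFaultHook) Class.forName(cls).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load fault hook " + cls, e);
        }
    }

    private void point(String name) throws IOException {
        if (hook != null) hook.beforePoint(name);
    }

    // Runs the (simulated) startup steps and records which ones failed.
    List<String> run() {
        List<String> done = new ArrayList<>();
        for (String step : new String[] {"loadImage", "loadEdits", "saveNamespace"}) {
            try {
                point(step);    // fault-injection call site
                done.add(step); // real code would do the actual I/O here
            } catch (IOException e) {
                done.add(step + ":FAILED"); // real code would mark the SD failed and decide how to continue
            }
        }
        return done;
    }
}

public class CallbackDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(new StartupSequence(conf).run());
        conf.put(StartupSequence.HOOK_KEY, FailOnLoadEditsHook.class.getName());
        System.out.println(new StartupSequence(conf).run());
    }
}
```

Because the hook is resolved from configuration when the object is constructed, it is already in place before any storage objects exist, which is what would make it usable during startup; the price is a test-only code path inside production classes.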

By #2, do you mean "ex vivo" calls that would run a fragment of code, out of context, but
with FI?  That would certainly be better than nothing, but would not give me as much confidence
as #1 or #3 that the system would correctly handle a fault during startup.

> 1073: Fault injection for StorageDirectory failures during read/write of FSImage/Edits
> --------------------------------------------------------------------------------------------
>                 Key: HDFS-2136
>                 URL: https://issues.apache.org/jira/browse/HDFS-2136
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Matt Foley
> Both HDFS-1955 and HDFS-2135 have observed that it is difficult to unit test such failures.
> As a result, regression of HDFS-1955 was only found by careful manual review (thanks, atm!).
> Since 1073 is making broad changes to the way these files are read and written, and is appropriately
> putting effort into correct error handling, I propose we also make it possible to auto-test
> that error handling.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

