hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-1955) HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
Date Fri, 17 Jun 2011 07:35:47 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HDFS-1955:
-----------------------------

    Attachment: hdfs-1955_1.patch

Here is a patch that provides the desired check, failing doUpgrade() if any storage directory
fails.  The change in FSImage is just a few lines, and easily validated by inspection. 
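
For readers without the patch at hand, the check described above can be sketched roughly as follows. This is a minimal illustration, not the contents of hdfs-1955_1.patch: the class UpgradeSketch, the DirUpgrader interface, and the method doUpgradeAll() are hypothetical names, and the real FSImage code may abort on the first failure rather than collecting failures as done here.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class UpgradeSketch {

    /** Hypothetical stand-in for the per-directory upgrade work (write fsimage/edits, rename). */
    interface DirUpgrader {
        void upgrade(File dir) throws IOException;
    }

    /**
     * Attempts the upgrade in every storage directory, then fails the whole
     * operation if any single directory failed. No partial success is reported.
     */
    static void doUpgradeAll(List<File> dirs, DirUpgrader upgrader) throws IOException {
        List<File> failed = new ArrayList<>();
        for (File dir : dirs) {
            try {
                upgrader.upgrade(dir);
            } catch (IOException e) {
                // Remember the failing directory instead of silently dropping it.
                failed.add(dir);
            }
        }
        if (!failed.isEmpty()) {
            throw new IOException("Upgrade failed for storage directories: " + failed);
        }
    }
}
```

The point of the sketch is only the final check: a non-empty failure list turns into an exception for the caller, so the upgrade as a whole cannot appear to "succeed" when some directories did not.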

However, providing a unit test for it was very difficult. The problem is that the failure must
be forced *within* the doUpgrade() method itself, which is buried in the Namenode startup
code and quite well protected.  First, I tried making the storage dir read-only, but that
gets caught in recoverTransitionRead() well before doUpgrade() is invoked.  Second, I looked
at using Mockito, but it seems that in order to spy on the startup/upgrade process one would
have to mock the entire stack of HDFS system objects.  The invocation of NNStorage.rename()
at line 367 of FSImage would be a convenient spy target, but it is static and I saw no way
to get hold of it.  Third, I considered the non-mock alternative of adding test-only
parameters to production code, and rejected it.

Finally, I just tested it manually by temporarily hacking the code in doUpgrade() to force
the error.  I was able to validate my patch, and I also found and fixed an NPE bug in FSEditLog.

> HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
> -----------------------------------------------------
>
>                 Key: HDFS-1955
>                 URL: https://issues.apache.org/jira/browse/HDFS-1955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>         Attachments: hdfs-1955_1.patch
>
>
> Prior to HDFS-1826, doUpgrade() would fail if any of the storage directories failed to
> successfully write the new fsimage or edits files.
> Now it appears to "succeed" even if some or all of the individual directories fail.
> There is some discussion about whether doUpgrade() should have some fault tolerance,
> but for now make it fail on any single storage directory failure, as before.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
