hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-1955) HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
Date Wed, 29 Jun 2011 01:20:28 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HDFS-1955:

    Attachment: hdfs-1955_2.patch

bq. Should this call getNumEditStreams() [instead of "(editStreams == null)"], or maybe even
better yet isOpen() which returns false if there are no edit streams?

No, the point of this insertion is solely to prevent an NPE in the rare case where (as the
comment notes) an error occurs on one or more storage directories before editStreams has even
been initialized. The check for null is efficient and sufficient.
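
To illustrate the kind of guard being described, here is a minimal sketch. It is not the
patch itself: the class NullGuardSketch, its methods, and the use of strings in place of
real edit streams are all hypothetical stand-ins. Only the "(editStreams == null)" check
comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

public class NullGuardSketch {
    // Hypothetical stand-in for the edit stream list; stays null until open() runs.
    private List<String> editStreams = null;

    // Hypothetical initializer: one "stream" per storage directory.
    void open(List<String> storageDirs) {
        editStreams = new ArrayList<>();
        for (String dir : storageDirs) {
            editStreams.add(dir + "/edits");
        }
    }

    // Drops the stream belonging to a failed directory and returns how many were removed.
    // The null check covers an error reported before open() has been called.
    int removeStreamFor(String failedDir) {
        if (editStreams == null) {
            return 0; // nothing initialized yet, so nothing to remove and no NPE
        }
        int before = editStreams.size();
        editStreams.removeIf(s -> s.equals(failedDir + "/edits"));
        return before - editStreams.size();
    }

    public static void main(String[] args) {
        NullGuardSketch log = new NullGuardSketch();
        System.out.println(log.removeStreamFor("/data/1")); // called before open()
        log.open(List.of("/data/1", "/data/2"));
        System.out.println(log.removeStreamFor("/data/1"));
    }
}
```

A size or isOpen() style check would answer a different question (are there any streams?),
whereas the null check only asks whether initialization has happened at all.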

bq. Would it make sense to move the call to reportErrorOnDirectories inside the if? Other
callers of the method tend to not unconditionally call the method.

Agreed.  New patch contains this change.

bq. is there a race condition... Can isAlive return false because the thread already terminated
before waitForThreads is invoked? I ask because won't the thread be left in limbo? In which
case, should the while be a do-while?

We talked, and noted that Java thread join is not the same as pthread join.  There's no race,
nor other issue, because both .isAlive() and .join() can be called on an already-terminated
thread without any exception being thrown.  The only purpose of the loop is to deal with the
possibility that an interruption may be received while this method is blocked on the join()
call.  It doesn't matter whether the termination condition is checked at the beginning or
the end of the loop.  So the existing code is acceptable.
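
The loop shape under discussion can be sketched as follows. This is an illustration under
assumptions, not the code from the patch: the class JoinSketch and this waitForThreads
signature are hypothetical. Thread.isAlive() and Thread.join() are the standard
java.lang.Thread methods.

```java
import java.util.List;

public class JoinSketch {
    // Hypothetical helper: block until every thread in the list has terminated.
    static void waitForThreads(List<Thread> threads) {
        for (Thread t : threads) {
            // isAlive() simply returns false for a thread that has already terminated,
            // so the loop body is skipped and nothing is left in limbo.
            while (t.isAlive()) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    // Interrupted while blocked in join(); go around and check again.
                }
            }
        }
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> { /* simulated per-directory work */ });
        worker.start();
        waitForThreads(List.of(worker));
        // A second call on the now-terminated thread returns immediately.
        waitForThreads(List.of(worker));
        System.out.println("alive after wait: " + worker.isAlive());
    }
}
```

A do-while variant would call join() once on an already-terminated thread, which also
returns immediately, so either form behaves the same here.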

> HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
> -----------------------------------------------------
>                 Key: HDFS-1955
>                 URL: https://issues.apache.org/jira/browse/HDFS-1955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>         Attachments: hdfs-1955_1.patch, hdfs-1955_1.patch, hdfs-1955_2.patch
> Prior to HDFS-1826, doUpgrade() would fail if any of the storage directories failed to
> successfully write the new fsimage or edits files.
> Now it appears to "succeed" even if some or all of the individual directories fail.
> There is some discussion about whether doUpgrade() should have some fault tolerance,
> but for now make it fail on any single storage directory failure, as before.
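
The "fail on any single storage directory failure" behavior described in the issue can be
sketched roughly as below. This is an assumption-laden illustration, not FSImage code: the
class StrictUpgradeSketch, the upgradeAll method, and the predicate that stands in for the
per-directory upgrade step are all hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class StrictUpgradeSketch {
    // Hypothetical driver: try every directory, then fail if any one of them failed.
    // upgradeOne returns true when the directory was upgraded successfully.
    static void upgradeAll(List<String> storageDirs, Predicate<String> upgradeOne)
            throws IOException {
        List<String> failed = new ArrayList<>();
        for (String dir : storageDirs) {
            if (!upgradeOne.test(dir)) {
                failed.add(dir);
            }
        }
        if (!failed.isEmpty()) {
            // No partial success: a single bad directory fails the whole upgrade.
            throw new IOException("Upgrade failed for storage directories: " + failed);
        }
    }

    public static void main(String[] args) {
        try {
            upgradeAll(List.of("/data/1", "/data/2"), dir -> !dir.equals("/data/2"));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```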

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

