hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
Date Thu, 20 Feb 2014 23:19:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907687#comment-13907687 ]

Suresh Srinivas commented on HDFS-5840:

[~atm], sorry for the late reply. I had lost track of this.

As for handling the partial upgrade failure as you've described, I'd like to add one more
RPC call to the JournalManager to initiate analysis/recovery of the storage dirs upon first
contact, and then refactor the contents of FSImage#recoverStorageDirs into NNUpgradeUtil just
like was done with the other upgrade-related procedures. If this sounds OK to you, I'll go
ahead and add that stuff and appropriate tests.

Why not always recover in the preupgrade/upgrade step, instead of adding another RPC?

With rolling upgrade nearly ready, some of the functionality added for it may be useful here.
For partial failures related to JournalNodes, the choice made in that feature was to make the
JournalNode rollback operation idempotent. It looks like a lot of the rolling-upgrade-related
code can be leveraged here, since upgrade is a special case of rolling upgrade. Should we
explore that?
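
To illustrate what an idempotent rollback could look like, here is a minimal sketch. It is
not the HDFS or JournalNode code: the class name IdempotentRollbackSketch, its methods, and
the simplified directory layout (current, previous, removed.tmp under one storage directory)
are assumptions made for the example. The idea is that each step checks the on-disk state
before acting, so a rollback interrupted partway can simply be retried.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names and layout are illustrative, not HDFS APIs.
final class IdempotentRollbackSketch {

    /**
     * Rolls the storage directory back to its pre-upgrade state.
     * Safe to call repeatedly: returns true if this call changed
     * anything on disk, false if there was nothing left to do.
     */
    static boolean rollback(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        Path previous = storageDir.resolve("previous");
        Path removedTmp = storageDir.resolve("removed.tmp");
        boolean changed = false;

        if (Files.exists(previous)) {
            if (Files.exists(current)) {
                // Step 1: set the upgraded state aside. Clear any stale
                // leftover first so the rename cannot collide with it.
                deleteRecursively(removedTmp);
                Files.move(current, removedTmp, StandardCopyOption.ATOMIC_MOVE);
            }
            // Step 2: restore the pre-upgrade state. Also reached when an
            // earlier attempt stopped right after step 1.
            Files.move(previous, current, StandardCopyOption.ATOMIC_MOVE);
            changed = true;
        }

        if (Files.exists(removedTmp)) {
            // Step 3: discard the upgraded state. Also reached when an
            // earlier attempt stopped right after step 2.
            deleteRecursively(removedTmp);
            changed = true;
        }
        return changed;
    }

    /** Deletes a file or directory tree; does nothing if the path is absent. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

With something along these lines, a caller that sees a partial failure could retry
rollback(dir) on every storage directory without first working out how far the previous
attempt got; a second call on an already rolled-back directory returns false and changes
nothing.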

> Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
> --------------------------------------------------------------------------------
>                 Key: HDFS-5840
>                 URL: https://issues.apache.org/jira/browse/HDFS-5840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>             Fix For: 3.0.0
>         Attachments: HDFS-5840.patch
> Suresh posted some good comments in HDFS-5138 after that patch had already been committed
> to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content
> of the review.

This message was sent by Atlassian JIRA
