hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
Date Sat, 25 Jan 2014 22:57:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882096#comment-13882096
] 

Suresh Srinivas commented on HDFS-5138:
---------------------------------------

@Todd, I have had some conversations with [~atm] related to this jira. I had brought up one
issue about potentially losing editlogs on a JournalNode, and I thought that would be addressed
before this jira could be committed. I have been very busy and have not been able to provide
all my comments until now. Reviewing this patch has been quite tricky. Here are my almost complete
review comments. While some of the issues are minor nits, I do not think this patch and the
documentation are ready.

I am adding information about the design as I understand it. Let me know if I got it wrong.
*Upgrade preparation:*
# New bits are installed on the cluster nodes.
# The cluster is brought down.

*Upgrade:* For an HA setup, choose one of the namenodes to initiate the upgrade on and start
it with the -upgrade flag (a sketch of the directory transitions follows this list).
# NN performs preupgrade for all non-shared storage directories by moving current to previous.tmp
and creating a new current.
#* Failure here is fine. NN startup fails, and the next attempt at upgrade recovers the storage
directories.
# NN performs preupgrade of shared edits (NFS/JournalNodes) over RPC. Each JournalNode's current
is moved to previous.tmp and a new current is created.
#* If preupgrade fails on one of the JNs and the upgrade is reattempted, the editlog directory
could be lost on that JN. Restarting the JN does not fix the issue.
# NN performs upgrade of the non-shared edits by writing the new CTIME to current and moving
previous.tmp to previous.
# NN performs upgrade of shared edits (NFS/JournalNodes) over RPC. Each JournalNode's current
gets the new CTIME and previous.tmp is moved to previous.
# We need to document that all the JournalNodes must be up. If a JN is irrecoverably lost,
configuration must be changed to exclude the JN.
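To make the directory moves above concrete, here is a minimal, self-contained sketch of the
per-directory transitions using plain {{java.nio.file}}. It is illustrative only (not the
patch's code), but the directory names (current, previous.tmp, previous) are the ones used in
the steps above.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class UpgradeTransitionSketch {

  // Preupgrade: current -> previous.tmp, then create a fresh current.
  // If this fails partway through, the next attempt must recover the directory first.
  static void doPreUpgrade(Path storageDir) throws IOException {
    Path current = storageDir.resolve("current");
    Path prevTmp = storageDir.resolve("previous.tmp");
    Files.move(current, prevTmp, StandardCopyOption.ATOMIC_MOVE);
    Files.createDirectory(current);
  }

  // Upgrade: once the new CTIME has been recorded in current,
  // previous.tmp -> previous marks this directory's upgrade as complete.
  static void doUpgrade(Path storageDir) throws IOException {
    Files.move(storageDir.resolve("previous.tmp"),
        storageDir.resolve("previous"), StandardCopyOption.ATOMIC_MOVE);
  }

  public static void main(String[] args) throws IOException {
    Path storageDir = Paths.get(args[0]); // e.g. a name dir or a JN journal dir
    doPreUpgrade(storageDir);
    doUpgrade(storageDir);
  }
}
{code}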

*Rollback:* NN is started with the -rollback flag (a matching rollback sketch follows this list).
# For all the non-shared directories, the NN checks canRollBack, which essentially ensures that
a previous directory with the right layout version exists.
# For all the shared directories, the NN checks canRollBack, which essentially ensures that
a previous directory with the right layout version exists.
# NN performs rollback for the shared directories (moving previous back to current).
#* If rollback fails on one of the JNs, the directories are left in an inconsistent state. I think
any attempt at retrying the rollback will fail and will require manually moving files around.
I do not think restarting the JN fixes this.
# We need to document that all the JournalNodes must be up. If a JN is irrecoverably lost,
configuration must be changed to exclude the JN.
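Continuing the sketch above, rollback is the reverse transition (previous back to current).
The {{removed.tmp}} name for discarding the upgraded state is illustrative; the point is that
a failure between the two moves is exactly the inconsistent state described above.

{code:java}
  // Rollback: discard the upgraded state, then restore the pre-upgrade state.
  static void doRollback(Path storageDir) throws IOException {
    Path current = storageDir.resolve("current");
    Path previous = storageDir.resolve("previous");
    // canRollBack: a previous directory (with the right layout version) must exist.
    if (!Files.isDirectory(previous)) {
      throw new IOException("Cannot roll back: no previous in " + storageDir);
    }
    Files.move(current, storageDir.resolve("removed.tmp"),
        StandardCopyOption.ATOMIC_MOVE);
    // A crash or JN failure here leaves this directory with no current.
    Files.move(previous, current, StandardCopyOption.ATOMIC_MOVE);
  }
{code}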

*Finalize:* DFSAdmin command is run to finalize the upgrade.
# The active NN performs finalization of the editlog. If the JNs fail to finalize, the active NN
fails to finalize. However, it is possible that the standby finalizes anyway, leaving the cluster
in an inconsistent state (see the sketch after this list).
# We need to document that all the JournalNodes must be up. If a JN is irrecoverably lost,
configuration must be changed to exclude the JN.
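One way to avoid the half-finalized state is to order the calls so that the standby is finalized
only after the active NN (and therefore the shared edits on the JNs) has finalized successfully.
A sketch, where {{finalizeUpgrade()}} is the existing ClientProtocol call and the surrounding
method and proxies are hypothetical:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;

class FinalizeOrderingSketch {
  static void finalizeInOrder(ClientProtocol active, ClientProtocol standby)
      throws IOException {
    active.finalizeUpgrade();   // fails if the JNs fail to finalize shared edits
    standby.finalizeUpgrade();  // reached only after the shared edits finalized
  }
}
{code}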

Comments on the code in the patch (this review is almost complete):
# Minor nit: there are some whitespace-only changes.
# assertAllResultsEqual - the for loop can just start with i = 1. Also, if the {{objects}}
collection is of size zero or one, the method can return early; is there a need to do
{{objects.toArray()}} for those early checks? With that, perhaps the findbugs exclude may not
be necessary (see the sketch after this list).
# Unit tests can be added for the methods isAtLeastOneActive, getRpcAddressesForNameserviceId and
getProxiesForAllNameNodesInNameservice (I am okay if this is done in a separate jira).
# Finalizing the upgrade is quite tricky. Consider the following scenarios:
#* One NN is active and the other is standby - works fine.
#* One NN is active and the other is down (or all NNs are down) - the finalize command throws an
exception, and the user will not know whether it has succeeded or failed, or what to do next.
#* No active NN - throws an exception saying it cannot finalize with no active NN.
# The BlockPoolSliceStorage.java change seems unnecessary.
# Why is {{throw new AssertionError("Unreachable code.");}} in QuorumJournalManager.java methods?
# FSImage#doRollBack() - when canRollBack is false after checking whether the non-shared
directories can roll back, an exception must be thrown immediately, instead of going on to check
the shared editlog. Also, logging at info level when storages can be rolled back will help in
debugging.
# FSEditlog#canRollBackSharedLog should accept StorageInfo instead of Storage
# QuorumJournalManager#canRollBack and getJournalCTime can throw AssertionError (from
DFSUtil.assertAllResultsEqual()). Is that the right exception to expose, or should it be
IOException? (A wrapping sketch follows this list.)
# Namenode startup throws AssertionError with the -rollback option. I think we should throw
IOException instead, which is how all the other failures are indicated.
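For the assertAllResultsEqual comment above, here is the simplification I have in mind: return
early for collections of size zero or one (no toArray() needed for that check) and start the
loop at i = 1. The body below is my suggestion, not the patch's current code.

{code:java}
public static void assertAllResultsEqual(java.util.Collection<?> objects) {
  if (objects.size() <= 1) {
    return; // nothing to compare
  }
  Object[] results = objects.toArray();
  // Compare each element with its predecessor; any mismatch means the
  // results from the different nodes were not all equal.
  for (int i = 1; i < results.length; i++) {
    Object curr = results[i];
    Object prev = results[i - 1];
    if (curr == null ? prev != null : !curr.equals(prev)) {
      throw new AssertionError("Not all results are equal: " +
          java.util.Arrays.toString(results));
    }
  }
}
{code}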
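For the last two comments (the AssertionError surfacing from QuorumJournalManager#canRollBack,
getJournalCTime and from namenode -rollback startup), one option is to translate the
AssertionError to IOException at the QJM boundary, matching how other journal failures are
reported. The helper name here is illustrative:

{code:java}
private static void checkAllJournalsAgree(java.util.Collection<?> results)
    throws IOException {
  try {
    DFSUtil.assertAllResultsEqual(results);
  } catch (AssertionError ae) {
    throw new IOException("JournalNodes returned inconsistent results", ae);
  }
}
{code}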


> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
HDFS-5138.patch, hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout version
change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x
to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade.

> The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If
HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs,
things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade
mode) and the DNs' upgrade snapshots won't get removed.
> We will need a different way of doing the layout upgrade and upgrade snapshot. I am marking
this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround
that does not increase the maintenance window greatly, we can lower its priority from blocker
to critical.



