hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA
Date Mon, 13 Jan 2014 23:34:01 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-5138:

    Attachment: HDFS-5138.patch

+ // This is expected to happen for a stanby NN.
Typo (standby)

Thanks, fixed.

+ // Either they all return the same thing or this call fails, so we can
+ // just return the first result.
Would be good to assert that, e.g. in case one of the JNs crashed in the middle of a previously
attempted upgrade sequence.

Sure, done.
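
For readers following along, a minimal sketch of the kind of assertion discussed here. It is
not the code from the patch: the class QuorumResults and the method requireUnanimous are
hypothetical names, and the list of responses stands in for whatever the JNs return.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helper for illustration only; not part of the HDFS-5138 patch.
final class QuorumResults {
  private QuorumResults() {}

  /**
   * Returns the value that every responder agreed on.
   * Throws IllegalStateException if there are no responses or if any
   * response differs from the first one, instead of silently returning
   * the first result.
   */
  static <T> T requireUnanimous(List<T> responses) {
    if (responses.isEmpty()) {
      throw new IllegalStateException("No responses received");
    }
    T first = responses.get(0);
    for (T response : responses) {
      if (!Objects.equals(first, response)) {
        throw new IllegalStateException("Responses disagree: " + responses);
      }
    }
    return first;
  }
}
```

With a check like this, a JN left in a different state by an interrupted upgrade attempt
surfaces as an error rather than being masked by the first result.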

* @param useLock true - enables locking on the storage directory and false
* disables locking
+ * @param isShared whether or not this dir is shared between two NNs. true
+ * enables locking on the storage directory, false disables locking
I think this doc is now wrong because you inverted the sense of these booleans - we don't
lock the shared dir.

Good catch. Fixed.
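
To illustrate the corrected sense of the flag, a simplified sketch. The class StorageDirSketch
and its methods are hypothetical stand-ins, and the Javadoc wording is an assumption about
what the fix could look like, not the text in the updated patch.

```java
// Hypothetical, simplified stand-in for a storage directory; not the patch's code.
final class StorageDirSketch {
  private final boolean isShared;
  private boolean locked;

  /**
   * @param isShared whether or not this dir is shared between two NNs. true
   *                 disables locking on the storage directory, false enables
   *                 locking
   */
  StorageDirSketch(boolean isShared) {
    this.isShared = isShared;
  }

  /** Takes the lock unless the directory is shared; returns whether a lock was taken. */
  boolean lock() {
    if (isShared) {
      // Shared dir: both NNs use it, so no lock is taken.
      return false;
    }
    locked = true;
    return true;
  }

  boolean isLocked() {
    return locked;
  }
}
```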

+ public synchronized void doFinalizeOfSharedLog() throws IOException {
+ public synchronized boolean canRollBackSharedLog(Storage prevStorage,
Style nit: extra space in the above two methods


+ if (!sd.isShared()) {
+ // This will be done on transition to active.
Worth a LOG.info or even warn here

Added the following:

LOG.info("Not doing recovery on " + sd + " now. Will be done on "
                + "transition to active.");

bq. Currently it seems like whichever SBN starts up first has to be the one who does the transition
to active. Maybe a follow-up JIRA could be to relax that constraint? Seems like it should
be fine for either one of the NNs to actually do the upgrade - the lock file is just to make
sure they agree on the target ctime.

Agreed, this seems like a good idea, and it can reasonably be done in a follow-up JIRA.
If you agree, I'll file it when we commit this one.

+ dfsadmin -finalizeUpgrade'>>> command while the NNs are running and one of them
+ is active. The active NN at the time this happens will perform the upgrade of
+ the shared log, and both of the NNs will finalize the upgrade in their local
I think here you mean the "finalization of the shared log"

Sure did. Fixed.

> Support HDFS upgrade in HA
> --------------------------
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
HDFS-5138.patch, HDFS-5138.patch
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout version
change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x
to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade.

> The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If
HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs,
things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade
mode) and the DNs' upgrade snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  I am marking
this as a 2.1.1-beta blocker based on feedback from others.  If there is a reasonable workaround
that does not greatly increase the maintenance window, we can lower its priority from blocker
to critical.

This message was sent by Atlassian JIRA
