hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-5138) Support HDFS upgrade in HA
Date Fri, 21 Mar 2014 01:36:55 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942638#comment-13942638
] 

Jing Zhao edited comment on HDFS-5138 at 3/21/14 1:36 AM:
----------------------------------------------------------

I ran a simple test of HDFS upgrade with HA and hit the following exception during
rollback (the upgrade included a layoutversion change):
{code}
14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown:
Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55.
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:178)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:131)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:228)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:309)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
{code}

In my HA upgrade test, the new software bumped the layoutversion from -55 to -56. I stopped
all the services and restarted the JNs with the old software. Then I ran "namenode -rollback" and
hit the above exception. It looks like, during rollback, a JN running the old software cannot
handle the future layoutversion written by the new software.
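
To make the failure mode concrete, here is a minimal sketch of the kind of check that rejects the storage directory. It is not the actual StorageInfo code: the class and method names (LayoutVersionCheck, isFutureVersion, checkLayoutVersion) and the exception type are hypothetical. It only assumes that layout versions are negative integers that decrease as the layout gets newer, so -56 is "in the future" for software that supports -55.

```java
// Hypothetical sketch of a layout version check; not the real StorageInfo code.
public class LayoutVersionCheck {

    // Layout versions are negative and decrease with each newer layout,
    // so a reported value smaller than the supported one is a future version.
    public static boolean isFutureVersion(int reported, int supported) {
        return reported < supported;
    }

    // Rejects a storage directory written with a newer layout than this
    // software supports. The message mirrors the one in the log above.
    public static void checkLayoutVersion(String dir, int reported, int supported) {
        if (isFutureVersion(reported, supported)) {
            throw new IllegalStateException("Unexpected version of storage directory "
                + dir + ". Reported: " + reported + ". Expecting = " + supported + ".");
        }
    }

    public static void main(String[] args) {
        String dir = "/grid/1/tmp/journal/mycluster";
        // Old JN software (supports -55) reading its own layout: accepted.
        checkLayoutVersion(dir, -55, -55);
        // Old JN software reading the directory after the upgrade wrote -56: rejected.
        try {
            checkLayoutVersion(dir, -56, -55);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this assumption, any check of this shape fails as soon as the old JN opens the upgraded directory, before it can decide whether rollback is possible.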


was (Author: jingzhao):
I ran a simple test of HDFS upgrade with HA and hit the following exception during
rollback (the upgrade included a layoutversion change):
{code}
14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown:
Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55.
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:178)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:131)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:228)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:309)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
{code}

In my HA upgrade test, the new software bumped the layoutversion from -55 to -56. Then I stopped
all the services and restarted the JNs with the old software. Then I ran "namenode -rollback" and
hit the above exception. It looks like, during rollback, a JN running the old software cannot
handle the future layoutversion written by the new software.

> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>             Fix For: 3.0.0
>
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
HDFS-5138.patch, hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout version
change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x
to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade.

> The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If
HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs,
things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade
mode) and the DNs' upgrade snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  I am marking
this as a 2.1.1-beta blocker based on feedback from others.  If there is a reasonable workaround
that does not increase the maintenance window greatly, we can lower its priority from blocker
to critical.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
