hadoop-hdfs-issues mailing list archives

From "Fengdong Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4967) Generate block ID sequentially cannot work with QJM HA
Date Tue, 09 Jul 2013 07:43:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703024#comment-13703024 ]

Fengdong Yu commented on HDFS-4967:

Hi Arpit,

I already have an HA cluster (LayoutVersion=-43), and I don't want to upgrade it, because
upgrading under HA is complicated.

AFAIK, upgrading an HA cluster involves:
1) stop-dfs
2) replace core-site.xml and hdfs-site.xml with their non-HA versions
3) merge edit logs from the standby NN to the active NN
4) start-dfs -upgrade
5) stop-dfs
6) revert core-site.xml and hdfs-site.xml to their HA versions
7) initializeSharedEdits
8) start-dfs

So I hacked the trunk code as follows:
         "Reduce snapshot inode memory footprint", false),
-    SEQUENTIAL_BLOCK_ID(-46, "Allocate block IDs sequentially and store " +
-        "block IDs in the edits log and image files");
+    SEQUENTIAL_BLOCK_ID(-43, -43, "Allocate block IDs sequentially and store " +
+        "block IDs in the edits log and image files", false);

After that change, I saw the errors reported in the issue description.
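The exception in the reported trace comes from a guard that refuses to move a sequential counter backwards. A minimal sketch of that kind of guard follows. `SequentialCounter` is a hypothetical stand-in written for illustration, not the actual `org.apache.hadoop.util.SequentialNumber` source; the starting value 1073741824 and the requested value 0 are taken from the log message.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SkipToSketch {
    /** Hypothetical stand-in for a monotonically increasing ID generator. */
    static final class SequentialCounter {
        private final AtomicLong current;

        SequentialCounter(long initialValue) {
            this.current = new AtomicLong(initialValue);
        }

        long current() {
            return current.get();
        }

        /** Moves the counter forward to newValue; moving backwards is rejected. */
        void skipTo(long newValue) {
            for (;;) {
                long c = current.get();
                if (newValue < c) {
                    throw new IllegalStateException(
                        "Cannot skip to less than the current value (=" + c
                            + "), where newValue=" + newValue);
                }
                // Retry if another thread changed the value in the meantime.
                if (current.compareAndSet(c, newValue)) {
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Start at 2^30 = 1073741824, the "current value" shown in the log.
        SequentialCounter blockIds = new SequentialCounter(1L << 30);
        try {
            blockIds.skipTo(0); // a value of 0 read from the image is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        blockIds.skipTo(1L << 31); // skipping forward is allowed
        System.out.println(blockIds.current());
    }
}
```

Under this sketch, a last-allocated block ID of 0 can never be applied to a counter that already starts at 1073741824, which is consistent with the failure seen while loading the image.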

I have now reverted LayoutVersion.java and upgraded the cluster using the eight steps listed above.

It works, with no exceptions.

So we can close this issue as "Not a Problem". But I would like to know:

Are these eight steps the right way to upgrade from one HA version to another HA version,
for example from LayoutVersion -43 to -46?
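For context on the -43 and -46 numbers: HDFS layout versions are negative and decrease as features are added, so an image written at -43 predates a feature introduced at -46. The sketch below is a deliberately simplified model of that check. The `supports` helper is hypothetical and assumes a linear version history; it is not the real `LayoutVersion` API. It shows one plausible reading of why re-tagging SEQUENTIAL_BLOCK_ID as -43 makes an existing -43 image look as if it already contained the new field.

```java
public class LayoutVersionSketch {
    /**
     * Simplified model (assumption): a feature is available in every image
     * whose layout version is at or below the version that introduced it.
     */
    static boolean supports(int featureIntroducedAt, int imageLayoutVersion) {
        // Versions are negative; a numerically smaller value is newer.
        return imageLayoutVersion <= featureIntroducedAt;
    }

    public static void main(String[] args) {
        int existingImage = -43;

        // Unmodified trunk: the feature arrives at -46, so a -43 image lacks it.
        System.out.println(supports(-46, existingImage)); // false

        // Hacked enum: the feature is declared at -43, so the same image
        // is now treated as if it stored the last allocated block ID.
        System.out.println(supports(-43, existingImage)); // true

        // After a real upgrade the image is rewritten at -46.
        System.out.println(supports(-46, -46)); // true
    }
}
```

Whether this is exactly what happened in the cluster above is not confirmed in the thread; the sketch only illustrates why editing a feature's version number differs from performing an upgrade that rewrites the image.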

> Generate block ID sequentially cannot work with QJM HA
> ------------------------------------------------------
>                 Key: HDFS-4967
>                 URL: https://issues.apache.org/jira/browse/HDFS-4967
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, hdfs-client, namenode
>    Affects Versions: 3.0.0
>            Reporter: Fengdong Yu
>            Assignee: Arpit Agarwal
> There are two NameNodes, one active and one standby, with QJM HA configured.
> After HDFS-4645 was committed to trunk, the following error appeared during name node
> {code}
> 2013-07-09 11:28:45,394 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
> java.lang.IllegalStateException: Cannot skip to less than the current value (=1073741824), where newValue=0
>         at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setLastAllocatedBlockId(FSNamesystem.java:5124)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:278)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:809)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:798)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:653)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:623)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:260)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:719)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:552)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:401)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:435)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:607)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:592)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1172)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1238)
> 2013-07-09 11:28:45,397 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> {code}

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
