hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7185) The active NameNode will not accept an fsimage sent from the standby during rolling upgrade
Date Tue, 14 Oct 2014 03:09:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170442#comment-14170442 ]

Tsz Wo Nicholas Sze commented on HDFS-7185:
-------------------------------------------

- I think writing the layout version (lv) in FSImage.recoverStorageDirs(..) may be too early,
since that method checks each storage directory in turn and some of them may fail.  We may then
end up with some directories updated but not all of them.  How about we change the version check
and defer writing lv until the code below?
{code}
+        if (fsImage.getLayoutVersion() != HdfsConstants.NAMENODE_LAYOUT_VERSION
+            && StartupOption.ROLLINGUPGRADE == startOpt) {
+          fsImage.updateStorageVersion();
+        }
{code}

- I think the above code should be more strict:
-* for downgrade, we must have the same versions; otherwise, throw an exception.
-* for started, update lv only if current version is newer than the on-disk version.  If current
version is older than the on-disk version, throw an exception.
-* for rollback, update lv only if current version is older than the on-disk version.  If
current version is newer than the on-disk version, throw an exception.
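
The three rules above can be sketched as a single decision function. This is only an illustration, not the HDFS-7185 patch: the class name, the enum, and the method are hypothetical, and it throws IllegalStateException only to stay self-contained (the real code would more likely throw an IOException). It relies on HDFS layout versions being negative numbers that decrease as the layout gets newer, which matches the -55 (old) and -59 (new) values in the issue description.

```java
// Hypothetical sketch of the stricter check; names are illustrative, not HDFS API.
final class LayoutVersionCheck {
    /** The rolling-upgrade startup options discussed in the comment. */
    enum RollingUpgradeOption { DOWNGRADE, STARTED, ROLLBACK }

    /**
     * Returns true if the on-disk layout version should be rewritten to the
     * software's layout version, false if no update is needed, and throws if
     * the combination is not allowed.
     *
     * Layout versions are negative and decrease over time, so a numerically
     * smaller value (e.g. -59) is newer than a larger one (e.g. -55).
     */
    static boolean shouldUpdateStorageVersion(
            RollingUpgradeOption opt, int softwareLv, int onDiskLv) {
        boolean softwareNewer = softwareLv < onDiskLv;
        boolean softwareOlder = softwareLv > onDiskLv;
        switch (opt) {
            case DOWNGRADE:
                // Downgrade requires identical versions, so nothing is rewritten.
                if (softwareLv != onDiskLv) {
                    throw new IllegalStateException(
                        "downgrade requires matching layout versions: software="
                        + softwareLv + ", on-disk=" + onDiskLv);
                }
                return false;
            case STARTED:
                // Update only when the software is newer than the on-disk version.
                if (softwareOlder) {
                    throw new IllegalStateException(
                        "software layout version " + softwareLv
                        + " is older than on-disk " + onDiskLv);
                }
                return softwareNewer;
            case ROLLBACK:
                // Update only when the software is older than the on-disk version.
                if (softwareNewer) {
                    throw new IllegalStateException(
                        "software layout version " + softwareLv
                        + " is newer than on-disk " + onDiskLv);
                }
                return softwareOlder;
            default:
                throw new IllegalArgumentException("unexpected option: " + opt);
        }
    }
}
```

With the values from the issue, shouldUpdateStorageVersion(STARTED, -59, -55) asks for an update, while the same call with ROLLBACK is rejected.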

> The active NameNode will not accept an fsimage sent from the standby during rolling upgrade
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7185
>                 URL: https://issues.apache.org/jira/browse/HDFS-7185
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Jing Zhao
>         Attachments: HDFS-7185.000.patch, HDFS-7185.001.patch
>
>
> The active NameNode will not accept an fsimage sent from the standby during rolling upgrade.
> The active fails with the exception:
> {code}
> 18:25:07,620  WARN ImageServlet:198 - Received an invalid request file transfer request from a secondary with storage info -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
> 18:25:07,620  WARN log:76 - Committed before 410 PutImage failed. java.io.IOException: This namenode has storage info -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
>         at org.apache.hadoop.hdfs.server.namenode.ImageServlet.validateRequest(ImageServlet.java:200)
>         at org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:443)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:730)
> {code}
> On the standby, the exception is:
> {code}
> java.io.IOException: Exception during image upload: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: This namenode has storage info -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:218)
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
> {code}
> This seems to be a consequence of the fact that the VERSION file still is at -55 (the
> old version) even after the rolling upgrade has started.  When the rolling upgrade is finalized
> with {{hdfs dfsadmin -rollingUpgrade finalize}}, both VERSION files get set to the new version,
> and the problem goes away.
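
For reference, the two storage info strings in the quoted logs differ only in their first field. The helper below is a hypothetical sketch, not part of ImageServlet, that reports which colon-separated field differs. The field labels (layoutVersion, namespaceID, cTime, clusterID) are assumed from the shape of the logged strings.

```java
// Hypothetical helper for reading the logged storage info strings; not HDFS code.
final class StorageInfoDiff {
    // Assumed field order, inferred from strings such as
    // -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
    private static final String[] FIELDS =
        {"layoutVersion", "namespaceID", "cTime", "clusterID"};

    /** Returns the label of the first differing field, or null if all four match. */
    static String firstMismatch(String mine, String theirs) {
        String[] a = mine.split(":", 4);
        String[] b = theirs.split(":", 4);
        if (a.length != 4 || b.length != 4) {
            throw new IllegalArgumentException("expected 4 colon-separated fields");
        }
        for (int i = 0; i < FIELDS.length; i++) {
            if (!a[i].equals(b[i])) {
                return FIELDS[i];
            }
        }
        return null;
    }
}
```

Applied to the active's string (-55:...) and the standby's string (-59:...), the first mismatch is the layout version field, consistent with the VERSION file explanation in the description.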



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)