hadoop-hdfs-issues mailing list archives

From "Xiaoyu Yao (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-11209) SNN can't checkpoint when rolling upgrade is not finalized
Date Fri, 13 Jan 2017 22:42:26 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822475#comment-15822475 ]

Xiaoyu Yao edited comment on HDFS-11209 at 1/13/17 10:42 PM:
-------------------------------------------------------------

Delta from v03: removes the unit test change, which could not reproduce the original rolling upgrade
issue.

The repro is a bit tricky with MiniDFSCluster: we need to run the old version of the NN and issue
"hdfs dfsadmin -rollingUpgrade prepare" to create an fsimage with the old layout version. Then we do
the upgrade and run the primary NameNode (new software layout version) with the
"-rollingUpgrade started" option, and the Secondary NameNode (new software layout version) as normal.

The software layout version is determined by a static method of the LayoutVersion class, which
Mockito cannot mock. It is possible to do that with PowerMock + Mockito. I decided to add the
unit test in a separate ticket. I've manually tested an upgrade from Hadoop 2.6 to Hadoop 2.7.1
in a non-HA setup, with the layout version changing from -60 to -63, and verified that the SNN
can checkpoint while the primary NN's rolling upgrade is not finalized.
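Mockito at the time could not stub static methods, which is why PowerMock comes up above. Another common workaround, shown here only as a hypothetical sketch and not what the HDFS-11209 patch or Hadoop does, is to route the static lookup through an injectable `IntSupplier`, so a plain unit test can substitute the "new software" layout version without any mocking framework. All class and method names below are made up for illustration; -63 and -60 are simply the values from the manual test.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch: none of these names exist in Hadoop.
 * Production code reads the layout version through an IntSupplier,
 * so tests can inject a value instead of stubbing a static method.
 */
public class LayoutVersionSeam {

    /** Stand-in for a static "current software layout version" lookup (hypothetical value). */
    static int currentSoftwareLayoutVersion() {
        return -63;
    }

    /** Hypothetical component that needs the software layout version. */
    static final class VersionStamper {
        private final IntSupplier softwareLayoutVersion;

        /** Production wiring: delegates to the static lookup. */
        VersionStamper() {
            this(LayoutVersionSeam::currentSoftwareLayoutVersion);
        }

        /** Test wiring: any supplier can stand in for the static call. */
        VersionStamper(IntSupplier softwareLayoutVersion) {
            this.softwareLayoutVersion = softwareLayoutVersion;
        }

        /** Layout version this component would write. */
        int versionToStamp() {
            return softwareLayoutVersion.getAsInt();
        }
    }

    public static void main(String[] args) {
        // Default wiring goes through the static method.
        System.out.println(new VersionStamper().versionToStamp());
        // A lambda simulates older software; no PowerMock needed.
        System.out.println(new VersionStamper(() -> -60).versionToStamp());
    }
}
```

The trade-off is a small production-code change (an extra constructor) in exchange for tests that run without bytecode manipulation.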


> SNN can't checkpoint when rolling upgrade is not finalized
> ----------------------------------------------------------
>
>                 Key: HDFS-11209
>                 URL: https://issues.apache.org/jira/browse/HDFS-11209
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rolling upgrades
>    Affects Versions: 2.8.0, 3.0.0-alpha1
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Critical
>         Attachments: HDFS-11209.00.patch, HDFS-11209.01.patch, HDFS-11209.02.patch, HDFS-11209.03.patch, HDFS-11209.04.patch
>
>
> A similar problem was fixed in HDFS-7185; a recent change in HDFS-8432 brought it back.
> With HDFS-8432, after the primary NN is started with the "rollingUpgrade" option, it does not update the VERSION file to the new version until the upgrade is finalized. This is to support more downgrade use cases.
> However, the checkpoint on the SNN incorrectly updates the VERSION file while the rolling upgrade is not yet finalized on the primary NN. As a result, the SNN checkpoints successfully but fails to push the image to the primary NN, because its layout version is newer than the primary NN's, as shown below.
> {code}
> 2016-12-02 05:25:31,918 ERROR namenode.SecondaryNameNode (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: Image uploading failed, status: 403, url: http://NN:50070/imagetransfer?txid=345404754&imageFile=IMAGE&File-Le..., message: This namenode has storage info -60:221856466:1444080250181:clusterX but the secondary expected -63:221856466:1444080250181:clusterX
> {code} 
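The storage info strings in the log are colon-separated. Assuming the field order is layoutVersion:namespaceID:cTime:clusterID (this matches the logged values, but is an assumption here), the hypothetical sketch below parses both strings, confirms that only the layout version differs, and encodes the rule from the description: until the rolling upgrade is finalized, the checkpoint should keep the on-disk layout version rather than the software one. Class and method names are illustrative, not Hadoop's API, and `layoutVersionForCheckpoint` is not the actual HDFS-11209 patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: parses the colon-separated storage info strings
 * from the log and reports which fields differ. Field order
 * (layoutVersion:namespaceID:cTime:clusterID) is assumed.
 */
public class StorageInfoDiff {

    record StorageInfo(int layoutVersion, int namespaceId, long cTime, String clusterId) {
        static StorageInfo parse(String s) {
            // Limit of 4 keeps any extra colons inside the cluster ID.
            String[] f = s.split(":", 4);
            if (f.length != 4) {
                throw new IllegalArgumentException("expected 4 fields: " + s);
            }
            return new StorageInfo(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                    Long.parseLong(f[2]), f[3]);
        }
    }

    /** Names of the fields whose values differ between a and b. */
    static List<String> differingFields(StorageInfo a, StorageInfo b) {
        List<String> diffs = new ArrayList<>();
        if (a.layoutVersion() != b.layoutVersion()) diffs.add("layoutVersion");
        if (a.namespaceId() != b.namespaceId()) diffs.add("namespaceID");
        if (a.cTime() != b.cTime()) diffs.add("cTime");
        if (!a.clusterId().equals(b.clusterId())) diffs.add("clusterID");
        return diffs;
    }

    /** Rule from the issue text: keep the on-disk version until the rolling upgrade is finalized. */
    static int layoutVersionForCheckpoint(int onDisk, int software, boolean finalized) {
        return finalized ? software : onDisk;
    }

    public static void main(String[] args) {
        StorageInfo nn = StorageInfo.parse("-60:221856466:1444080250181:clusterX");
        StorageInfo snn = StorageInfo.parse("-63:221856466:1444080250181:clusterX");
        System.out.println(differingFields(nn, snn)); // [layoutVersion]
        System.out.println(layoutVersionForCheckpoint(-60, -63, false)); // -60
    }
}
```

Under this reading, an SNN that followed the rule would have sent -60 and matched the primary NN's storage info instead of getting the 403.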



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


