hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
Date Wed, 20 Nov 2013 16:21:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827810#comment-13827810 ]

Kihwal Lee commented on HDFS-5526:
----------------------------------

I think we need to clarify the role of the VERSION files. Layout version and cTime checks
also exist at the block pool slice level, which is represented by {{BlockPoolSliceStorage}}.
So I don't think all fields in the VERSION file written by {{DataStorage}} (the volume-level
storage) are useful.

VERSION properties in {{DataStorage}} : <volume>/current/VERSION
 - layoutVersion 
 - storageType 
 - namespaceID  
 - clusterID
 - cTime
 - storageID

VERSION properties in {{BlockPoolSliceStorage}} : <volume>/current/<blockpool>/current/VERSION
 - layoutVersion
 - namespaceID
 - blockpoolID
 - cTime
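To make the overlap between the two files concrete, here is a small sketch that loads both
VERSION files and compares their keys. It assumes the files use plain java.util.Properties
"key=value" syntax; the class name {{VersionFiles}} and all sample values are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: compare volume-level and block-pool-level VERSION keys. */
public class VersionFiles {
    // Sample <volume>/current/VERSION contents (values are made up).
    static final String VOLUME_VERSION =
        "layoutVersion=-48\n" +
        "storageType=DATA_NODE\n" +
        "namespaceID=12345\n" +
        "clusterID=CID-example\n" +
        "cTime=0\n" +
        "storageID=DS-example\n";

    // Sample <volume>/current/<blockpool>/current/VERSION contents (values are made up).
    static final String BLOCKPOOL_VERSION =
        "layoutVersion=-48\n" +
        "namespaceID=12345\n" +
        "blockpoolID=BP-example\n" +
        "cTime=0\n";

    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return p;
    }

    /** Keys recorded only at the volume (DataStorage) level. */
    static Set<String> volumeOnlyKeys() {
        Set<String> keys = new TreeSet<>(load(VOLUME_VERSION).stringPropertyNames());
        keys.removeAll(load(BLOCKPOOL_VERSION).stringPropertyNames());
        return keys;
    }

    /** Keys recorded at both levels, i.e. candidates for duplicated checks. */
    static Set<String> sharedKeys() {
        Set<String> keys = new TreeSet<>(load(VOLUME_VERSION).stringPropertyNames());
        keys.retainAll(load(BLOCKPOOL_VERSION).stringPropertyNames());
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("volume only: " + volumeOnlyKeys()); // [clusterID, storageID, storageType]
        System.out.println("shared:      " + sharedKeys());     // [cTime, layoutVersion, namespaceID]
    }
}
```

The shared keys are exactly the ones already checked per block pool, which is why the
volume-level copies of layoutVersion and cTime carry little extra information.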

For {{DataStorage}}, the critical information to maintain during post-federation upgrade/rollback
is:
 - namespaceID (not used post federation)
 - storageID
 - storageType - automatically set
 - clusterID
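One way to state the requirement above is as an invariant on the volume-level VERSION file:
whatever an upgrade or rollback rewrites, these fields must come out unchanged. The sketch
below checks that invariant; {{CriticalFields}} and {{preservesCriticalFields}} are
hypothetical names, not existing HDFS code.

```java
import java.util.List;
import java.util.Objects;
import java.util.Properties;

/** Hypothetical check that a rewritten volume VERSION kept its critical fields. */
public class CriticalFields {
    // Fields listed above as critical for DataStorage. storageType is set automatically,
    // but comparing it as well does no harm.
    static final List<String> CRITICAL =
        List.of("namespaceID", "storageID", "storageType", "clusterID");

    static boolean preservesCriticalFields(Properties before, Properties after) {
        for (String key : CRITICAL) {
            // Objects.equals treats "absent in both" as preserved.
            if (!Objects.equals(before.getProperty(key), after.getProperty(key))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Properties before = new Properties();
        before.setProperty("layoutVersion", "-47");
        before.setProperty("storageType", "DATA_NODE");
        before.setProperty("namespaceID", "12345");
        before.setProperty("clusterID", "CID-example");
        before.setProperty("storageID", "DS-example");

        Properties after = new Properties();
        after.putAll(before);
        after.setProperty("layoutVersion", "-48"); // only the layout changes
        System.out.println(preservesCriticalFields(before, after)); // true

        after.setProperty("storageID", "DS-other");
        System.out.println(preservesCriticalFields(before, after)); // false
    }
}
```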

{{cTime}} at the {{DataStorage}} level doesn't seem to make sense. It is compared against
the one in {{nsInfo}} from the name node for the first block pool that is initialized, so if
the initialization order changes, {{DataStorage}} may fail to initialize. I don't know whether
this is by design, but as you (Vinay) said, {{DataStorage.upgrade}} will always run during
DN startup after the first upgrade. This prevents initialization failures during normal
start-up and upgrade. Rollback is different, since it doesn't go through this code path.
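The order dependence can be illustrated with a toy model in which the volume-level cTime is
checked only against the first block pool's namespace info. This is a simplified sketch, not
the real {{DataStorage}} logic: the comparison is modeled as plain equality, and
{{CTimeOrderCheck}}, {{NsInfo}}, and {{volumeCheckPasses}} are hypothetical names.

```java
import java.util.List;

/** Toy model of an order-dependent volume-level cTime check (hypothetical). */
public class CTimeOrderCheck {
    // Hypothetical stand-in for the per-block-pool namespace info a NameNode sends.
    static final class NsInfo {
        final String blockPoolId;
        final long cTime;

        NsInfo(String blockPoolId, long cTime) {
            this.blockPoolId = blockPoolId;
            this.cTime = cTime;
        }
    }

    /** Only the first block pool in initialization order is consulted. */
    static boolean volumeCheckPasses(long volumeCTime, List<NsInfo> initOrder) {
        NsInfo first = initOrder.get(0);
        return first.cTime == volumeCTime; // simplified: real code may compare differently
    }

    public static void main(String[] args) {
        NsInfo bp1 = new NsInfo("BP-1", 0L);
        NsInfo bp2 = new NsInfo("BP-2", 5L);
        long volumeCTime = 0L; // recorded when BP-1 happened to initialize first

        System.out.println(volumeCheckPasses(volumeCTime, List.of(bp1, bp2))); // true
        System.out.println(volumeCheckPasses(volumeCTime, List.of(bp2, bp1))); // false
    }
}
```

The same on-disk state passes or fails depending only on which block pool happens to
initialize first, which is the fragility described above.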

Since an upgrade at the {{DataStorage}} level does not involve actual data, a {{cTime}} change
within the same layout version has no meaning and shouldn't cause any changes. I propose removing
the cTime check for post-federation upgrades. Validation should be performed only against
{{clusterID}} and {{layoutVersion}}, and an upgrade action (for post-federation upgrades, not
1.x to 2.x upgrades) should be taken only when the layout version changes.

For post-federation rollback, it should only update {{layoutVersion}}; there is no need to save
the old layout version. If the layout version is invalid (i.e., the datanode talked to a NN
running the wrong version), the rollback at the {{BlockPoolSliceStorage}} level will fail. If
the error is corrected and the datanode is restarted with "-rollback", it should not be stuck
in an invalid state. This means the rollback at the {{DataStorage}} level should accept whatever
the first name node reports. Again, this is safe since correctness is checked at the
{{BlockPoolSliceStorage}} level.
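The proposed volume-level rollback can be sketched as a pure function over the VERSION
properties: overwrite layoutVersion with the first name node's value and copy everything
else. Nothing about the previous layout is kept, so a rerun after a failed attempt starts
from a clean state. {{VolumeRollback}} and {{rollbackVolume}} are hypothetical names, and the
block-pool-level failure is not modeled here.

```java
import java.util.Properties;

/** Hypothetical sketch of the proposed DataStorage-level rollback. */
public class VolumeRollback {
    /** Returns a copy with only layoutVersion replaced; the input is not modified. */
    static Properties rollbackVolume(Properties current, int firstNameNodeLayoutVersion) {
        Properties next = new Properties();
        next.putAll(current);
        next.setProperty("layoutVersion", Integer.toString(firstNameNodeLayoutVersion));
        return next;
    }

    public static void main(String[] args) {
        Properties v = new Properties();
        v.setProperty("layoutVersion", "-48");
        v.setProperty("clusterID", "CID-example");
        v.setProperty("storageID", "DS-example");

        // First attempt reaches a NameNode running the wrong version; the volume accepts it
        // (in HDFS the BlockPoolSliceStorage rollback is what would then fail).
        Properties wrong = rollbackVolume(v, -48);
        // After the NameNode is fixed and the datanode is restarted with -rollback:
        Properties fixed = rollbackVolume(wrong, -47);

        System.out.println(fixed.getProperty("layoutVersion")); // -47
        System.out.println(fixed.getProperty("storageID"));     // DS-example
    }
}
```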

I will submit an update patch soon.

> Datanode cannot roll back to previous layout version
> ----------------------------------------------------
>
>                 Key: HDFS-5526
>                 URL: https://issues.apache.org/jira/browse/HDFS-5526
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes cannot start
> with -rollback; they fail with IncorrectVersionException.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
