hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
Date Mon, 09 Apr 2012 20:31:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250152#comment-13250152
] 

Todd Lipcon commented on HDFS-2983:
-----------------------------------

I did a little investigation to try to answer Konstantin's questions above.

First, I'll summarize our current behavior, verified on the 0.23.1 release (I didn't understand
this thoroughly before trying it out):

- In a running cluster, if you restart the NN without the {{-upgrade}} flag, then the DataNodes
will happily re-register without exiting.
- If you restart the NN with {{-upgrade}}, then when the DN next heartbeats, it will fail
the {{verifyRequest()}} check, since the registration ID's namespace fields no longer match
(the ctime has been incremented by the upgrade). This causes the DataNode to exit.
- Of course, restarting the DN at this point makes it take the snapshot and participate in
the upgrade as expected.
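
The three observations above can be modeled with a toy simulation. This is only a sketch, not HDFS code: the names {{ToyNameNode}}, {{ToyDataNode}} and {{heartbeat}} are hypothetical. The only things taken from the observed behavior are that an {{-upgrade}} restart changes the namespace ctime and that a heartbeat carrying a stale ctime makes the DN exit.

```python
from dataclasses import dataclass


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NN; it only tracks the namespace ctime."""
    ctime: int = 0

    def restart(self, upgrade: bool = False) -> None:
        # Per the observations above, only an -upgrade restart changes ctime.
        if upgrade:
            self.ctime += 1


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a DN holding the ctime it registered with."""
    registered_ctime: int
    running: bool = True

    def heartbeat(self, nn: ToyNameNode) -> bool:
        # Toy analogue of the verifyRequest() check: a stale ctime is fatal.
        if self.running and self.registered_ctime != nn.ctime:
            self.running = False
        return self.running

    def restart(self, nn: ToyNameNode) -> None:
        # A restarted DN sees the new ctime (the real DN would snapshot here).
        self.registered_ctime = nn.ctime
        self.running = True


nn = ToyNameNode()
dn = ToyDataNode(registered_ctime=nn.ctime)

nn.restart(upgrade=False)
plain_restart_ok = dn.heartbeat(nn)     # DN re-registers and keeps running

nn.restart(upgrade=True)
upgrade_restart_ok = dn.heartbeat(nn)   # DN fails the check and exits

dn.restart(nn)
after_dn_restart_ok = dn.heartbeat(nn)  # restarted DN joins the upgrade
```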

So, to try to respond to Konstantin's questions, here are a few example scenarios:

*Scenario 1*: rolling upgrade without doing a "snapshot" upgrade (for emergency bug fixes,
hot fixes, MR fixes, other fixes which we don't expect to affect data reliability):

- Leave the NN running, on the old version.
- On each DN, in succession: (1) shut down the DN, (2) upgrade the software to the new version,
(3) start the DN

The above is sufficient if the changes are scoped only to DNs. If the change also affects
the NN, then you will need to add the following step, either at the beginning or end of the
process:

- Shut down the NN, upgrade the installed software, and start the NN on the new version

In the case of an HA setup, we can do the NN upgrade without downtime:

- Shut down the SBN, upgrade its software, and start it again
- Fail over to the SBN, which now runs the new version
- Shut down the previous active NN, upgrade its software, and start it again
- Optionally fail back
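
The ordering in Scenario 1 can be written down as a checklist generator. This is a sketch under assumptions: {{rolling_upgrade_plan}} is a hypothetical helper that only builds a list of strings and runs nothing, the host names are made up, and it places the NN step at the beginning even though the end works equally well.

```python
from typing import List


def rolling_upgrade_plan(datanodes: List[str], active_nn: str = "",
                         standby_nn: str = "") -> List[str]:
    """Return ordered, human-readable steps for the Scenario 1 procedure.

    Hypothetical helper: it only builds a checklist and executes nothing.
    """
    steps: List[str] = []
    if active_nn and standby_nn:
        # HA case: upgrade the standby first, then fail over to it.
        steps += [
            f"stop {standby_nn}", f"upgrade {standby_nn}", f"start {standby_nn}",
            f"failover {active_nn} -> {standby_nn}",
            f"stop {active_nn}", f"upgrade {active_nn}", f"start {active_nn}",
        ]
    elif active_nn:
        # Non-HA case: restarting the single NN means a short outage.
        steps += [f"stop {active_nn}", f"upgrade {active_nn}", f"start {active_nn}"]
    # DataNodes are handled one at a time so the cluster stays available.
    for dn in datanodes:
        steps += [f"stop {dn}", f"upgrade {dn}", f"start {dn}"]
    return steps


plan = rolling_upgrade_plan(["dn1", "dn2"], active_nn="nn1", standby_nn="nn2")
dn_only_plan = rolling_upgrade_plan(["dn1"])
```

Passing no NameNode arguments gives the DN-only variant, which is all that is needed when the fix is scoped to DNs.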

*Scenario 2*: upgrade to a version with a new layout version (LV)

In this case, a "snapshot" style upgrade is required -- the NN will not restart without the
{{-upgrade}} flag, and a DN will not connect to a NN with a different LV. So the scenario is
the same as today:

- Shut down the entire cluster
- Upgrade all software in the cluster
- Start the cluster with the {{-upgrade}} flag
-- Any nodes that missed the software upgrade will fail to connect, since their LV does not
match (this patch retains that behavior)
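
As a pre-flight aid for Scenario 2, one could list the nodes that would be turned away because they missed the software upgrade. This is a sketch only: {{nodes_rejected_after_upgrade}} and the inventory dictionary are hypothetical, and how the per-node layout versions get collected is left open.

```python
from typing import Dict, List


def nodes_rejected_after_upgrade(node_layout_versions: Dict[str, int],
                                 nn_layout_version: int) -> List[str]:
    """List nodes whose layout version differs from the upgraded NN's."""
    return sorted(
        node for node, lv in node_layout_versions.items()
        if lv != nn_layout_version
    )


# Illustrative values only; HDFS layout versions are negative integers.
inventory = {"dn1": -40, "dn2": -39, "dn3": -40}
stragglers = nodes_rejected_after_upgrade(inventory, nn_layout_version=-40)
```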

*Scenario 3*: upgrade to a version with the same layout version, but some data risk (for example,
upgrading to a version with bug fixes pertaining to replication policies, corrupt block detection,
etc.)

In this scenario, the NN does not mandate the {{-upgrade}} flag, but as Sanjay mentioned above,
it can still be useful for data protection. As with today, if the user does not want the extra
protection, this scenario can be treated identically to scenario 1. If the user does want
the protection, it can be treated identically to scenario 2. Scenario 2 remains safe because
of the check that the NameNode's {{ctime}} matches the DN's {{ctime}}. As soon as you
restart the NN with the {{-upgrade}} flag, all running DNs will exit. Any newly started DN
will notice the new namespace ctime and take part in the snapshot upgrade.
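
Taken together, the three scenarios reduce to a two-input decision. A minimal sketch, assuming a hypothetical function name and return labels:

```python
def choose_upgrade_procedure(layout_version_changes: bool,
                             want_snapshot_protection: bool) -> str:
    """Pick between the rolling and the snapshot-style procedure."""
    if layout_version_changes:
        # Scenario 2: the NN will not start without -upgrade.
        return "snapshot"
    if want_snapshot_protection:
        # Scenario 3 with protection: treated like Scenario 2.
        return "snapshot"
    # Scenario 1, or Scenario 3 without the extra protection.
    return "rolling"
```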



Does the above description address your concerns? Another idea would be to add a new configuration
option like {{dfs.allow.rolling.upgrades}} which enables the new behavior, so an admin who
prefers not to use the feature can disallow it completely.
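
To make the idea concrete, here is how such an option might gate registration. This is purely a sketch of a proposal: {{ToyVersion}} and {{may_register}} are hypothetical, and the assumed semantics are that a layout version mismatch is always rejected and that turning the option off restores the strict build check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyVersion:
    """Hypothetical pair of values a node would report when registering."""
    layout_version: int
    build_revision: str


def may_register(nn: ToyVersion, dn: ToyVersion,
                 allow_rolling_upgrades: bool) -> bool:
    # A layout version mismatch is always rejected, flag or no flag.
    if nn.layout_version != dn.layout_version:
        return False
    # With the option off, fall back to the strict build check.
    if not allow_rolling_upgrades:
        return nn.build_revision == dn.build_revision
    # With the option on, differing builds with the same LV may register.
    return True


nn_v = ToyVersion(layout_version=-40, build_revision="r100")
dn_v = ToyVersion(layout_version=-40, build_revision="r101")
```

With these illustrative values, {{may_register(nn_v, dn_v, True)}} accepts the DN, while passing {{False}} rejects it because the build revisions differ.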

                
> Relax the build version check to permit rolling upgrades within a release
> -------------------------------------------------------------------------
>
>                 Key: HDFS-2983
>                 URL: https://issues.apache.org/jira/browse/HDFS-2983
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Eli Collins
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch,
HDFS-2983.patch, HDFS-2983.patch
>
>
> Currently the version check for DN/NN communication is strict (it checks the exact svn
revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents
rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to
branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps
it takes another 23 minor release or so before we're ready to commit to making the minor versions
compatible.


        
