Date: Wed, 15 Jan 2014 22:02:40 +0000 (UTC)
From: "Suresh Srinivas (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

    [ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872657#comment-13872657 ]

Suresh Srinivas commented on HDFS-5138:
---------------------------------------

[~atm], I am looking at this patch. As I see it, I feel this change should include design details. Some questions that come to mind:
# The documentation says "[[2]] Both NNs must be started with the <<<'-upgrade'>>> flag." Does this mean both namenodes must be available during the upgrade, or only that each namenode must be started with -upgrade?
Could one namenode be upgraded first (and possibly finalized) and the second NN be upgraded later?
# When the active namenode is performing the shared edits upgrade and it fails, does failover to the standby occur, and does the new active resume the upgrade? Same question for finalize and rollback.
# The documentation says "The operator should run the roll back command on one of the NN boxes,...". This could cause issues depending on which NN is chosen. It must be run on the one where the upgrade was previously done, right?
# Given the rollback procedure, where bootstrapStandby must be done on one of the NNs, why not just upgrade a single namenode (without worrying about two namenodes racing to upgrade, etc.) and then follow the same procedure as rollback to simplify this?

> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch
>
>
> With HA enabled, the NN won't start with "-upgrade". Since there was a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade.
> The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving the DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed.
> We will need a different way of doing the layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others.
> If there is a reasonable workaround that does not greatly increase the maintenance window, we can lower its priority from blocker to critical.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)