Date: Wed, 15 Jun 2011 04:43:47 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-2026) 1073: 2NN needs to handle case of reformatted NN better

[ https://issues.apache.org/jira/browse/HDFS-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2026:
------------------------------

    Attachment: hdfs-2026.txt

Here's a patch which does the following:
- When the 2NN talks to the NN, or the NN talks to the 2NN, it passes a ':'-joined version of its StorageInfo (i.e. namespace ID, cluster ID, etc.). If the other side has a different namespace, it throws an exception and refuses to process the request.
- On startup, the 2NN now reads the storage info from its storage directories.
- A fresh 2NN has no storage info (and thus namespaceId == 0). In this case, it copies its storage info from the NN the first time it calls rollEdits and gets a CheckpointSignature. On every later call, it verifies that the CheckpointSignature matches the 2NN's storage info.
- I removed the defunct "token" parameter from GetImageServlet, since it wasn't really being used anymore.
- The transaction ID of the uploaded checkpoint no longer needs to be validated, since it's OK to upload an out-of-date image. It'll just get removed the next time the archiver runs.

> 1073: 2NN needs to handle case of reformatted NN better
> -------------------------------------------------------
>
>                 Key: HDFS-2026
>                 URL: https://issues.apache.org/jira/browse/HDFS-2026
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-2026.txt
>
>
> Currently in the 1073 branch, the following steps end up with a very confused 2NN:
> - format NN, run NN
> - start 2NN, perform some checkpoints
> - reformat NN, start NN on new namespace
> - restart same 2NN
> The 2NN currently saves the new VERSION info into its local storage directory but doesn't clear out the old checkpoint or edits files. This is obviously wrong and might lead to a corrupt checkpoint getting uploaded.
> If the 2NN has storage directories with VERSION info and connects to an NN with different VERSION info, there are two alternatives:
> a) refuse to perform any checkpoints until the operator issues a "secondarynamenode -format" command (this is similar to how the backupnode/checkpointnode works), or
> b) clear the current contents of the storage directory and save the new NN's VERSION info.
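The first bullet of the patch (exchange a ':'-joined StorageInfo and refuse requests from a different namespace) can be sketched as follows. This is a minimal, hypothetical illustration: StorageInfoSketch, NamespaceCheck, the exact field set, and the order in which fields are joined are assumptions for this example, not the classes or wire format in the patch.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for StorageInfo. The field set and the order in
 * which the fields are joined are assumptions made for this sketch.
 */
final class StorageInfoSketch {
    final int layoutVersion;
    final int namespaceId;
    final long cTime;
    final String clusterId;

    StorageInfoSketch(int layoutVersion, int namespaceId, long cTime, String clusterId) {
        this.layoutVersion = layoutVersion;
        this.namespaceId = namespaceId;
        this.cTime = cTime;
        this.clusterId = Objects.requireNonNull(clusterId, "clusterId");
    }

    /** Joins the fields with ':' so they can be sent as a single request parameter. */
    String toColonSeparatedString() {
        return layoutVersion + ":" + namespaceId + ":" + cTime + ":" + clusterId;
    }

    /** Parses the form produced by toColonSeparatedString(). */
    static StorageInfoSketch parse(String s) {
        // A limit of 4 keeps any ':' inside the cluster ID in the last field.
        String[] parts = s.split(":", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Malformed storage info: " + s);
        }
        return new StorageInfoSketch(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
                Long.parseLong(parts[2]), parts[3]);
    }
}

/** Check run by whichever side (NN or 2NN) receives the request. */
final class NamespaceCheck {
    private NamespaceCheck() {}

    /** Throws, refusing the request, if the caller belongs to a different namespace. */
    static void verifySameNamespace(StorageInfoSketch local, String remoteColonSeparated) {
        StorageInfoSketch remote = StorageInfoSketch.parse(remoteColonSeparated);
        if (remote.namespaceId != local.namespaceId) {
            throw new IllegalStateException("Refusing request: remote namespace "
                    + remote.namespaceId + " does not match local namespace " + local.namespaceId);
        }
    }
}
```

Because a reformatted NN gets a new namespace ID, a 2NN still holding the old one has its requests refused instead of mixing images from two namespaces.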
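The 2NN's startup and rollEdits behavior described in the patch amounts to "adopt if fresh, otherwise verify". A minimal sketch, assuming a hypothetical SecondaryStorageSketch class that holds the namespace and cluster IDs read from the 2NN's storage directories (neither the class nor its method names come from HDFS):

```java
import java.util.Objects;

/** Hypothetical model of the storage info a 2NN reads from its storage directories. */
final class SecondaryStorageSketch {
    // namespaceId == 0 means a fresh 2NN: no VERSION info was found on disk.
    private int namespaceId;
    private String clusterId;

    SecondaryStorageSketch(int namespaceIdFromDisk, String clusterIdFromDisk) {
        this.namespaceId = namespaceIdFromDisk;
        this.clusterId = clusterIdFromDisk;
    }

    int namespaceId() {
        return namespaceId;
    }

    String clusterId() {
        return clusterId;
    }

    /**
     * Called with the IDs carried by the CheckpointSignature returned from rollEdits.
     * A fresh 2NN adopts them; an initialized 2NN requires them to match.
     */
    void onCheckpointSignature(int sigNamespaceId, String sigClusterId) {
        if (namespaceId == 0) {
            // First rollEdits on a fresh 2NN: copy the NN's storage info.
            namespaceId = sigNamespaceId;
            clusterId = sigClusterId;
            return;
        }
        if (sigNamespaceId != namespaceId || !Objects.equals(sigClusterId, clusterId)) {
            throw new IllegalStateException("CheckpointSignature is for namespace "
                    + sigNamespaceId + ", but this 2NN's storage belongs to namespace " + namespaceId);
        }
    }
}
```

In the reformat scenario from the issue, the restarted 2NN reads the old namespace ID from disk, so the first signature from the reformatted NN is rejected rather than silently adopted.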
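The patch drops transaction-ID validation of uploaded checkpoints on the grounds that an out-of-date image is removed the next time the archiver runs. The message does not describe the archiver's retention rule; assuming, hypothetically, that it keeps only the N checkpoints with the highest transaction IDs, the effect can be sketched like this (CheckpointRetentionSketch and numToRetain are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical retention rule: keep only the newest N checkpoints by transaction ID. */
final class CheckpointRetentionSketch {
    private CheckpointRetentionSketch() {}

    /**
     * Returns the transaction IDs of the checkpoints to delete, in ascending order,
     * so that only the numToRetain highest transaction IDs remain.
     */
    static List<Long> txidsToPurge(List<Long> checkpointTxids, int numToRetain) {
        if (numToRetain < 1) {
            throw new IllegalArgumentException("numToRetain must be at least 1");
        }
        // TreeSet sorts ascending and drops duplicates.
        List<Long> ascending = new ArrayList<>(new TreeSet<>(checkpointTxids));
        int excess = Math.max(0, ascending.size() - numToRetain);
        return new ArrayList<>(ascending.subList(0, excess));
    }
}
```

For example, with checkpoints at transaction IDs 200 and 300 already present and a stale upload at 50, retaining two returns [50], so the out-of-date image disappears on the next run without any check at upload time.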