Date: Fri, 24 Aug 2012 04:49:42 +1100 (NCT)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Message-ID: <1027654193.6471.1345744182625.JavaMail.jiratomcat@arcas>
In-Reply-To: <1243872222.38125.1345598318160.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HDFS-3835) Long-lived 2NN cannot perform a checkpoint if security is enabled and the NN restarts with outstanding delegation tokens

    [ https://issues.apache.org/jira/browse/HDFS-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440483#comment-13440483 ]

Colin Patrick McCabe commented on HDFS-3835:
--------------------------------------------

When I was doing the "fail over to loading a different FSImage if the first
one we try to load is corrupt" patch, I also had to deal with the issue of clearing data that had been loaded. See https://issues.apache.org/jira/secure/attachment/12526022/HDFS-3277.003.patch

In particular, I created this method in FSNamesystem:
{code}
  /**
   * Clear any data that was loaded by FSImageFormat.Loader
   */
  public void clearLoadedImage() {
    generationStamp.setStamp(GenerationStamp.FIRST_VALID_STAMP);
    dir.reset();
    leaseManager.removeAllLeases();
  }
{code}

and this method in LeaseManager.java:
{code}
  synchronized void removeAllLeases() {
    sortedLeases.clear();
    sortedLeasesByPath.clear();
    leases.clear();
  }
{code}

I wonder if we need any of this for the 2NN case?

> Long-lived 2NN cannot perform a checkpoint if security is enabled and the NN restarts with outstanding delegation tokens
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3835
>                 URL: https://issues.apache.org/jira/browse/HDFS-3835
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node, security
>    Affects Versions: 2.0.0-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>             Fix For: 2.2.0-alpha
>
>         Attachments: HDFS-3835.patch
>
>
> When the 2NN wants to perform a checkpoint, it figures out the highest transaction ID of the fsimage files on the NN, and if the 2NN has a copy of that fsimage file (because it created that merged fsimage file the last time it did a checkpoint) then the 2NN won't download the fsimage file from the NN, and instead only gets the new edits files from the NN. In this case, the 2NN also doesn't even bother reloading the fsimage file it has from disk, since it has all of the namespace state in-memory. This all works just fine.
> When the 2NN _doesn't_ have a copy of the relevant fsimage file (for example, if the NN had restarted since the last checkpoint) then the 2NN blows away its in-memory namespace state, downloads the fsimage file from the NN, and loads the newly-downloaded fsimage file from disk. The bug is that when the 2NN clears its in-memory state, it only resets the namespace, but not the delegation token map.
> The fix is pretty simple - just make the delegation token map get cleared as well as the namespace state when a running 2NN needs to load a new fsimage from disk.
> Credit to Stephen Chu for identifying this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
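
For readers following the issue, the bug and the proposed fix can be illustrated with a small self-contained toy model. This is only a sketch, not the HDFS code: the class and method names (ToyCheckpointState, clearNamespaceOnly, clearAllLoadedState, loadImage) are hypothetical, and the failure mode it assumes (loading a token whose sequence number is already in the map is rejected) is an assumption made for illustration rather than something stated in the issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for the 2NN's in-memory state. Hypothetical, not an HDFS class. */
class ToyCheckpointState {
    private final Set<String> namespace = new HashSet<>();
    private final Map<Integer, String> delegationTokens = new HashMap<>();

    /** Load an "image": first the paths, then the delegation tokens. */
    void loadImage(Set<String> paths, Map<Integer, String> tokens) {
        namespace.addAll(paths);
        for (Map.Entry<Integer, String> e : tokens.entrySet()) {
            // Assumption for this sketch: a token that is already present is an error.
            if (delegationTokens.containsKey(e.getKey())) {
                throw new IllegalStateException("token already loaded: " + e.getKey());
            }
            delegationTokens.put(e.getKey(), e.getValue());
        }
    }

    /** The buggy reset described in the issue: only the namespace is cleared. */
    void clearNamespaceOnly() {
        namespace.clear();
    }

    /** The proposed fix: clear the token map together with the namespace. */
    void clearAllLoadedState() {
        namespace.clear();
        delegationTokens.clear();
    }

    int pathCount() { return namespace.size(); }

    int tokenCount() { return delegationTokens.size(); }
}

class ToyCheckpointDemo {
    public static void main(String[] args) {
        Set<String> paths = new HashSet<>();
        paths.add("/user/alice");
        Map<Integer, String> tokens = new HashMap<>();
        tokens.put(1, "alice");

        ToyCheckpointState state = new ToyCheckpointState();
        state.loadImage(paths, tokens);

        // Buggy path: stale tokens survive the reset, so the reload is rejected.
        state.clearNamespaceOnly();
        try {
            state.loadImage(paths, tokens);
        } catch (IllegalStateException e) {
            System.out.println("reload failed: " + e.getMessage());
        }

        // Fixed path: everything is cleared, so the reload succeeds.
        state.clearAllLoadedState();
        state.loadImage(paths, tokens);
        System.out.println("reload succeeded");
    }
}
```

The point of the sketch is simply that every piece of state populated by an image load needs a matching reset before a second load, which is the same concern behind clearLoadedImage() in the HDFS-3277 patch quoted above.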