Message-ID: <11213943.282771296270763778.JavaMail.jira@thor>
Date: Fri, 28 Jan 2011 22:12:43 -0500 (EST)
From: "Konstantin Boudnik (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix
In-Reply-To: <10786911.43321289547614702.JavaMail.jira@thor>

[ https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988372#action_12988372 ]

Konstantin Boudnik commented on
HDFS-1496:
------------------------------------------

Hairong, what I am seeing on a real cluster (0.20.2 based) is that an NN storage volume that was once removed (e.g. because of a faulty NFS mount) is emptied as soon as the SNN starts the checkpoint process. This happens because the synchronized method {{FSEditLog.rollEditLog}} calls {{FSImage.attemptRestoreRemovedStorage}}, which effectively formats the faulty volume if it has become available again.

I guess it is possible that a checkpoint could happen before rollEditLog is called, and then the inconsistency you've mentioned might be introduced. I think it won't happen, though, because {{SecondaryNameNode.doMerge}} iterates through Storage.storageDirs, which won't contain the failed volume unless it has been restored and formatted.

If all of this is true, then we have a test that is failing not because the feature doesn't work but because the test needs to be changed in light of HDFS-903. Please let me know if my analysis is incorrect.

> TestStorageRestore is failing after HDFS-903 fix
> ------------------------------------------------
>
>                 Key: HDFS-1496
>                 URL: https://issues.apache.org/jira/browse/HDFS-1496
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Konstantin Boudnik
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.22.0
>
>         Attachments: HDFS-1496.sh, HDFS-1496.sh, HDFS-1496.sh
>
>
> TestStorageRestore seems to be failing after HDFS-903 commit. Running git bisect confirms it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
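
For readers following the thread, the sequence described in the comment above (a failed volume is dropped from the active list, then formatted and re-added on the next edit log roll, and the merge step only walks the active list) can be sketched as a toy model. This is only an illustration under stated assumptions: `StorageRestoreSketch`, `StorageDir`, `reportFailure`, and the `available` flag are hypothetical names, and the method bodies are simplified stand-ins, not the actual Hadoop `FSEditLog`, `FSImage`, or `SecondaryNameNode` code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the restore-on-roll behavior; NOT the real Hadoop classes. */
public class StorageRestoreSketch {

    /** Hypothetical stand-in for one NameNode storage directory. */
    static final class StorageDir {
        final String name;
        final List<String> files = new ArrayList<>();
        boolean available = true; // false while, e.g., the NFS mount is down

        StorageDir(String name, String... initialFiles) {
            this.name = name;
            for (String f : initialFiles) {
                files.add(f);
            }
        }
    }

    final List<StorageDir> storageDirs = new ArrayList<>(); // active volumes
    final List<StorageDir> removedDirs = new ArrayList<>(); // failed volumes

    /** Hypothetical helper: move a failed volume out of the active list. */
    void reportFailure(StorageDir dir) {
        dir.available = false;
        if (storageDirs.remove(dir)) {
            removedDirs.add(dir);
        }
    }

    /** Simplified restore: a reachable removed volume is emptied and re-added. */
    void attemptRestoreRemovedStorage() {
        List<StorageDir> restored = new ArrayList<>();
        for (StorageDir dir : removedDirs) {
            if (dir.available) {
                dir.files.clear(); // "format": stale contents are discarded
                restored.add(dir);
            }
        }
        removedDirs.removeAll(restored);
        storageDirs.addAll(restored);
    }

    /** Simplified roll: the restore attempt runs first, then the roll itself. */
    synchronized void rollEditLog() {
        attemptRestoreRemovedStorage();
        for (StorageDir dir : storageDirs) {
            if (!dir.files.contains("edits.new")) {
                dir.files.add("edits.new");
            }
        }
    }

    /** Simplified merge: only directories in the active list are visited. */
    List<String> doMerge() {
        List<String> visited = new ArrayList<>();
        for (StorageDir dir : storageDirs) {
            visited.add(dir.name);
        }
        return visited;
    }

    public static void main(String[] args) {
        StorageRestoreSketch nn = new StorageRestoreSketch();
        StorageDir local = new StorageDir("local", "fsimage", "edits");
        StorageDir nfs = new StorageDir("nfs", "fsimage", "edits");
        nn.storageDirs.add(local);
        nn.storageDirs.add(nfs);

        nn.reportFailure(nfs);   // volume goes away
        System.out.println("merge sees: " + nn.doMerge());

        nfs.available = true;    // volume comes back with stale contents
        nn.rollEditLog();        // restore attempt formats and re-adds it
        System.out.println("nfs files after roll: " + nfs.files);
        System.out.println("merge sees: " + nn.doMerge());
    }
}
```

In this model the merge never sees a failed volume with stale contents, because a volume only re-enters `storageDirs` after it has been emptied. Whether the real code paths give the same guarantee in every ordering is exactly the question raised in the comment.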