Date: Wed, 16 Feb 2011 18:23:24 +0000 (UTC)
From: "Ivan Kelly (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <930994165.647.1297880604388.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <16723130.15011291079951700.JavaMail.jira@thor>
Subject: [jira] Updated: (HDFS-1521) Persist transaction ID on disk between NN restarts

[ https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly
updated HDFS-1521:
-----------------------------

    Attachment: HDFS-1521.diff

This patch addresses all of Konstantin's comments except 11.

There is something strange going on with lastAppliedTxId that TestBackupNode isn't currently picking up. At line 177 of the test, change the code to the following:

{code}
//
// Take a checkpoint
//
backup = startBackupNode(conf, op, 1);
waitCheckpointDone(backup);

for (int i = 0; i < 10; i++) {
  writeFile(fileSys, new Path("file_" + i), replication);
}

backup.doCheckpoint();
waitCheckpointDone(backup);
{code}

This will cause the test to fail. A normal run of the test doesn't exercise convergeJournalSpool, so usually you don't see the problem.

With the change, you'll see that if the BackupNode loads a checkpoint and then tries to journal something, lastAppliedTxId + 1 will be 1, even though we've loaded an image and an edit log. The simple fix is to put

{code}
lastAppliedTxId = getEditLog().getLastWrittenTxId();
{code}

in loadCheckpoint(). This should be the end of the story. However, with this change, you get the error

{quote}
java.io.IOException: Expected transaction ID 10 but got 11
{quote}

A transaction is going missing. What's happening is that when doCheckpoint gets kicked off, the log is rolled and logJSpoolStart is called, which creates an edit with opcode OP_JSPOOL_START. This opcode is caught by EditLogBackupOutputStream and never transmitted to the backup node, so the transaction IDs on the Primary and the Backup get out of sync.

So the question here is: is there any harm in actually transferring these OP_JSPOOL_START transactions, or are they just excluded as a precaution?
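To make the failure mode above concrete, here is a minimal, self-contained sketch of how withholding one opcode from the stream produces exactly this kind of gap. It is a toy model, not HDFS code: the class TxIdSyncSketch, its Edit and Backup types, and the stream() and replay() helpers are all hypothetical, and the starting transaction IDs (9 and 10) are assumptions chosen only to mirror the error message quoted above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the primary/backup txid mismatch; hypothetical names, not HDFS classes. */
class TxIdSyncSketch {
    enum Op { OP_ADD, OP_JSPOOL_START }

    /** One logged edit: an opcode tagged with the transaction ID the primary assigned. */
    static final class Edit {
        final long txId;
        final Op op;
        Edit(long txId, Op op) { this.txId = txId; this.op = op; }
    }

    /** Receiver that requires strictly consecutive transaction IDs. */
    static final class Backup {
        long lastAppliedTxId;
        Backup(long lastAppliedTxId) { this.lastAppliedTxId = lastAppliedTxId; }

        void apply(Edit e) throws IOException {
            long expected = lastAppliedTxId + 1;
            if (e.txId != expected) {
                throw new IOException(
                    "Expected transaction ID " + expected + " but got " + e.txId);
            }
            lastAppliedTxId = e.txId;
        }
    }

    /**
     * The primary assigns a txid to every op. If filterSpoolStart is true,
     * OP_JSPOOL_START still consumes a txid but is never sent to the backup.
     */
    static List<Edit> stream(long firstTxId, List<Op> ops, boolean filterSpoolStart) {
        List<Edit> sent = new ArrayList<>();
        long txId = firstTxId;
        for (Op op : ops) {
            Edit e = new Edit(txId++, op);
            if (filterSpoolStart && op == Op.OP_JSPOOL_START) {
                continue; // txid was consumed on the primary, but the edit is dropped
            }
            sent.add(e);
        }
        return sent;
    }

    /** Applies the edits to a fresh backup and reports the outcome as a string. */
    static String replay(long lastAppliedTxId, List<Edit> edits) {
        Backup backup = new Backup(lastAppliedTxId);
        try {
            for (Edit e : edits) {
                backup.apply(e);
            }
            return "in sync at txid " + backup.lastAppliedTxId;
        } catch (IOException ex) {
            return ex.getMessage();
        }
    }

    public static void main(String[] args) {
        List<Op> ops = Arrays.asList(Op.OP_JSPOOL_START, Op.OP_ADD);
        // Assumption: the backup's loaded checkpoint ends at txid 9.
        System.out.println(replay(9, stream(10, ops, true)));
        System.out.println(replay(9, stream(10, ops, false)));
    }
}
```

In this model the filtered stream fails on the first edit after the spool start, while the unfiltered stream stays in sync. An alternative, not shown here, would be for the sender to avoid consuming a transaction ID for ops it never transmits; which of the two fits the real code is the open question raised above.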
> Persist transaction ID on disk between NN restarts
> --------------------------------------------------
>
>                 Key: HDFS-1521
>                 URL: https://issues.apache.org/jira/browse/HDFS-1521
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: HDFS-1521.diff, HDFS-1521.diff, hdfs-1521.3.txt, hdfs-1521.4.txt, hdfs-1521.5.txt, hdfs-1521.txt, hdfs-1521.txt
>
>
> For HDFS-1073 and other future work, we'd like to have the concept of a transaction ID that is persisted on disk with the image/edits. We already have this concept in the NameNode but it resets to 0 on restart. We can also use this txid to replace the _checkpointTime_ field, I believe.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira