Date: Wed, 29 Aug 2012 22:25:07 +1100 (NCT)
From: "Chao Shi (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <909228880.11238.1346239507853.JavaMail.jiratomcat@arcas>
In-Reply-To: <2102433793.8672.1346188388294.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HDFS-3863) QJM: track last "committed" txid

[ https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443969#comment-13443969 ]

Chao Shi commented on HDFS-3863:
--------------------------------

Todd, assume JN1, JN2, and JN3 make up a quorum and JN1 is far behind. Suppose some buggy algorithm selects JN1 as the latest one, and the NN is about to resume logging from JN1's position. JN2 and JN3 will reject this, since they know their log number is greater than JN1's. Everything works fine so far.

However, imagine a stupid administrator replaces JN2 and JN3 with new machines. Since JN1 is far behind, it doesn't know about the journal number committed by JN2 and JN3, so the request passes the check.

I'm thinking of the similarity between the committed-txid and the epoch number: neither ever decreases. I think we can do the following:
- The NN maintains the highest committed-txid in memory (more specifically, as a member of AsyncLoggerSet)
- The NN sends it to the JNs in the request header of every packet
- Each JN saves the committed-txid
- The NN updates its committed-txid once a write is acked by a quorum of JNs

Note that a JN that falls behind may still learn the highest committed-txid, as long as the connection between that JN and the NN works. The invariant is that the NN's committed-txid >= any JN's committed-txid.

We can also add an extra check when the NN decides the txid to finalize: it should be no less than any JN's committed-txid.

> QJM: track last "committed" txid
> --------------------------------
>
>                 Key: HDFS-3863
>                 URL: https://issues.apache.org/jira/browse/HDFS-3863
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> Per some discussion with [~stepinto] [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579], we should keep track of the "last committed txid" on each JournalNode. Then during any recovery operation, we can sanity-check that we aren't asked to truncate a log to an earlier transaction.
> This is also a necessary step if we want to support reading from in-progress segments in the future (since we should only allow reads up to the commit point)
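
As a rough illustration of the committed-txid proposal in the comment above, here is a minimal in-memory sketch. It is not code from HDFS or from any HDFS-3863 patch: the class names SketchJournalNode and SketchLoggerSet, all of their methods, and the "reachable" flag used to simulate a broken connection are hypothetical. It assumes a synchronous write path and skips persistence and RPC entirely.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a JournalNode; not the real HDFS class. */
class SketchJournalNode {
    private long committedTxId = 0;   // highest committed txid learned from the NN
    private boolean reachable = true; // simulates the NN-to-JN connection

    void setReachable(boolean reachable) { this.reachable = reachable; }

    /** Handles one write; the request header carries the NN's committed txid. */
    boolean journal(long headerCommittedTxId, long txId) {
        if (!reachable) {
            return false; // no ack, and nothing is learned from the header
        }
        // Like an epoch number, the saved value never decreases.
        committedTxId = Math.max(committedTxId, headerCommittedTxId);
        return true; // a real JN would also persist the edit for txId here
    }

    long getCommittedTxId() { return committedTxId; }
}

/** Hypothetical NN-side logger set, loosely following the AsyncLoggerSet idea. */
class SketchLoggerSet {
    private final List<SketchJournalNode> journals;
    private long committedTxId = 0; // highest txid acked by a quorum so far

    SketchLoggerSet(List<SketchJournalNode> journals) { this.journals = journals; }

    /** Sends txId to every JN; advances committedTxId only on a quorum of acks. */
    boolean write(long txId) {
        int acks = 0;
        for (SketchJournalNode jn : journals) {
            if (jn.journal(committedTxId, txId)) {
                acks++;
            }
        }
        boolean quorum = acks > journals.size() / 2;
        if (quorum) {
            committedTxId = Math.max(committedTxId, txId);
        }
        return quorum;
    }

    /** Extra check: the txid to finalize must be no less than any JN's committed txid. */
    boolean canFinalizeAt(long txId) {
        for (SketchJournalNode jn : journals) {
            if (txId < jn.getCommittedTxId()) {
                return false;
            }
        }
        return true;
    }

    long getCommittedTxId() { return committedTxId; }
}
```

In this sketch a JN only learns a committed txid from the header of the next request, so its saved value can trail the NN's by one write. That is consistent with the invariant that the NN's committed-txid is >= any JN's committed-txid.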