Date: Sat, 4 Feb 2012 02:03:54 +0000 (UTC)
From: "Todd Lipcon (Updated) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <1121803544.10523.1328321034097.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1605401373.108.1328137433442.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HDFS-2874) HA: edit log should log to shared dirs before local dirs

[
https://issues.apache.org/jira/browse/HDFS-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2874:
------------------------------

    Attachment: hdfs-2874.txt

bq. Maybe add a comment mentioning that we use LinkedHashSet since it provides a predictable-order implementation of the Set interface?
bq. "we need to make sure all edits are on place" - s/on/in/g

Fixed

bq. Why the call to Lists.newArrayList in getNamespaceEditsDirs?

The function returns a List, so we have to copy out of the LinkedHashSet to a new List object there.

bq. Looks like you retained the functionality wherein required journals are operated on first, which should no longer be necessary, right? It should be OK as you have it, though, since the shared edits dir is automatically marked required, and therefore will necessarily be operated on before all others (required or non-required.)

Fixed in the new revision.

bq. I don't follow why the changes in GenericTestUtils were necessary.

A bug fix that I noticed while working on the unit tests - but you're right, they're unrelated to this JIRA. I removed them from this patch.

> HA: edit log should log to shared dirs before local dirs
> --------------------------------------------------------
>
>                 Key: HDFS-2874
>                 URL: https://issues.apache.org/jira/browse/HDFS-2874
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-2874.txt, hdfs-2874.txt, hdfs-2874.txt
>
>
> Currently, the NN logs its edits to each of its edits directories in sequence. This can produce the following bad sequence:
> - NN accumulates 100 edits (tx 1-100) in the buffer. Writes and syncs to local drive, then crashes.
> - Failover occurs. SBN takes over at txid=1, since txid 1 never got written.
> - First NN restarts. It reads up to txid 100 from its local directories. It is now "ahead" of the active NN with inconsistent state.
> The solution is to write to the shared edits dir, and sync that, before writing to any local drives.
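
For readers following the thread, the ordering idea discussed above can be sketched roughly as follows. This is only an illustration, not the code in hdfs-2874.txt: the class EditsDirOrderSketch, the EditsWriter interface, and both method names are hypothetical, and plain java.util collections stand in for the Guava Lists.newArrayList call mentioned in the review. It assumes that a failed sync simply stops the flush, which may differ from how the NameNode actually handles a failed required journal.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "shared dirs before local dirs"; not the actual patch.
final class EditsDirOrderSketch {

    /** Hypothetical stand-in for writing and syncing one batch of edits to a directory. */
    interface EditsWriter {
        void writeAndSync(String dir, long firstTxId, long lastTxId) throws IOException;
    }

    /**
     * Returns the shared dirs first, then the local dirs.
     * LinkedHashSet keeps insertion order and drops duplicates, so a dir
     * listed as both shared and local keeps its earlier (shared) position.
     */
    static List<String> orderedEditsDirs(List<String> sharedDirs, List<String> localDirs) {
        Set<String> ordered = new LinkedHashSet<>(sharedDirs);
        ordered.addAll(localDirs);
        // Callers expect a List, so copy out of the set into a new List.
        return new ArrayList<>(ordered);
    }

    /**
     * Writes and syncs the batch to each dir in the given order and returns
     * the dirs that were synced. Assumption for this sketch: on the first
     * failure we stop, so no later (local) dir gets ahead of an earlier
     * (shared) one.
     */
    static List<String> flush(List<String> orderedDirs, long firstTxId, long lastTxId,
                              EditsWriter writer) {
        List<String> synced = new ArrayList<>();
        for (String dir : orderedDirs) {
            try {
                writer.writeAndSync(dir, firstTxId, lastTxId);
            } catch (IOException e) {
                break; // leave the remaining directories untouched
            }
            synced.add(dir);
        }
        return synced;
    }
}
```

With this ordering, a crash or failure during the shared write leaves the local directories without tx 1-100, so a restarted NN cannot read edits that the standby never saw.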