Date: Tue, 27 Dec 2011 06:16:30 +0000 (UTC)
From: "Aaron T. Myers (Commented) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer

    [ https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176098#comment-13176098 ]

Aaron T. Myers commented on HDFS-2709:
--------------------------------------

Another option I'd like to put forth would be to separate the reading of edit log ops off disk from the actual application of those ops to the in-memory FS state.

The first stage of the process would just read all edit log ops from disk and put them in a queue. If an error occurs during this stage, the standby NN would log a warning and continue, since this is a potentially normal operating condition. The first stage is idempotent, so it can safely be retried at a later time. This guarantees that we queue up an entire file's worth of edits before applying any of them.

The second stage would go through the queue and apply all the edits. If an error occurs during this stage, we abort the standby NN, since this indicates a corrupt FS and should not occur in practice.

Thoughts?

> HA: Appropriately handle error conditions in EditLogTailer
> ----------------------------------------------------------
>
>                 Key: HDFS-2709
>                 URL: https://issues.apache.org/jira/browse/HDFS-2709
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Aaron T. Myers
>            Priority: Critical
>
> Currently, if the edit log tailer experiences an error replaying edits in the middle of a file, it will retry from the beginning of the file on the next tailing iteration. This is incorrect, since many of the edits will already have been replayed, and not all edits are idempotent.
> Instead, we need to either (a) support reading from the middle of a finalized file (i.e., skip the edits already applied), or (b) abort the standby if it hits an error while tailing. If (a) isn't simple, let's do (b) for now and come back to (a) later, since this is a rare circumstance and it is better to abort than to be incorrect.
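The two-stage tailing that Aaron T. Myers proposes can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the actual HDFS-2709 change: `TwoStageTailer`, `EditOp`, `OpReader`, `Namespace`, and `tailFile` are hypothetical names, not part of Hadoop's `EditLogTailer` API, and throwing `IllegalStateException` stands in for whatever mechanism would really abort the standby NN.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of two-stage edit tailing: read all ops into a queue,
 * then apply them. None of these names are Hadoop APIs.
 */
public class TwoStageTailer {
  private static final Logger LOG = Logger.getLogger(TwoStageTailer.class.getName());

  /** Hypothetical stand-in for one edit log op, identified by its transaction ID. */
  public static final class EditOp {
    public final long txId;
    public EditOp(long txId) { this.txId = txId; }
  }

  /** Hypothetical reader over one finalized edit log file; returns null at end of file. */
  public interface OpReader {
    EditOp readOp() throws IOException;
  }

  /** Hypothetical hook that applies one op to the in-memory namespace. */
  public interface Namespace {
    void apply(EditOp op) throws Exception;
  }

  private final Namespace namespace;

  public TwoStageTailer(Namespace namespace) {
    this.namespace = namespace;
  }

  /**
   * Tails one file. Returns false if stage 1 failed (nothing was applied, so the
   * whole file can safely be retried later); returns true once every op is applied.
   * Throws IllegalStateException, standing in for aborting the standby, if stage 2 fails.
   */
  public boolean tailFile(OpReader reader) {
    // Stage 1: read every op off disk into a queue without touching the namespace.
    Queue<EditOp> queue = new ArrayDeque<>();
    try {
      EditOp op;
      while ((op = reader.readOp()) != null) {
        queue.add(op);
      }
    } catch (IOException e) {
      // The proposal treats a read error as potentially normal: warn, discard the
      // partial queue, and let a later iteration re-read the file from the start.
      LOG.warning("Failed to read edits, will retry the whole file: " + e.getMessage());
      return false;
    }

    // Stage 2: apply the queued ops. A failure here suggests a corrupt FS, so abort.
    for (EditOp op : queue) {
      try {
        namespace.apply(op);
      } catch (Exception e) {
        throw new IllegalStateException("Aborting standby: failed to apply txid " + op.txId, e);
      }
    }
    return true;
  }
}
```

Because stage 1 never mutates the namespace, retrying a failed read from the beginning of the file cannot re-apply a non-idempotent edit, which is the problem the issue describes; the only remaining failure point, stage 2, takes option (b) and aborts.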