Date: Thu, 13 Dec 2012 23:46:14 +0000 (UTC)
From: "Daryn Sharp (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-4288) NN accepts incremental BR as IBR in safemode

     [ https://issues.apache.org/jira/browse/HDFS-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HDFS-4288:
------------------------------
    Attachment: HDFS-4288.branch-23.patch

I think the second issue I mentioned, a bounced DN's BR not being processed, can be solved by having updateRegInfo reset the flag that short-circuits safemode BR processing. I originally tried tracking the timestamp of the registration, but I think this is much simpler. It'll be trivial to tweak the patch for the other branches.

Aaron, if this is a reasonable fix, would you please help write some unit tests?
I'm having difficulty figuring out how to introduce a mock, or how to manipulate a mini-cluster to force the sequence of events needed to reproduce this (i.e., sync out a few blocks, stop the NN, finalize the last block, bring the NN up in safemode and trick it into staying in safemode, ensure the block update is received followed by the block report, and ensure the block manager knows of all blocks; then stop the DN, remove blocks, re-register in safemode, and ensure the NN forgets the removed blocks). Plus, I'm at a conference and don't have many cycles.

> NN accepts incremental BR as IBR in safemode
> --------------------------------------------
>
>                 Key: HDFS-4288
>                 URL: https://issues.apache.org/jira/browse/HDFS-4288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-4288.branch-23.patch
>
>
> If a DN is ready to send an incremental BR and the NN goes down, the DN will repeatedly try to reconnect. The NN will then process the DN's incremental BR as an initial BR. The NN now thinks the DN has only a few blocks, and will ignore all subsequent BRs from that DN until out of safemode -- which it may never do because of all the "missing" blocks on the affected DNs.
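
To make the intended behaviour concrete, here is a rough toy model in Java. It is only a sketch and not the actual Hadoop code or the attached patch: the classes DatanodeState and NameNodeModel, the firstReportProcessed field, and all method bodies are assumptions invented for illustration. Only the name updateRegInfo comes from the comment above. The model assumes that an incremental BR must not count as the initial BR, that a second full BR is short-circuited while in safemode, and that re-registration resets the flag so a bounced DN's BR is processed again.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for the NN's per-DN bookkeeping (hypothetical, not the Hadoop class). */
class DatanodeState {
    final Set<Long> blocks = new HashSet<>();
    // Hypothetical flag: once set, further full BRs are skipped while in safemode.
    boolean firstReportProcessed = false;

    /** Called on (re-)registration; the proposed fix resets the flag here. */
    void updateRegInfo() {
        firstReportProcessed = false;
    }
}

/** Toy NN that stays in safemode for the whole run. */
class NameNodeModel {
    final boolean inSafemode = true;
    final DatanodeState dn = new DatanodeState();

    /** Incremental BR: adds blocks but does not count as the initial BR. */
    void processIncrementalReport(Set<Long> received) {
        dn.blocks.addAll(received);
    }

    /** Full BR: replaces the NN's view of the DN; returns false if short-circuited. */
    boolean processFullReport(Set<Long> reported) {
        if (inSafemode && dn.firstReportProcessed) {
            return false;
        }
        dn.blocks.clear();
        dn.blocks.addAll(reported);
        dn.firstReportProcessed = true;
        return true;
    }
}

class SafemodeReportSketch {
    /** Helper to build a set of block IDs. */
    static Set<Long> blockIds(Long... ids) {
        return new HashSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        NameNodeModel nn = new NameNodeModel();

        // The DN reconnects and sends its pending incremental BR first.
        nn.processIncrementalReport(blockIds(3L));
        // The following full BR must still be treated as the initial BR.
        System.out.println("full BR accepted: "
                + nn.processFullReport(blockIds(1L, 2L, 3L)));

        // The DN bounces, loses blocks 2 and 3, and re-registers in safemode.
        nn.dn.updateRegInfo();
        System.out.println("BR after re-registration accepted: "
                + nn.processFullReport(blockIds(1L)));
        System.out.println("blocks known to NN: " + nn.dn.blocks);
    }
}
```

A real unit test would presumably drive the same sequence against a mini-cluster or a mocked block manager rather than against a toy model like this one.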