Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 13 Dec 2012 23:46:14 +0000 (UTC)
From: "Daryn Sharp (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12622964.1354911469994.3910.1355442374210@arcas>
In-Reply-To: <JIRA.12622964.1354911469994@arcas>
References: <JIRA.12622964.1354911469994@arcas>
Subject: [jira] [Updated] (HDFS-4288) NN accepts incremental BR as IBR in
 safemode
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HDFS-4288:
------------------------------

    Attachment: HDFS-4288.branch-23.patch

I think the second issue I mentioned, a bounced DN's BR not being processed, can be solved by having updateRegInfo reset the flag that short-circuits safemode BR processing.  I originally tried something that tracked the timestamp of the registration, but I think this is much simpler.  It'll be trivial to tweak the patch for the other branches.

Aaron, if this is a reasonable fix, would you please help write some unit tests?  I'm having difficulty figuring out how to introduce a mock, or how to manipulate a mini-cluster to force the sequence of events needed to reproduce this (i.e., sync out a few blocks, stop the NN, finalize the last block, bring the NN up in safemode and trick it into staying in safemode, ensure the block update is received followed by the block report, ensure the block manager knows of all blocks; then stop the DN, remove blocks, re-register in safemode and ensure the NN forgets the removed blocks).  Plus I'm at a conference and don't have many cycles.
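
To make the expected behavior concrete, here is a rough sketch of that sequence against a toy model.  It is not the patch and not HDFS code: apart from updateRegInfo, every name in it (ToyNameNode, initialReportProcessed, blockReceived, blockReport, runScenario) is made up for illustration, and a full BR is assumed to simply replace the NN's view of the DN.  It only shows the two properties a real test would need to check: a block update must not consume the "initial BR" slot, and re-registration must reopen it.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only. Every name here is hypothetical except updateRegInfo,
// and none of this is the HDFS code or the attached patch.
final class SafemodeReportSketch {

    /** Stand-in for the NN's view of one DN while the NN stays in safemode. */
    static final class ToyNameNode {
        private final Set<Long> knownBlocks = new HashSet<>();
        // Assumed name for the flag that short-circuits safemode BR processing.
        private boolean initialReportProcessed = false;

        /** DN (re-)registration: the proposed fix resets the flag here. */
        void updateRegInfo() {
            initialReportProcessed = false;
        }

        /** Incremental update: adds one block, must not count as the initial BR. */
        void blockReceived(long blockId) {
            knownBlocks.add(blockId);
        }

        /** Full BR in safemode: only the first one after registration is processed. */
        boolean blockReport(long... blockIds) {
            if (initialReportProcessed) {
                return false; // short-circuited until the NN leaves safemode
            }
            knownBlocks.clear();
            for (long id : blockIds) {
                knownBlocks.add(id);
            }
            initialReportProcessed = true;
            return true;
        }

        /** Defensive copy of the blocks the NN currently attributes to the DN. */
        Set<Long> knownBlocks() {
            return new HashSet<>(knownBlocks);
        }
    }

    /** Runs the sequence and returns the blocks the NN knows at the end. */
    static Set<Long> runScenario() {
        ToyNameNode nn = new ToyNameNode();
        nn.updateRegInfo();              // DN reconnects to the restarted NN
        nn.blockReceived(4L);            // block update for the finalized last block
        nn.blockReport(1L, 2L, 3L, 4L);  // full BR must still be processed
        if (nn.knownBlocks().size() != 4) {
            throw new IllegalStateException("full BR was ignored after the block update");
        }
        nn.updateRegInfo();              // DN bounced, re-registers in safemode
        nn.blockReport(1L, 2L);          // blocks 3 and 4 were removed on the DN
        return nn.knownBlocks();
    }
}
```

In this model runScenario() ends with the NN knowing only blocks 1 and 2.  Without the reset in updateRegInfo, the second full BR would be short-circuited and the NN would keep the stale view of all four blocks.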
                
> NN accepts incremental BR as IBR in safemode
> --------------------------------------------
>
>                 Key: HDFS-4288
>                 URL: https://issues.apache.org/jira/browse/HDFS-4288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-4288.branch-23.patch
>
>
> If a DN is ready to send an incremental BR and the NN goes down, the DN will repeatedly try to reconnect.  The NN will then process the DN's incremental BR as an initial BR.  The NN now thinks the DN has only a few blocks, and will ignore all subsequent BRs from that DN until out of safemode -- which it may never do because of all the "missing" blocks on the affected DNs.
