hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4288) NN accepts incremental BR as IBR in safemode
Date Thu, 13 Dec 2012 23:46:14 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HDFS-4288:
------------------------------

    Attachment: HDFS-4288.branch-23.patch

I think the second issue I mentioned, where a bounced DN's BR is not processed, can be
solved by having updateRegInfo reset the flag that short-circuits safemode BR processing.  I originally
tried tracking the timestamp of the registration, but I think this is much simpler.
It'll be trivial to tweak the patch for the other branches.
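
To make the idea concrete, here is a minimal sketch of the flag reset. It is a toy
model, not the attached patch and not Hadoop code: the class SafemodeReportSketch,
the DatanodeState type, the reportProcessed field and the processReport helper are
all hypothetical names chosen for illustration. Only the name updateRegInfo comes
from the comment above, and its real signature is not assumed here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the proposed fix. All names are hypothetical, not Hadoop APIs. */
class SafemodeReportSketch {

    /** Stand-in for the per-datanode state the NN keeps. */
    static class DatanodeState {
        final Set<Long> blocks = new HashSet<>();

        // Hypothetical flag: set once a block report has been processed,
        // and used to short-circuit further reports while in safemode.
        boolean reportProcessed = false;

        /** Called on every (re-)registration; the proposed fix resets the flag here. */
        void updateRegInfo() {
            reportProcessed = false;
        }
    }

    /**
     * Processes a full block report.
     * Returns true if the report was applied, false if it was short-circuited.
     */
    static boolean processReport(DatanodeState dn, Set<Long> reported, boolean inSafemode) {
        if (inSafemode && dn.reportProcessed) {
            return false; // a report was already seen during safemode; skip this one
        }
        dn.blocks.clear();
        dn.blocks.addAll(reported);
        dn.reportProcessed = true;
        return true;
    }

    public static void main(String[] args) {
        DatanodeState dn = new DatanodeState();

        // First registration and block report while the NN is in safemode.
        dn.updateRegInfo();
        processReport(dn, new HashSet<>(Arrays.asList(1L, 2L, 3L)), true);

        // The DN bounces, loses blocks 2 and 3, and re-registers in safemode.
        dn.updateRegInfo();
        boolean applied = processReport(dn, new HashSet<>(Arrays.asList(1L)), true);

        System.out.println("applied=" + applied + " blocks=" + dn.blocks);
    }
}
```

In this model, dropping the reset in updateRegInfo means the second report is
skipped and the NN keeps believing the DN still holds the removed blocks, which is
the bounced-DN case described above.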

Aaron, if this is a reasonable fix, would you please help write some unit tests?  I'm having
difficulty figuring out how to introduce a mock, or how to manipulate a mini-cluster to force
the sequence of events needed to reproduce this (i.e., sync out a few blocks, stop the NN, finalize
the last block, bring the NN up in safemode and trick it into staying in safemode, ensure the block
update is received followed by the block report, ensure the block manager knows of all blocks; then
stop the DN, remove blocks, re-register in safemode and ensure the NN forgets the removed blocks).
Plus I'm at a conference and don't have many cycles.
                
> NN accepts incremental BR as IBR in safemode
> --------------------------------------------
>
>                 Key: HDFS-4288
>                 URL: https://issues.apache.org/jira/browse/HDFS-4288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-4288.branch-23.patch
>
>
> If a DN is ready to send an incremental BR and the NN goes down, the DN will repeatedly
> try to reconnect.  The NN will then process the DN's incremental BR as an initial BR.  The
> NN now thinks the DN has only a few blocks, and will ignore all subsequent BRs from that DN
> until out of safemode -- which it may never do because of all the "missing" blocks on the
> affected DNs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
