hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4288) NN accepts incremental BR as IBR in safemode
Date Wed, 09 Jan 2013 19:32:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548872#comment-13548872 ]

Daryn Sharp commented on HDFS-4288:

bq. This will also solve issues related to DN restart and NN may not process the block report.

True, but the boolean patch (a simple incremental improvement on the existing trunk behavior)
fixes both DN restart and reregistration after a broken connection.  The NN cannot distinguish
the two, so with a boolean the NN (naively) processes the BR associated with every (re)registration.
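
As a rough illustration of the boolean approach (a sketch only, not the attached patch;
the class and method names below are hypothetical and are not HDFS classes), the NN keeps
one flag per DN, clears it on every registration, and processes the first full BR it sees
afterwards:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical NN-side bookkeeping; illustrative names, not real HDFS code. */
class BooleanReportTracker {
    // Per DN: true once a full BR has been processed since the last (re)registration.
    private final Map<String, Boolean> fullReportProcessed = new HashMap<>();

    /** Called for a DN restart and for a reregistration alike; the NN cannot tell them apart. */
    void register(String datanodeId) {
        fullReportProcessed.put(datanodeId, false);
    }

    /** Returns true if the NN should process this full BR, and records that it did. */
    boolean shouldProcessFullReport(String datanodeId) {
        if (fullReportProcessed.getOrDefault(datanodeId, false)) {
            return false; // a full BR was already handled since the last registration
        }
        fullReportProcessed.put(datanodeId, true);
        return true;
    }

    public static void main(String[] args) {
        BooleanReportTracker tracker = new BooleanReportTracker();
        tracker.register("dn1");
        System.out.println(tracker.shouldProcessFullReport("dn1")); // first BR after registration
        System.out.println(tracker.shouldProcessFullReport("dn1")); // repeat BR, same registration
    }
}
```

Because the flag is reset in register(), every (re)registration triggers one full BR to be
processed, whether or not any block updates were actually lost.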

A sequence number that relies on a sentinel value allows the DN to dictate the NN's behavior.
This works well for restart since we know we are starting from 0.  For a rereg, block updates
may have been lost, so the sequence number must be guaranteed to always be reset to 0.  That's
as naive as the boolean, and it might be hard or fragile to ensure it's always reset, in which
case we might as well go with the boolean.

Better yet, the logic would be {{(seqNum == 0 || seqNum != lastSeqNum)}}.  However, this
requires writable/RPC changes on 23 and protobuf changes on trunk/2, plus ensuring
backwards compatibility with an optional protobuf field, etc.  Would you be ok if I filed
another jira?
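
To make the two sequence number rules concrete, here is a minimal sketch under stated
assumptions: the class and method names are hypothetical, and it assumes the NN remembers
the last sequence number it saw from each DN as lastSeqNum.  It is one reading of the
expression above, not a proposed patch:

```java
/** Hypothetical helper contrasting the two sequence number rules; not real HDFS code. */
class SeqNumCheck {
    /** Sentinel-only rule: the NN processes the BR only when the DN sends 0. */
    static boolean sentinelOnly(long seqNum) {
        return seqNum == 0;
    }

    /** Combined rule: process on the sentinel, or whenever seqNum differs from the last one seen. */
    static boolean shouldProcess(long seqNum, long lastSeqNum) {
        return seqNum == 0 || seqNum != lastSeqNum;
    }

    public static void main(String[] args) {
        // A DN that reregisters without resetting to 0, but with a seqNum the NN has not seen.
        long lastSeqNum = 5;
        long seqNum = 6;
        System.out.println(sentinelOnly(seqNum));              // sentinel rule skips this BR
        System.out.println(shouldProcess(seqNum, lastSeqNum)); // combined rule processes it
    }
}
```

With the combined rule the NN no longer depends on the DN always resetting to 0 on rereg,
which is the fragile part of the sentinel-only scheme.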

> NN accepts incremental BR as IBR in safemode
> --------------------------------------------
>                 Key: HDFS-4288
>                 URL: https://issues.apache.org/jira/browse/HDFS-4288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-4288.branch-23.patch
>
> If a DN is ready to send an incremental BR and the NN goes down, the DN will repeatedly
> try to reconnect.  The NN will then process the DN's incremental BR as an initial BR.  The
> NN now thinks the DN has only a few blocks, and will ignore all subsequent BRs from that DN
> until out of safemode -- which it may never do because of all the "missing" blocks on the
> affected DNs.

