hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7046) HA NN can NPE upon transition to active
Date Fri, 12 Sep 2014 18:41:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131902#comment-14131902 ]

Jing Zhao commented on HDFS-7046:
---------------------------------

Thanks for working on this, [~kihwal] and [~daryn].

One minor concern with the current solution: if we move checkSafeMode from the middle
of the transition to the end, the actual service downtime may increase (if the SBN
is in safemode during the failover), especially since we still have the 30s safemode
extension time. Instead, can we add an {{!inTransitionToActive()}} check around the
{{startSecretManagerIfNecessary}} call inside {{SafeModeInfo#leave}}?
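
For illustration only, here is a minimal sketch of that guard against a heavily simplified model of the NN. Only {{inTransitionToActive()}}, {{startSecretManagerIfNecessary}} and the idea of {{SafeModeInfo#leave}} come from this issue; the class name, the {{EditLog}} stand-in, the {{guardEnabled}} switch and the ordering inside {{transitionToActive()}} are assumptions made for the example, not the actual FSNamesystem code or the attached patch.

```java
public class TransitionSketch {
    /** Hypothetical stand-in for the edit log: writing before it is opened throws an NPE. */
    static final class EditLog {
        private StringBuilder stream; // stays null until opened for write

        void openForWrite() { stream = new StringBuilder(); }

        void logEdit(String op) { stream.append(op).append('\n'); }
    }

    private final EditLog editLog = new EditLog();
    private final boolean guardEnabled; // hypothetical switch: true = proposed check applied
    private boolean inTransitionToActive;
    private boolean inSafeMode = true;
    private boolean secretManagerRunning;

    TransitionSketch(boolean guardEnabled) { this.guardEnabled = guardEnabled; }

    boolean inTransitionToActive() { return inTransitionToActive; }

    boolean isSecretManagerRunning() { return secretManagerRunning; }

    boolean isInSafeMode() { return inSafeMode; }

    /** Starting the secret manager creates a new key, which is written as an edit. */
    void startSecretManagerIfNecessary() {
        if (!secretManagerRunning) {
            editLog.logEdit("new-secret-key"); // illustrative label, not a real op code
            secretManagerRunning = true;
        }
    }

    /** Modelled loosely on SafeModeInfo#leave. */
    void leaveSafeMode() {
        inSafeMode = false;
        // Proposed check: skip the secret manager while the transition is still running.
        if (!guardEnabled || !inTransitionToActive()) {
            startSecretManagerIfNecessary();
        }
    }

    /** Assumed ordering: safe mode check mid-transition, edit log opened afterwards. */
    void transitionToActive() {
        inTransitionToActive = true;
        try {
            leaveSafeMode();        // may happen while catching up on edits
            editLog.openForWrite(); // only now is it safe to log edits
        } finally {
            inTransitionToActive = false;
        }
        startSecretManagerIfNecessary(); // deferred start, after the log is open
    }

    public static void main(String[] args) {
        TransitionSketch guarded = new TransitionSketch(true);
        guarded.transitionToActive();
        System.out.println("guarded: secret manager running = " + guarded.isSecretManagerRunning());

        try {
            new TransitionSketch(false).transitionToActive();
        } catch (NullPointerException e) {
            System.out.println("unguarded: NPE, edit logged before the log was open");
        }
    }
}
```

In this model the safe mode check stays in the middle of the transition, so the 30s extension is not added on top of the failover time; only the secret manager start is deferred until the edit log is open.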

> HA NN can NPE upon transition to active
> ---------------------------------------
>
>                 Key: HDFS-7046
>                 URL: https://issues.apache.org/jira/browse/HDFS-7046
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0, 2.5.0
>            Reporter: Daryn Sharp
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7046.patch, HDFS-7046_test_reproduce.patch
>
>
> While processing edits, the NN may decide after adjusting block totals to leave safe
> mode - in the middle of the edit.  Going active starts the secret manager which generates
> a new secret key, which in turn generates an edit, which NPEs because the edit log is not
> open.
> # Transitions should _not_ occur in the middle of an edit.
> # The edit log appears to claim it's open for write when the stream isn't even open.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
