hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced
Date Wed, 08 Feb 2012 02:02:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203135#comment-13203135 ]

Aaron T. Myers commented on HDFS-2914:

bq. The issue I see is that even if this standby is made active later on, it will not exit
out of the safemode unless user does the safemode leave. Do we want this behaviour?

I think we probably do. If the NFS mount is flaky, we've got bigger problems than just the
NN being moved into SM.

bq. The other problem with this approach is that if nfs dir bounces even once, standby will
go into safemode and this will happen silently without alerts.

I guess the admin should configure some alerts for the NN being in SM, then. :)

But regardless, I could probably be persuaded that the NN should leave SM automatically once
resources become available again, as long as the implementation includes some measure(s) to prevent
the NN from flapping in/out of SM if the free space is hovering near the threshold. Something
like "leave SM automatically only if free space is now well above what is required, and only
if it's been like that for several minutes." Such a change would not be specific to the HA
branch, however, and should probably be done on trunk.
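
To make the anti-flapping idea concrete, here is a minimal sketch in Java of a check that allows an automatic exit only when free space has stayed well above the requirement for a sustained period. It is an illustration only and is not taken from HDFS: the class name SafeModeExitGuard, the headroomBytes margin, and the stableFor duration are all hypothetical, and the caller is assumed to supply the free-space reading and the current time.

```java
import java.time.Duration;

/**
 * Hypothetical helper (not part of HDFS): decides whether safe mode may be
 * left automatically, using a headroom margin plus a minimum stable period
 * so the node does not flap when free space hovers near the threshold.
 */
final class SafeModeExitGuard {
    private final long requiredBytes;   // minimum free space needed to operate
    private final long headroomBytes;   // extra margin required before exiting
    private final long stableMillis;    // how long the margin must hold
    private long healthySinceMillis = -1; // -1 means "not currently healthy"

    SafeModeExitGuard(long requiredBytes, long headroomBytes, Duration stableFor) {
        this.requiredBytes = requiredBytes;
        this.headroomBytes = headroomBytes;
        this.stableMillis = stableFor.toMillis();
    }

    /**
     * Call on every resource check. Returns true only if free space has been
     * at least requiredBytes + headroomBytes continuously for stableMillis.
     */
    boolean shouldLeaveSafeMode(long freeBytes, long nowMillis) {
        if (freeBytes < requiredBytes + headroomBytes) {
            // Dropped below the margin: restart the stability clock.
            healthySinceMillis = -1;
            return false;
        }
        if (healthySinceMillis < 0) {
            // First healthy reading after an unhealthy one.
            healthySinceMillis = nowMillis;
        }
        return nowMillis - healthySinceMillis >= stableMillis;
    }
}
```

A periodic resource check could call shouldLeaveSafeMode(freeBytes, System.currentTimeMillis()) and leave safe mode only when it returns true. Any reading below the margin resets the clock, which is what keeps a value hovering near the threshold from toggling the state.
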
> HA: Standby stuck in safemode when shared edits directory is bounced
> --------------------------------------------------------------------
>                 Key: HDFS-2914
>                 URL: https://issues.apache.org/jira/browse/HDFS-2914
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Hari Mankude
> When shared edits dir is bounced, standby NN is put into safemode by the NameNodeResourceMonitor().
> However, there is no path for it to exit out of safe mode when shared edits dir reappears.
