[ https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913394#action_12913394 ]
dhruba borthakur commented on HDFS-779:
---------------------------------------
> insulate the system from the catastrophe type you describe?
I think catastrophic events sometimes occur when part of the network fails. In that case,
no HDFS software enhancement (e.g. priority heartbeats) can solve the problem. But I still
think that priority heartbeats will address a large class of catastrophic events that are currently
not handled gracefully.
> Automatic move to safe-mode when cluster size drops
> ---------------------------------------------------
>
> Key: HDFS-779
> URL: https://issues.apache.org/jira/browse/HDFS-779
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: name-node
> Reporter: Owen O'Malley
> Assignee: dhruba borthakur
>
> As part of looking at using Kerberos, we want to avoid the case where both the primary
> (and optional secondary) KDC go offline, causing a replication storm as the DataNodes' service
> tickets time out and they lose the ability to connect to the NameNode. However, this is a
> specific case of a more general problem of losing too many nodes too quickly. I think we
> should have an option to go into safe mode if the cluster size drops by more than N% in terms
> of DataNodes.
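The "drop by more than N%" rule in the proposal can be illustrated with a small sketch. Everything here is hypothetical and not part of HDFS: the class `SafeModeMonitor`, its `max_drop_pct` parameter (the "N%"), and the choice of the largest live DataNode count seen so far as the baseline are assumptions. The sketch also ignores the "too quickly" aspect (no time window) and leaves exiting safe mode to an operator.

```python
from dataclasses import dataclass


@dataclass
class SafeModeMonitor:
    """Hypothetical sketch: enter safe mode when the live DataNode count
    falls more than `max_drop_pct` percent below the largest count seen."""

    max_drop_pct: float      # the "N%" from the proposal (assumed parameter)
    peak_live: int = 0       # largest live DataNode count observed so far
    in_safe_mode: bool = False

    def update(self, live_datanodes: int) -> bool:
        """Record the current live count; return True if safe mode is on."""
        if live_datanodes < 0:
            raise ValueError("live_datanodes must be non-negative")
        # Only raise the baseline while out of safe mode, so a mass node
        # loss is measured against the pre-failure cluster size.
        if not self.in_safe_mode:
            self.peak_live = max(self.peak_live, live_datanodes)
        if self.peak_live == 0:
            return self.in_safe_mode
        drop_pct = 100.0 * (self.peak_live - live_datanodes) / self.peak_live
        if drop_pct > self.max_drop_pct:
            # Too many nodes lost at once: enter safe mode (in HDFS, safe
            # mode suspends block re-replication, avoiding the storm).
            self.in_safe_mode = True
        # Exiting safe mode is deliberately left to an operator here.
        return self.in_safe_mode
```

For example, with `SafeModeMonitor(max_drop_pct=10)`, going from 100 to 90 live DataNodes stays out of safe mode, while dropping to 89 enters it, and the monitor remains in safe mode even if all 100 nodes return.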
--
This message is automatically generated by JIRA.