hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops
Date Fri, 17 Sep 2010 00:10:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910403#action_12910403 ]

Konstantin Shvachko commented on HDFS-779:

Some high level thoughts on the issue.

h3. Catastrophe! Is it real?
It looks like people are talking about different types of catastrophes.
# Rob and I were talking about "real" catastrophes, like a failure of a rack.
# Dhruba seems to be making a case for "imaginary" catastrophes, like losing heartbeats due
to extremely high load on the name-node.
# Owen in the original description was talking about something in between: a "complex" catastrophe,
which has both real and imaginary components, like losing data-nodes because they cannot
reach Kerberos to get authenticated.
The real component here is that DNs are lost both for the NN and for clients. 
The imaginary component is that DNs may eventually recover if Kerberos servers come back online.

The thing is that NN has no way to distinguish a real catastrophe from an imaginary one.
- Dhruba's proposal makes NN first assume the catastrophe is imaginary and delay the recovery
for some more time, betting that the system will get better by itself and optimizing for
client performance during the outage.
The concern here is that if the catastrophe turns out to be real we lose time and may make
things even worse.
- An alternative to this is treating each catastrophe seriously from the start and providing
the system some breathing space to let it recover faster. Whether the catastrophe is real
or imaginary the breathing space is vital.
In a real catastrophe blocks need to be replicated asap. 
In the imaginary one the load should be restricted to let NN get back on track with its housekeeping.

h3. Catastrophe policy.
We also have different approaches to defining a catastrophe. One is based on the number of
failed data-nodes, another is based on the number of under-replicated blocks.
# In the original proposal a catastrophe is declared when num-failed-nodes / total-nodes > x%
# Dhruba's catastrophe happens when num_failed_nodes(yesterday) / num_failed_nodes(today) > m
# My catastrophe is defined as num-under-replicated-blocks / total-blocks > r

Blocks are a finer-grained measure than data-nodes and are more precise: the failure of a large
number of empty DNs will not cause a storm of replications in either case (real or imaginary).
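
To make the difference between the node-based and the block-based definitions concrete, here is a minimal sketch of the two ratio checks. The class and method names are hypothetical and do not exist in HDFS, and the thresholds in the example are made up for illustration.

```java
// Hypothetical sketch of two catastrophe definitions; not HDFS code.
final class CatastrophePolicy {
    private CatastrophePolicy() {}

    /** Node-based: num-failed-nodes / total-nodes > x, with x given as a fraction in [0, 1]. */
    static boolean byFailedNodes(long failedNodes, long totalNodes, double x) {
        return totalNodes > 0 && (double) failedNodes / totalNodes > x;
    }

    /** Block-based: num-under-replicated-blocks / total-blocks > r. */
    static boolean byUnderReplicatedBlocks(long underReplicated, long totalBlocks, double r) {
        return totalBlocks > 0 && (double) underReplicated / totalBlocks > r;
    }

    public static void main(String[] args) {
        // 300 of 1000 data-nodes fail, but they were empty,
        // so no block lost a replica.
        System.out.println(byFailedNodes(300, 1000, 0.2));               // true
        System.out.println(byUnderReplicatedBlocks(0, 5_000_000, 0.05)); // false
    }
}
```

With these made-up numbers the node-based check fires even though there is nothing to re-replicate, while the block-based check stays quiet.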

h3. Dealing with an imaginary Catastrophe.
In a discussion Rob noted that the introduction of a service port HDFS-599 for NN was intended
to separate the internal load from the external one. It is a way for NN to balance between
the two. 
It seems that in an imaginary catastrophe scenario we should measure the load on the system
and prevent the catastrophe in the first place by shifting priorities to processing the heartbeats,
rather than fighting the consequences of the catastrophe post factum.
The load measurements can use some "accrual" metrics Joydeep mentions, which could take into
account former and current performance or RPC failures.
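
As a rough illustration of a metric that takes former and current performance into account, the sketch below keeps an exponentially weighted average of heartbeat processing delays and flags the moment the latest delay exceeds that baseline by a configurable factor. This is only one possible reading of "accrual" and is not necessarily what Joydeep has in mind; the class name, parameters, and values are all hypothetical and not part of HDFS.

```java
// Hypothetical sketch of a history-aware load signal; not HDFS code.
final class HeartbeatLoadMonitor {
    private final double alpha;          // smoothing factor in (0, 1]
    private final double threshold;      // allowed ratio of latest delay to baseline
    private double baselineMillis = -1;  // moving average of past delays; -1 means no samples yet

    HeartbeatLoadMonitor(double alpha, double threshold) {
        this.alpha = alpha;
        this.threshold = threshold;
    }

    /**
     * Records one heartbeat processing delay and returns true when the delay
     * is more than `threshold` times the historical baseline, i.e. when the
     * NN might want to shift priority to heartbeat processing.
     */
    boolean record(double delayMillis) {
        if (baselineMillis < 0) {
            baselineMillis = delayMillis; // first sample seeds the baseline
            return false;
        }
        boolean overloaded = baselineMillis > 0 && delayMillis / baselineMillis > threshold;
        // Fold the new sample into the baseline after comparing against it.
        baselineMillis = alpha * delayMillis + (1 - alpha) * baselineMillis;
        return overloaded;
    }
}
```

For example, with `alpha = 0.2` and `threshold = 3.0`, delays of 10 ms and 12 ms leave the monitor quiet, while a following 50 ms delay is flagged because it is well over three times the accumulated baseline. RPC failure counts could be fed through the same kind of moving average.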

> Automatic move to safe-mode when cluster size drops
> ---------------------------------------------------
>                 Key: HDFS-779
>                 URL: https://issues.apache.org/jira/browse/HDFS-779
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: name-node
>            Reporter: Owen O'Malley
>            Assignee: dhruba borthakur
> As part of looking at using Kerberos, we want to avoid the case where both the primary
(and optional secondary) KDC go offline causing a replication storm as the DataNodes' service
tickets time out and they lose the ability to connect to the NameNode. However, this is a
specific case of a more general problem of losing too many nodes too quickly. I think we
should have an option to go into safe mode if the cluster size goes down more than N% in terms
of DataNodes.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
