hadoop-hdfs-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (HDDS-1881) Design doc: decommissioning in Ozone
Date Wed, 31 Jul 2019 16:54:00 GMT

     [ https://issues.apache.org/jira/browse/HDDS-1881?focusedWorklogId=286054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-286054
]

ASF GitHub Bot logged work on HDDS-1881:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Jul/19 16:53
            Start Date: 31/Jul/19 16:53
    Worklog Time Spent: 10m 
      Work Description: sodonnel commented on pull request #1196: HDDS-1881. Design doc: decommissioning
in Ozone
URL: https://github.com/apache/hadoop/pull/1196#discussion_r309327167
 
 

 ##########
 File path: hadoop-hdds/docs/content/design/decommissioning.md
 ##########
 @@ -0,0 +1,720 @@
+---
+title: Decommissioning in Ozone
+summary: Formal process to shut down machines in a safe way after the required replications.
+date: 2019-07-31
+jira: HDDS-1881
+status: current
+author: Anu Engineer, Marton Elek, Stephen O'Donnell 
+---
+
+
+# Abstract 
+
+The goal of decommissioning is to turn off a selected set of machines without data loss. It may or may not require moving the existing replicas of the containers to other nodes.
+
+There are two main classes of decommissioning:
+
+ * __Maintenance mode__: where the node is expected to be back after a while. It may not require replication of containers if enough replicas are available on other nodes (as we expect to have the current replicas back after the restart).
+
+ * __Decommissioning__: where the node won't be started again. All the data should be replicated
according to the current replication rules.
+
+Goals:
+
+ * Decommissioning can be canceled at any time
+ * The progress of the decommissioning should be trackable
+ * The nodes under decommissioning / maintenance mode should not be used for new pipelines / containers
+ * The state of the datanodes should be persisted / replicated by the SCM (in HDFS the decommissioning info exclude/include lists are replicated manually by the admin). If a datanode is marked for decommissioning, this state should be available after SCM and/or Datanode restarts.
+ * We need to support validations before decommissioning (but the violations can be ignored by the admin).
+ * The administrator should be notified when a node can be turned off.
+ * The maintenance mode can be time constrained: if the node is marked for maintenance for 1 week and is not back up after one week, its containers should be considered lost (DEAD node) and should be replicated.
+
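To make the goals above concrete, here is a minimal sketch of the node states and the maintenance time-out check they imply. The state names and the `maintenance_expired` helper are hypothetical illustrations for this discussion, not the actual SCM data model.

```python
# Minimal sketch under assumed state names; not the real SCM implementation.
from datetime import datetime, timedelta
from enum import Enum, auto

class NodeOperationalState(Enum):
    IN_SERVICE = auto()
    DECOMMISSIONING = auto()
    DECOMMISSIONED = auto()
    IN_MAINTENANCE = auto()
    DEAD = auto()  # node lost, or its maintenance window expired

def maintenance_expired(entered: datetime, window: timedelta, now: datetime) -> bool:
    """A node in maintenance that does not return within its window should be
    treated as DEAD and its containers re-replicated."""
    return now - entered > window

# Marked for one week of maintenance; still down after eight days,
# so the containers it hosts must be re-replicated.
entered = datetime(2019, 7, 1)
assert maintenance_expired(entered, timedelta(weeks=1), entered + timedelta(days=8))
```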
+# Introduction
+
+Ozone is a highly available file system that relies on commodity hardware. In other words,
Ozone is designed to handle failures of these nodes all the time.
+
+The Storage Container Manager (SCM) is designed to monitor the node health and replicate blocks and containers as needed.
+
+At times, operators of the cluster can help the SCM by giving it hints. When removing a datanode, the operator can provide a hint that a planned failure of the node is coming up, so that SCM can make sure the cluster reaches a safe state to handle this planned failure.
+
+Sometimes this failure is transient; that is, the operator is taking down this node temporarily. In that case, we can live with lower replica counts by being optimistic.
+
+Both of these operations, __Maintenance__ and __Decommissioning__, are similar from the replication point of view. In both cases, the user instructs us on how to handle an upcoming failure.
+
+Today, SCM (the *Replication Manager* component inside SCM) understands only one form of failure handling. This paper extends the Replication Manager's failure modes to allow users to request which failure handling model should be adopted (optimistic or pessimistic).
+
+Based on physical realities, there are two responses to any perceived failure: heal the system by taking corrective actions, or ignore the failure, since future actions will heal the system automatically.
+
+## User Experiences (Decommissioning vs Maintenance mode)
+
+From the user's point of view, there are two kinds of planned failures that the user would
like to communicate to Ozone.
+
+The first kind is when a 'real' failure is going to happen in the future. This 'real' failure is the act of decommissioning. We denote this as "decommission" throughout this paper. The response that the user wants is for SCM/Ozone to make replicas to deal with the planned failure.
+
+The second kind is when the failure is 'transient.' The user knows that this failure is temporary, and the cluster in most cases can safely ignore this issue. However, if the transient failures are going to cause a loss of availability, then the user would like Ozone to take appropriate actions to address it. An example of this case is if the user puts 3 datanodes into maintenance mode and switches them off.
+
+The transient failure can violate the availability guarantees of Ozone, since the user is telling us not to take corrective actions. Many times, the user does not understand the impact on availability while asking Ozone to ignore the failure.
+
+So this paper proposes the following definitions for Decommission and Maintenance of data
nodes.
+
+__Decommission__ of a data node is deemed to be complete when SCM/Ozone completes the replication of all containers on the decommissioned data node to other data nodes. That is, the expected count matches the healthy count of containers in the cluster.
+
+__Maintenance mode__ of a data node is complete if Ozone can guarantee at least one copy
of every container is available in other healthy data nodes.
+
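The two completion conditions above can be sketched as predicates over per-container replica counts. The dict-based container model and the default expected count of 3 are assumptions made for illustration only.

```python
# Hedged sketch of the completion conditions; the container records and the
# default expected count of 3 are invented for illustration.

def decommission_complete(containers, expected_count=3):
    """Decommission is done when every container hosted by the node has
    reached its expected replica count on *other* healthy nodes."""
    return all(c["healthy_replicas_elsewhere"] >= expected_count for c in containers)

def maintenance_ready(containers):
    """Maintenance mode only requires at least one healthy copy of each
    container to remain available on other nodes."""
    return all(c["healthy_replicas_elsewhere"] >= 1 for c in containers)

containers = [{"healthy_replicas_elsewhere": 1},
              {"healthy_replicas_elsewhere": 3}]
assert maintenance_ready(containers)          # one copy elsewhere suffices
assert not decommission_complete(containers)  # first container is under-replicated
```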
+## Examples 
+
+Here are some illustrative examples:
+
+1.  Let us say we have a container which has only one copy, and it resides on Machine A. If the user wants to put Machine A into maintenance mode, Ozone will make a replica before entering maintenance mode.
+
+2. Suppose a container has two copies, and the user wants to put Machine A into maintenance mode. In this case, Ozone understands that the availability of the container is not affected and hence can decide to forgo replication.
+
+3. Suppose a container has two copies, and the user wants to put Machine A into maintenance mode for one month. As the period of maintenance mode increases, the probability of data loss increases; hence, Ozone might choose to make a replica of the container even though the node is only entering maintenance mode.
+
+4. The semantics of decommissioning mean that as long as we can find copies of the containers on other machines, we can technically call the decommission complete. Hence this clarifying note: in the ordinary course of action, each decommission will create a replication flow for each container we have; however, it is possible to complete the decommission of a data node even if the data node being decommissioned fails, as long as we can find other datanodes to replicate from and bring the number of replicas back up to the expected count.
+
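The decision logic behind examples 1-3 above could be sketched as a single predicate. The 14-day threshold for a "long" maintenance window is an invented parameter for illustration, not something the design prescribes.

```python
# Illustrative sketch of the maintenance-mode replication decision from the
# examples above; the 14-day "long window" threshold is an assumption.
def should_replicate_on_maintenance(replica_count: int, window_days: int,
                                    long_window_days: int = 14) -> bool:
    # Example 1: a lone replica must be copied before maintenance starts.
    if replica_count <= 1:
        return True
    # Example 3: long windows raise the risk of loss, so replicate anyway.
    if window_days >= long_window_days:
        return True
    # Example 2: availability is unaffected, so replication can be skipped.
    return False

assert should_replicate_on_maintenance(1, 1)        # example 1
assert not should_replicate_on_maintenance(2, 2)    # example 2
assert should_replicate_on_maintenance(2, 30)       # example 3
```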
 
 Review comment:
   > it is possible to complete a decommission of a data node, even if we get a failure of the data node being decommissioned

   This sounds like a state transition from DECOMMISSIONING -> DEAD, similar to IN_SERVICE -> DEAD. It's not really true that decommissioning completes if the node fails. For example, if the node came back online after the failure, and replication of the under-replicated containers had not yet completed, then the node would go back to DECOMMISSIONING until all the containers it hosts have reached 3 live replicas.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 286054)
    Time Spent: 33h 20m  (was: 33h 10m)

> Design doc: decommissioning in Ozone
> ------------------------------------
>
>                 Key: HDDS-1881
>                 URL: https://issues.apache.org/jira/browse/HDDS-1881
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>              Labels: design, pull-request-available
>          Time Spent: 33h 20m
>  Remaining Estimate: 0h
>
> Design doc can be attached to the documentation. In this jira the design doc will be
attached and merged to the documentation page.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

