hadoop-common-commits mailing list archives

From jia...@apache.org
Subject [05/44] hadoop git commit: YARN-3174. Consolidate the NodeManager and NodeManagerRestart documentation into one. Contributed by Masatake Iwasaki.
Date Tue, 21 Jul 2015 23:13:49 GMT
YARN-3174. Consolidate the NodeManager and NodeManagerRestart documentation into one. Contributed
by Masatake Iwasaki.


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/f02dd146
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/f02dd146
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/f02dd146

Branch: refs/heads/YARN-1197
Commit: f02dd146f58bcfa0595eec7f2433bafdd857630f
Parents: 111e6a3
Author: Tsuyoshi Ozawa <ozawa@apache.org>
Authored: Thu Jul 16 15:22:30 2015 +0900
Committer: Tsuyoshi Ozawa <ozawa@apache.org>
Committed: Thu Jul 16 15:22:30 2015 +0900

----------------------------------------------------------------------
 hadoop-project/src/site/site.xml                |  2 +-
 hadoop-yarn-project/CHANGES.txt                 |  3 ++
 .../src/site/markdown/NodeManager.md            | 41 +++++++++++++--
 .../src/site/markdown/NodeManagerRestart.md     | 53 --------------------
 4 files changed, 42 insertions(+), 57 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/f02dd146/hadoop-project/src/site/site.xml
----------------------------------------------------------------------
diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml
index 55be0d9..ee0dfcd 100644
--- a/hadoop-project/src/site/site.xml
+++ b/hadoop-project/src/site/site.xml
@@ -124,7 +124,7 @@
       <item name="Web Application Proxy" href="hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html"/>
       <item name="Timeline Server" href="hadoop-yarn/hadoop-yarn-site/TimelineServer.html"/>
       <item name="Writing YARN Applications" href="hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html"/>
-      <item name="NodeManager Restart" href="hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html"/>
+      <item name="NodeManager" href="hadoop-yarn/hadoop-yarn-site/NodeManager.html"/>
       <item name="DockerContainerExecutor" href="hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html"/>
       <item name="Using CGroups" href="hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html"/>
       <item name="Secure Containers" href="hadoop-yarn/hadoop-yarn-site/SecureContainer.html"/>

http://git-wip-us.apache.org/repos/asf/hadoop/blob/f02dd146/hadoop-yarn-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/CHANGES.txt b/hadoop-yarn-project/CHANGES.txt
index 0a6f871..1e6c7d5 100644
--- a/hadoop-yarn-project/CHANGES.txt
+++ b/hadoop-yarn-project/CHANGES.txt
@@ -631,6 +631,9 @@ Release 2.8.0 - UNRELEASED
     YARN-3453. Ensure preemption logic in FairScheduler uses DominantResourceCalculator
     in DRF queues to prevent unnecessary thrashing. (asuresh)
 
+    YARN-3174. Consolidate the NodeManager and NodeManagerRestart documentation 
+    into one. (Masatake Iwasaki via ozawa)
+
 Release 2.7.2 - UNRELEASED
 
   INCOMPATIBLE CHANGES

http://git-wip-us.apache.org/repos/asf/hadoop/blob/f02dd146/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
index 6341c60..69e99a7 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
@@ -12,19 +12,23 @@
   limitations under the License. See accompanying LICENSE file.
 -->
 
-NodeManager Overview
-=====================
+NodeManager
+===========
 
 * [Overview](#Overview)
 * [Health Checker Service](#Health_checker_service)
     * [Disk Checker](#Disk_Checker)
     * [External Health Script](#External_Health_Script)
+* [NodeManager Restart](#NodeManager_Restart)
+    * [Introduction](#Introduction)
+    * [Enabling NM Restart](#Enabling_NM_Restart)
 
 Overview
 --------
 
 The NodeManager is responsible for launching and managing containers on a node. Containers
execute tasks as specified by the AppMaster.
 
+
 Health Checker Service
 ----------------------
 
@@ -42,7 +46,6 @@ The NodeManager runs services to determine the health of the node it is
executin
 | `yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage` | Float
between 0-100 | The maximum percentage of disk space that may be utilized before a disk is
marked as unhealthy by the disk checker service. This check is run for every disk used by
the NodeManager. The default value is 100 i.e. the entire disk can be used. |
 | `yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb` | Integer | The minimum
amount of free space that must be available on the disk for the disk checker service to mark
the disk as healthy. This check is run for every disk used by the NodeManager. The default
value is 0 i.e. the entire disk can be used. |
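
The two disk-checker thresholds above can be set together in **conf/yarn-site.xml**. As a sketch, with illustrative values (90% and 1024 MB are examples, not the defaults or recommendations):

```xml
<!-- Illustrative disk-checker settings for conf/yarn-site.xml.
     The 90.0 / 1024 values are example thresholds, not defaults. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
  <value>1024</value>
</property>
```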
 
-
 ###External Health Script
 
   Users may specify their own health checker script that will be invoked by the health checker
service. Users may specify a timeout as well as options to be passed to the script. If the
script exits with a non-zero exit code, times out or results in an exception being thrown,
the node is marked as unhealthy. Please note that if the script cannot be executed due to
permissions or an incorrect path, etc, then it counts as a failure and the node will be reported
as unhealthy. Note that specifying a health check script is not mandatory. If no script
is specified, only the disk checker status will be used to determine the health of the node.
The following configuration parameters can be used to set the health script:
@@ -55,3 +58,35 @@ The NodeManager runs services to determine the health of the node it is
executin
 | `yarn.nodemanager.health-checker.script.opts` | String | Arguments to be passed to the
script when the script is executed. |
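
A minimal sketch of wiring up an external health script in **conf/yarn-site.xml**; the script location and the option string below are placeholders, not values from the table above:

```xml
<!-- Sketch only: /usr/local/bin/nm-health.sh and --check-disks are
     hypothetical placeholders for a site-specific health script. -->
<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/usr/local/bin/nm-health.sh</value>
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.opts</name>
  <value>--check-disks</value>
</property>
```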
 
 
+NodeManager Restart
+-------------------
+
+### Introduction
+
+This document gives an overview of NodeManager (NM) restart, a feature that enables the NodeManager
to be restarted without losing the active containers running on the node. At a high level,
the NM stores any necessary state to a local state-store as it processes container-management
requests. When the NM restarts, it recovers by first loading state for various subsystems
and then letting those subsystems perform recovery using the loaded state.
+
+### Enabling NM Restart
+
+Step 1. To enable NM Restart functionality, set the following property in **conf/yarn-site.xml**
to *true*.
+
+| Property | Value |
+|:---- |:---- |
+| `yarn.nodemanager.recovery.enabled` | `true` (the default is `false`) |
+
+Step 2.  Configure a path to the local file-system directory where the NodeManager can save
its run state.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.nodemanager.recovery.dir` | The local filesystem directory in which the node manager
will store state when recovery is enabled. The default value is set to `$hadoop.tmp.dir/yarn-nm-recovery`.
|
+
+Step 3.  Configure a valid RPC address for the NodeManager.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.nodemanager.address` | Ephemeral ports (port 0, the default) cannot be used
for the NodeManager's RPC server specified via yarn.nodemanager.address, as they can make
the NM use different ports before and after a restart. This would break any previously running
clients that were communicating with the NM before the restart. Explicitly setting yarn.nodemanager.address
to an address with a specific port number (e.g. 0.0.0.0:45454) is a precondition for enabling
NM restart. |
+
+Step 4.  Auxiliary services.
+
+  * NodeManagers in a YARN cluster can be configured to run auxiliary services. For a completely
functional NM restart, YARN relies on any auxiliary service configured to also support recovery.
This usually includes (1) avoiding usage of ephemeral ports so that previously running clients
(in this case, usually containers) are not disrupted after restart and (2) having the auxiliary
service itself support recoverability by reloading any previous state when NodeManager restarts
and reinitializes the auxiliary service.
+
+  * A simple example for the above is the auxiliary service 'ShuffleHandler' for MapReduce
(MR). ShuffleHandler already meets both requirements, so users/admins don't have to
do anything for it to support NM restart: (1) The configuration property **mapreduce.shuffle.port**
controls which port the ShuffleHandler on a NodeManager host binds to, and it defaults to
a non-ephemeral port. (2) The ShuffleHandler service also already supports recovery of previous
state after NM restarts.
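
Steps 1 through 3 combine into a **conf/yarn-site.xml** fragment along these lines; the recovery directory and the port 45454 are illustrative choices, not defaults:

```xml
<!-- Enabling NM restart: Steps 1-3 from above in one fragment.
     The recovery directory and port 45454 are example values. -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```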

http://git-wip-us.apache.org/repos/asf/hadoop/blob/f02dd146/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md
deleted file mode 100644
index be7d75b..0000000
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md
+++ /dev/null
@@ -1,53 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-
-NodeManager Restart
-===================
-
-* [Introduction](#Introduction)
-* [Enabling NM Restart](#Enabling_NM_Restart)
-
-Introduction
-------------
-
-This document gives an overview of NodeManager (NM) restart, a feature that enables the NodeManager
to be restarted without losing the active containers running on the node. At a high level,
the NM stores any necessary state to a local state-store as it processes container-management
requests. When the NM restarts, it recovers by first loading state for various subsystems
and then letting those subsystems perform recovery using the loaded state.
-
-Enabling NM Restart
--------------------
-
-Step 1. To enable NM Restart functionality, set the following property in **conf/yarn-site.xml**
to *true*.
-
-| Property | Value |
-|:---- |:---- |
-| `yarn.nodemanager.recovery.enabled` | `true`, (default value is set to false) |
-
-Step 2.  Configure a path to the local file-system directory where the NodeManager can save
its run state.
-
-| Property | Description |
-|:---- |:---- |
-| `yarn.nodemanager.recovery.dir` | The local filesystem directory in which the node manager
will store state when recovery is enabled. The default value is set to `$hadoop.tmp.dir/yarn-nm-recovery`.
|
-
-Step 3.  Configure a valid RPC address for the NodeManager.
-
-| Property | Description |
-|:---- |:---- |
-| `yarn.nodemanager.address` | Ephemeral ports (port 0, which is default) cannot be used
for the NodeManager's RPC server specified via yarn.nodemanager.address as it can make NM
use different ports before and after a restart. This will break any previously running clients
that were communicating with the NM before restart. Explicitly setting yarn.nodemanager.address
to an address with specific port number (for e.g 0.0.0.0:45454) is a precondition for enabling
NM restart. |
-
-Step 4.  Auxiliary services.
-
-  * NodeManagers in a YARN cluster can be configured to run auxiliary services. For a completely
functional NM restart, YARN relies on any auxiliary service configured to also support recovery.
This usually includes (1) avoiding usage of ephemeral ports so that previously running clients
(in this case, usually containers) are not disrupted after restart and (2) having the auxiliary
service itself support recoverability by reloading any previous state when NodeManager restarts
and reinitializes the auxiliary service.
-
-  * A simple example for the above is the auxiliary service 'ShuffleHandler' for MapReduce
(MR). ShuffleHandler respects the above two requirements already, so users/admins don't have
do anything for it to support NM restart: (1) The configuration property **mapreduce.shuffle.port**
controls which port the ShuffleHandler on a NodeManager host binds to, and it defaults to
a non-ephemeral port. (2) The ShuffleHandler service also already supports recovery of previous
state after NM restarts.
-
-

