hadoop-common-commits mailing list archives

From deva...@apache.org
Subject hadoop git commit: YARN-4100. Add Documentation for Distributed and Delegated-Centralized Node Labels feature. Contributed by Naganarasimha G R.
Date Tue, 02 Feb 2016 06:41:44 GMT
Repository: hadoop
Updated Branches:
  refs/heads/trunk 1cd55e0c1 -> db144eb1c


YARN-4100. Add Documentation for Distributed and Delegated-Centralized
Node Labels feature. Contributed by Naganarasimha G R.


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/db144eb1
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/db144eb1
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/db144eb1

Branch: refs/heads/trunk
Commit: db144eb1c51c1f37bdd1e0c18e9a5b0969c82e33
Parents: 1cd55e0
Author: Devaraj K <devaraj@apache.org>
Authored: Tue Feb 2 12:06:51 2016 +0530
Committer: Devaraj K <devaraj@apache.org>
Committed: Tue Feb 2 12:06:51 2016 +0530

----------------------------------------------------------------------
 hadoop-yarn-project/CHANGES.txt                 |  3 +
 .../src/main/resources/yarn-default.xml         | 50 ++++++------
 .../src/site/markdown/NodeLabel.md              | 86 ++++++++++++++++----
 3 files changed, 99 insertions(+), 40 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/db144eb1/hadoop-yarn-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/CHANGES.txt b/hadoop-yarn-project/CHANGES.txt
index bf46864..345c64b 100644
--- a/hadoop-yarn-project/CHANGES.txt
+++ b/hadoop-yarn-project/CHANGES.txt
@@ -778,6 +778,9 @@ Release 2.8.0 - UNRELEASED
 
     YARN-4340. Add "list" API to reservation system. (Sean Po via wangda)
 
+    YARN-4100. Add Documentation for Distributed and Delegated-Centralized
+    Node Labels feature. (Naganarasimha G R via devaraj)
+
   OPTIMIZATIONS
 
     YARN-3339. TestDockerContainerExecutor should pull a single image and not

http://git-wip-us.apache.org/repos/asf/hadoop/blob/db144eb1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
index e33d23e..d8ea3ad 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
@@ -2281,26 +2281,26 @@
   <!-- Distributed Node Labels Configuration -->
   <property>
     <description>
-    When "yarn.node-labels.configuration-type" parameter in RM is configured as
-    "distributed", Administrators can configure in NM, the provider for	the
+    When "yarn.node-labels.configuration-type" is configured with "distributed"
+    in RM, Administrators can configure in NM the provider for the
     node labels by configuring this parameter. Administrators can
-    specify "config", "script" or the class name of the provider. Configured
+    configure "config", "script" or the class name of the provider. Configured
     class needs to extend
     org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeLabelsProvider.
-    If "config" is specified then "ConfigurationNodeLabelsProvider" and
-    "script" then "ScriptNodeLabelsProvider" will be used.
+    If "config" is configured, then "ConfigurationNodeLabelsProvider" and if
+    "script" is configured, then "ScriptNodeLabelsProvider" will be used.
     </description>
     <name>yarn.nodemanager.node-labels.provider</name>
   </property>
 
   <property>
     <description>
-    When node labels "yarn.nodemanager.node-labels.provider" is of type
-    "config" or the configured class extends AbstractNodeLabelsProvider then
-    periodically node labels are retrieved from the node labels provider.
-    This configuration is to define the interval. If -1 is configured then
-    node labels are retrieved from. provider only during initialization.
-    Defaults to 10 mins.
+    When "yarn.nodemanager.node-labels.provider" is configured with "config",
+    "Script" or the configured class extends AbstractNodeLabelsProvider, then
+    periodically node labels are retrieved from the node labels provider. This
+    configuration is to define the interval period.
+    If -1 is configured then node labels are retrieved from provider only
+    during initialization. Defaults to 10 mins.
     </description>
     <name>yarn.nodemanager.node-labels.provider.fetch-interval-ms</name>
     <value>600000</value>
@@ -2308,8 +2308,8 @@
 
   <property>
     <description>
-   Interval at which node labels syncs with RM from NM.Will send loaded labels
-   every x intervals configured along with heartbeat from NM to RM.
+   Interval at which NM syncs its node labels with RM. NM will send its loaded
+   labels every x intervals configured, along with heartbeat to RM.
     </description>
     <name>yarn.nodemanager.node-labels.resync-interval-ms</name>
     <value>120000</value>
@@ -2317,19 +2317,18 @@
 
   <property>
     <description>
-    When node labels "yarn.nodemanager.node-labels.provider"
-    is of type "config" then ConfigurationNodeLabelsProvider fetches the
-    partition from this parameter.
+    When "yarn.nodemanager.node-labels.provider" is configured with "config"
+    then ConfigurationNodeLabelsProvider fetches the partition label from this
+    parameter.
     </description>
     <name>yarn.nodemanager.node-labels.provider.configured-node-partition</name>
   </property>
 
   <property>
     <description>
-    When node labels "yarn.nodemanager.node-labels.provider" is a class
-    which extends AbstractNodeLabelsProvider then this configuration provides
-    the timeout period after which it will stop querying the Node labels
-    provider. Defaults to 20 mins.
+    When "yarn.nodemanager.node-labels.provider" is configured with "Script"
+    then this configuration provides the timeout period after which it will
+    interrupt the script which queries the Node labels. Defaults to 20 mins.
     </description>
     <name>yarn.nodemanager.node-labels.provider.fetch-timeout-ms</name>
     <value>1200000</value>
@@ -2351,8 +2350,8 @@
 
   <property>
     <description>
-    When node labels "yarn.node-labels.configuration-type" is of type
-    "delegated-centralized" then periodically node labels are retrieved
+    When "yarn.node-labels.configuration-type" is configured with
+    "delegated-centralized", then periodically node labels are retrieved
     from the node labels provider. This configuration is to define the
     interval. If -1 is configured then node labels are retrieved from
     provider only once for each node after it registers. Defaults to 30 mins.
@@ -2362,9 +2361,10 @@
   </property>
 
   <property>
-    <description>The Node Label script to run. Script output Lines starting with
-     "NODE_PARTITION:" will be considered for Node Labels. In case of multiple
-     lines having the pattern, last one will be considered</description>
+    <description>The Node Label script to run. Script output Line starting with
+     "NODE_PARTITION:" will be considered as Node Label Partition. In case of
+     multiple lines have this pattern, then last one will be considered
+    </description>
     <name>yarn.nodemanager.node-labels.provider.script.path</name>
   </property>
 

http://git-wip-us.apache.org/repos/asf/hadoop/blob/db144eb1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md
index 87019cd..1fecf07 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md
@@ -15,7 +15,22 @@
 YARN Node Labels
 ===============
 
-# Overview
+* [Overview](#Overview)
+* [Features](#Features)
+* [Configuration](#Configuration)
+    * [Setting up ResourceManager to enable Node Labels](#Setting_up_ResourceManager_to_enable_Node_Labels)
+    * [Add/modify node labels list to YARN](#Add/modify_node_labels_list_to_YARN)
+    * [Add/modify node-to-labels mapping to YARN](#Add/modify_node-to-labels_mapping_to_YARN)
+    * [Configuration of Schedulers for node labels](#Configuration_of_Schedulers_for_node_labels)
+* [Specifying node label for application](#Specifying_node_label_for_application)
+* [Monitoring](#Monitoring)
+    * [Monitoring through web UI](#Monitoring_through_web_UI)
+    * [Monitoring through commandline](#Monitoring_through_commandline)
+* [Useful links](#Useful_links)
+
+Overview
+--------
+
 Node label is a way to group nodes with similar characteristics and applications can specify where to run.
 
 Now we only support node partition, which is:
@@ -28,20 +43,28 @@ Now we only support node partition, which is:
 
 User can specify set of node labels which can be accessed by each queue, one application can only use subset of node labels that can be accessed by the queue which contains the application.
 
-# Features
+Features
+--------
+
 The ```Node Labels``` supports the following features for now:
 
 * Partition cluster - each node can be assigned one label, so the cluster will be divided to several smaller disjoint partitions.
 * ACL of node-labels on queues - user can set accessible node labels on each queue so only some nodes can only be accessed by specific queues.
 * Specify percentage of resource of a partition which can be accessed by a queue - user can set percentage like: queue A can access 30% of resources on nodes with label=hbase. Such percentage setting will be consistent with existing resource manager
-* Specify required Node Label in resource request, it will only be allocated when node has the same label. If no node label requirement specified, such Resource Request will only be allocated on nodes belong to DEFAULT partition.
+* Specify required node label in resource request, it will only be allocated when node has the same label. If no node label requirement specified, such Resource Request will only be allocated on nodes belong to DEFAULT partition.
 * Operability
     * Node labels and node labels mapping can be recovered across RM restart
     * Update node labels - admin can update labels on nodes and labels on queues
       when RM is running
+* Mapping of NM to node labels can be done in three ways, but in all of the approaches Partition Label should be one among the valid node labels list configured in the RM.
+    * **Centralized :** Node to labels mapping can be done through RM exposed CLI, REST or RPC.
+    * **Distributed :** Node to labels mapping will be set by a configured Node Labels Provider in NM. We have two different providers in YARN: *Script* based provider and *Configuration* based provider. In case of script, NM can be configured with a script path and the script can emit the labels of the node. In case of config, node Labels can be directly configured in the NM's yarn-site.xml. In both of these options dynamic refresh of the label mapping is supported.
+    * **Delegated-Centralized :** Node to labels mapping will be set by a configured Node Labels Provider in RM. This would be helpful when label mapping cannot be provided by each node due to security concerns and to avoid interaction through RM Interfaces for each node in a large cluster. Labels will be fetched from this interface during NM registration and periodical refresh is also supported.
+
+Configuration
+-------------
 
-# Configuration
-## Setting up ```ResourceManager``` to enable ```Node Labels```:
+###Setting up ResourceManager to enable Node Labels
 
 Setup following properties in ```yarn-site.xml```
 
@@ -49,23 +72,50 @@ Property  | Value
 --- | ----
 yarn.node-labels.fs-store.root-dir  | hdfs://namenode:port/path/to/store/node-labels/
 yarn.node-labels.enabled | true
+yarn.node-labels.configuration-type | Set configuration type for node labels. Administrators can specify “centralized”, “delegated-centralized” or “distributed”. Default value is “centralized”.
 
 Notes:
 
 * Make sure ```yarn.node-labels.fs-store.root-dir``` is created and ```ResourceManager``` has permission to access it. (Typically from “yarn” user)
 * If user want to store node label to local file system of RM (instead of HDFS), paths like `file:///home/yarn/node-label` can be used
 
-### Add/modify node labels list and node-to-labels mapping to YARN
+###Add/modify node labels list to YARN
+
 * Add cluster node labels list:
     * Executing ```yarn rmadmin -addToClusterNodeLabels "label_1(exclusive=true/false),label_2(exclusive=true/false)"``` to add node label.
-    * If user don’t specify “(exclusive=…)”, execlusive will be ```true``` by default.
+    * If user don’t specify “(exclusive=…)”, exclusive will be ```true``` by default.
     * Run ```yarn cluster --list-node-labels``` to check added node labels are visible in the cluster.
 
-* Add labels to nodes
+###Add/modify node-to-labels mapping to YARN
+
+* Configuring nodes to labels mapping in **Centralized** NodeLabel setup
     * Executing ```yarn rmadmin -replaceLabelsOnNode “node1[:port]=label1 node2=label2”```. Added label1 to node1, label2 to node2. If user don’t specify port, it added the label to all ```NodeManagers``` running on the node.
 
-## Configuration of Schedulers for node labels
-### Capacity Scheduler Configuration
+* Configuring nodes to labels mapping in **Distributed** NodeLabel setup
+
+Property  | Value
+----- | ------
+yarn.node-labels.configuration-type | Needs to be set as *"distributed"* in RM, to fetch node to labels mapping from a configured Node Labels Provider in NM.
+yarn.nodemanager.node-labels.provider | When *"yarn.node-labels.configuration-type"* is configured with *"distributed"* in RM, Administrators can configure the provider for the node labels by configuring this parameter in NM. Administrators can configure *"config"*, *"script"* or the *class name* of the provider. Configured  class needs to extend *org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeLabelsProvider*. If *"config"* is configured, then *"ConfigurationNodeLabelsProvider"* and if *"script"* is configured, then *"ScriptNodeLabelsProvider"* will be used.
+yarn.nodemanager.node-labels.resync-interval-ms | Interval at which NM syncs its node labels with RM. NM will send its loaded labels every x intervals configured, along with heartbeat to RM. This resync is required even when the labels are not modified because admin might have removed the cluster label which was provided by NM. Default is 2 mins.
+yarn.nodemanager.node-labels.provider.fetch-interval-ms | When *"yarn.nodemanager.node-labels.provider"* is configured with *"config"*, *"script"* or the *configured class* extends AbstractNodeLabelsProvider, then periodically node labels are retrieved from the node labels provider. This configuration is to define the interval period. If -1 is configured, then node labels are retrieved from provider only during initialization. Defaults to 10 mins.
+yarn.nodemanager.node-labels.provider.fetch-timeout-ms | When *"yarn.nodemanager.node-labels.provider"* is configured with *"script"*, then this configuration provides the timeout period after which it will interrupt the script which queries the node labels. Defaults to 20 mins.
+yarn.nodemanager.node-labels.provider.script.path | The node label script to run. Script output Line starting with *"NODE_PARTITION:"* will be considered as node label Partition. In case multiple lines of script output have this pattern, then the last one will be considered.
+yarn.nodemanager.node-labels.provider.script.opts | The arguments to pass to the node label script.
+yarn.nodemanager.node-labels.provider.configured-node-partition | When *"yarn.nodemanager.node-labels.provider"* is configured with *"config"*, then ConfigurationNodeLabelsProvider fetches the partition label from this parameter.
+
+* Configuring nodes to labels mapping in **Delegated-Centralized** NodeLabel setup
+
+Property  | Value
+----- | ------
+yarn.node-labels.configuration-type | Needs to be set as *"delegated-centralized"* to fetch node to labels mapping from a configured Node Labels Provider in RM.
+yarn.resourcemanager.node-labels.provider | When *"yarn.node-labels.configuration-type"* is configured with *"delegated-centralized"*, then administrators should configure the class for fetching node labels by ResourceManager. Configured class needs to extend *org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsMappingProvider*.
+yarn.resourcemanager.node-labels.provider.fetch-interval-ms | When *"yarn.node-labels.configuration-type"* is configured with *"delegated-centralized"*, then periodically node labels are retrieved from the node labels provider. This configuration is to define the interval. If -1 is configured, then node labels are retrieved from provider only once for each node after it registers. Defaults to 30 mins.
+
+###Configuration of Schedulers for node labels
+
+* Capacity Scheduler Configuration
+
 Property  | Value
 ----- | ------
 yarn.scheduler.capacity.`<queue-path>`.capacity | Set the percentage of the queue can access to nodes belong to DEFAULT partition. The sum of DEFAULT capacities for direct children under each parent, must be equal to 100.
@@ -114,27 +164,33 @@ Notes:
 * After finishing configuration of CapacityScheduler, execute ```yarn rmadmin -refreshQueues``` to apply changes
 * Go to scheduler page of RM Web UI to check if you have successfully set configuration.
 
-# Specifying node label for application
+Specifying node label for application
+-------------------------------------
+
 Applications can use following Java APIs to specify node label to request
 
 * `ApplicationSubmissionContext.setNodeLabelExpression(..)` to set node label expression for all containers of the application.
 * `ResourceRequest.setNodeLabelExpression(..)` to set node label expression for individual resource requests. This can overwrite node label expression set in ApplicationSubmissionContext
 * Specify `setAMContainerResourceRequest.setNodeLabelExpression` in `ApplicationSubmissionContext` to indicate expected node label for application master container.
 
-# Monitoring
+Monitoring
+----------
+
+###Monitoring through web UI
 
-## Monitoring through web UI
 Following label-related fields can be seen on web UI:
 
 * Nodes page: http://RM-Address:port/cluster/nodes, you can get labels on each node
 * Node labels page: http://RM-Address:port/cluster/nodelabels, you can get type (exclusive/non-exclusive), number of active node managers, total resource of each partition
 * Scheduler page: http://RM-Address:port/cluster/scheduler, you can get label-related settings of each queue, and resource usage of queue partitions.
 
-## Monitoring through commandline
+###Monitoring through commandline
 
 * Use `yarn cluster --list-node-labels` to get labels in the cluster
 * Use `yarn node -status <NodeId>` to get node status including labels on a given node
 
-# Useful links
+Useful links
+------------
+
 * [YARN Capacity Scheduler](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html), if you need more understanding about how to configure Capacity Scheduler
 * Write YARN application using node labels, you can see following two links as examples: [YARN distributed shell](https://issues.apache.org/jira/browse/YARN-2502), [Hadoop MapReduce](https://issues.apache.org/jira/browse/MAPREDUCE-6304)
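
Reader's note on the `NODE_PARTITION:` rule: the description of `yarn.nodemanager.node-labels.provider.script.path` in this commit says that script output lines starting with `NODE_PARTITION:` are considered, and that the last such line wins. The sketch below illustrates that rule only. It is not the NodeManager's implementation; `parse_node_partition` is a hypothetical helper, and trimming whitespace around the value is an assumption not stated in the commit.

```python
from typing import Optional

# Prefix named in the yarn-default.xml description in this commit.
PREFIX = "NODE_PARTITION:"


def parse_node_partition(script_output: str) -> Optional[str]:
    """Return the partition from the last line starting with PREFIX, or None.

    Hypothetical helper for illustration; whitespace trimming of the value
    is an assumption.
    """
    partition = None
    for line in script_output.splitlines():
        if line.startswith(PREFIX):
            # A later matching line overrides any earlier one.
            partition = line[len(PREFIX):].strip()
    return partition


if __name__ == "__main__":
    sample = "probing hardware\nNODE_PARTITION:cpu\nNODE_PARTITION: gpu\n"
    print(parse_node_partition(sample))
```

With the sample output above, the second matching line overrides the first, so the helper returns `gpu`; output with no matching line yields `None`.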

