hadoop-common-commits mailing list archives

From zjs...@apache.org
Subject [29/50] [abbrv] hadoop git commit: HADOOP-11495. Convert site documentation from apt to markdown (Masatake Iwasaki via aw)
Date Wed, 11 Feb 2015 19:48:47 GMT
HADOOP-11495. Convert site documentation from apt to markdown (Masatake Iwasaki via aw)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/e9d26fe9
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/e9d26fe9
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/e9d26fe9

Branch: refs/heads/YARN-2928
Commit: e9d26fe9eb16a0482d3581504ecad22b4cd65077
Parents: 6338ce3
Author: Allen Wittenauer <aw@apache.org>
Authored: Tue Feb 10 13:39:57 2015 -0800
Committer: Allen Wittenauer <aw@apache.org>
Committed: Tue Feb 10 13:39:57 2015 -0800

----------------------------------------------------------------------
 hadoop-common-project/hadoop-common/CHANGES.txt |   3 +
 .../src/site/apt/CLIMiniCluster.apt.vm          |  83 --
 .../src/site/apt/ClusterSetup.apt.vm            | 651 --------------
 .../src/site/apt/CommandsManual.apt.vm          | 327 -------
 .../src/site/apt/Compatibility.apt.vm           | 541 ------------
 .../src/site/apt/DeprecatedProperties.apt.vm    | 552 ------------
 .../src/site/apt/FileSystemShell.apt.vm         | 764 ----------------
 .../src/site/apt/HttpAuthentication.apt.vm      |  98 ---
 .../src/site/apt/InterfaceClassification.apt.vm | 239 -----
 .../hadoop-common/src/site/apt/Metrics.apt.vm   | 879 -------------------
 .../src/site/apt/NativeLibraries.apt.vm         | 205 -----
 .../src/site/apt/RackAwareness.apt.vm           | 140 ---
 .../src/site/apt/SecureMode.apt.vm              | 689 ---------------
 .../src/site/apt/ServiceLevelAuth.apt.vm        | 216 -----
 .../src/site/apt/SingleCluster.apt.vm           | 286 ------
 .../src/site/apt/SingleNodeSetup.apt.vm         |  24 -
 .../src/site/apt/Superusers.apt.vm              | 144 ---
 .../hadoop-common/src/site/apt/Tracing.apt.vm   | 233 -----
 .../src/site/markdown/CLIMiniCluster.md.vm      |  68 ++
 .../src/site/markdown/ClusterSetup.md           | 339 +++++++
 .../src/site/markdown/CommandsManual.md         | 227 +++++
 .../src/site/markdown/Compatibility.md          | 313 +++++++
 .../src/site/markdown/DeprecatedProperties.md   | 288 ++++++
 .../src/site/markdown/FileSystemShell.md        | 689 +++++++++++++++
 .../src/site/markdown/HttpAuthentication.md     |  58 ++
 .../site/markdown/InterfaceClassification.md    | 105 +++
 .../hadoop-common/src/site/markdown/Metrics.md  | 456 ++++++++++
 .../src/site/markdown/NativeLibraries.md.vm     | 145 +++
 .../src/site/markdown/RackAwareness.md          | 104 +++
 .../src/site/markdown/SecureMode.md             | 375 ++++++++
 .../src/site/markdown/ServiceLevelAuth.md       | 144 +++
 .../src/site/markdown/SingleCluster.md.vm       | 232 +++++
 .../src/site/markdown/SingleNodeSetup.md        |  20 +
 .../src/site/markdown/Superusers.md             | 106 +++
 .../hadoop-common/src/site/markdown/Tracing.md  | 182 ++++
 35 files changed, 3854 insertions(+), 6071 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/CHANGES.txt b/hadoop-common-project/hadoop-common/CHANGES.txt
index fadc744..1ba93e8 100644
--- a/hadoop-common-project/hadoop-common/CHANGES.txt
+++ b/hadoop-common-project/hadoop-common/CHANGES.txt
@@ -168,6 +168,9 @@ Trunk (Unreleased)
     HADOOP-6964. Allow compact property description in xml (Kengo Seki
     via aw)
 
+    HADOOP-11495. Convert site documentation from apt to markdown
+    (Masatake Iwasaki via aw)
+
   BUG FIXES
 
     HADOOP-11473. test-patch says "-1 overall" even when all checks are +1

http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/CLIMiniCluster.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/CLIMiniCluster.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/CLIMiniCluster.apt.vm
deleted file mode 100644
index 2d12c39..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/CLIMiniCluster.apt.vm
+++ /dev/null
@@ -1,83 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop MapReduce Next Generation ${project.version} - CLI MiniCluster.
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Hadoop MapReduce Next Generation - CLI MiniCluster.
-
-%{toc|section=1|fromDepth=0}
-
-* {Purpose}
-
-  Using the CLI MiniCluster, users can simply start and stop a single-node
-  Hadoop cluster with a single command, and without the need to set any
-  environment variables or manage configuration files. The CLI MiniCluster
-  starts both a <<<YARN>>>/<<<MapReduce>>> cluster and an <<<HDFS>>> cluster.
-
-  This is useful for cases where users want to quickly experiment with a real
-  Hadoop cluster or test non-Java programs that rely on significant Hadoop
-  functionality.
-
-* {Hadoop Tarball}
-
-  You can obtain the Hadoop tarball from a release. Alternatively, you
-  can create a tarball directly from the source:
-
-+---+
-$ mvn clean install -DskipTests
-$ mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip
-+---+
-  <<NOTE:>> You will need {{{http://code.google.com/p/protobuf/}protoc 2.5.0}}
-            installed.
-
-  The tarball should be available in the <<<hadoop-dist/target/>>> directory.
-
-* {Running the MiniCluster}
-
-  From inside the root directory of the extracted tarball, you can start the CLI
-  MiniCluster using the following command:
-
-+---+
-$ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${project.version}-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT
-+---+
-
-  In the example command above, <<<RM_PORT>>> and <<<JHS_PORT>>> should be
-  replaced by the user's choice of these port numbers. If not specified, random
-  free ports will be used.
-
-  A number of command line arguments let users control which services to
-  start and pass other configuration properties.
-  The available command line arguments are:
-
-+---+
-$ -D <property=value>    Options to pass into configuration object
-$ -datanodes <arg>       How many datanodes to start (default 1)
-$ -format                Format the DFS (default false)
-$ -help                  Prints option help.
-$ -jhsport <arg>         JobHistoryServer port (default 0--we choose)
-$ -namenode <arg>        URL of the namenode (default is either the DFS
-$                        cluster or a temporary dir)
-$ -nnport <arg>          NameNode port (default 0--we choose)
-$ -nodemanagers <arg>    How many nodemanagers to start (default 1)
-$ -nodfs                 Don't start a mini DFS cluster
-$ -nomr                  Don't start a mini MR cluster
-$ -rmport <arg>          ResourceManager port (default 0--we choose)
-$ -writeConfig <path>    Save configuration to this XML file.
-$ -writeDetails <path>   Write basic information to this JSON file.
-+---+
-
-  To display the full list of available arguments, the user can pass the
-  <<<-help>>> argument to the above command.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
deleted file mode 100644
index 52b0552..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
+++ /dev/null
@@ -1,651 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop ${project.version} - Cluster Setup
-  ---
-  ---
-  ${maven.build.timestamp}
-
-%{toc|section=1|fromDepth=0}
-
-Hadoop Cluster Setup
-
-* {Purpose}
-
-  This document describes how to install and configure
-  Hadoop clusters ranging from a few nodes to extremely large clusters
-  with thousands of nodes.  To play with Hadoop, you may first want to
-  install it on a single machine (see {{{./SingleCluster.html}Single Node Setup}}).
-
-  This document does not cover advanced topics such as {{{./SecureMode.html}Security}} or
-  High Availability.
-
-* {Prerequisites}
-
-  * Install Java. See the {{{http://wiki.apache.org/hadoop/HadoopJavaVersions}Hadoop Wiki}} for known good versions.
-  * Download a stable version of Hadoop from Apache mirrors.
-
-* {Installation}
-
-  Installing a Hadoop cluster typically involves unpacking the software on all
-  the machines in the cluster or installing it via a packaging system as
-  appropriate for your operating system.  It is important to divide up the hardware
-  into functions.
-
-  Typically one machine in the cluster is designated as the NameNode and
-  another machine as the ResourceManager, exclusively. These are the masters. Other
-  services (such as the Web App Proxy Server and MapReduce Job History Server) are usually
-  run either on dedicated hardware or on shared infrastructure, depending upon the load.
-
-  The rest of the machines in the cluster act as both DataNode and NodeManager.
-  These are the slaves.
-
-* {Configuring Hadoop in Non-Secure Mode}
-
-    Hadoop's Java configuration is driven by two types of important configuration files:
-
-      * Read-only default configuration - <<<core-default.xml>>>,
-        <<<hdfs-default.xml>>>, <<<yarn-default.xml>>> and
-        <<<mapred-default.xml>>>.
-
-      * Site-specific configuration - <<<etc/hadoop/core-site.xml>>>,
-        <<<etc/hadoop/hdfs-site.xml>>>, <<<etc/hadoop/yarn-site.xml>>> and
-        <<<etc/hadoop/mapred-site.xml>>>.
-
-
-  Additionally, you can control the Hadoop scripts found in the bin/
-  directory of the distribution by setting site-specific values via the
-  <<<etc/hadoop/hadoop-env.sh>>> and <<<etc/hadoop/yarn-env.sh>>>.
-
-  To configure the Hadoop cluster you will need to configure the
-  <<<environment>>> in which the Hadoop daemons execute as well as the
-  <<<configuration parameters>>> for the Hadoop daemons.
-
-  HDFS daemons are NameNode, SecondaryNameNode, and DataNode.  YARN daemons
-  are ResourceManager, NodeManager, and WebAppProxy.  If MapReduce is to be
-  used, then the MapReduce Job History Server will also be running.  For
-  large installations, these are generally running on separate hosts.
-
-
-** {Configuring Environment of Hadoop Daemons}
-
-  Administrators should use the <<<etc/hadoop/hadoop-env.sh>>> and optionally the
-  <<<etc/hadoop/mapred-env.sh>>> and <<<etc/hadoop/yarn-env.sh>>> scripts to do
-  site-specific customization of the Hadoop daemons' process environment.
-
-  At the very least, you must specify the <<<JAVA_HOME>>> so that it is
-  correctly defined on each remote node.
-
-  Administrators can configure individual daemons using the configuration
-  options shown in the table below:
-
-*--------------------------------------+--------------------------------------+
-|| Daemon                              || Environment Variable                |
-*--------------------------------------+--------------------------------------+
-| NameNode                             | HADOOP_NAMENODE_OPTS                 |
-*--------------------------------------+--------------------------------------+
-| DataNode                             | HADOOP_DATANODE_OPTS                 |
-*--------------------------------------+--------------------------------------+
-| Secondary NameNode                   | HADOOP_SECONDARYNAMENODE_OPTS        |
-*--------------------------------------+--------------------------------------+
-| ResourceManager                      | YARN_RESOURCEMANAGER_OPTS            |
-*--------------------------------------+--------------------------------------+
-| NodeManager                          | YARN_NODEMANAGER_OPTS                |
-*--------------------------------------+--------------------------------------+
-| WebAppProxy                          | YARN_PROXYSERVER_OPTS                |
-*--------------------------------------+--------------------------------------+
-| Map Reduce Job History Server        | HADOOP_JOB_HISTORYSERVER_OPTS        |
-*--------------------------------------+--------------------------------------+
-
-
-  For example, to configure the NameNode to use parallelGC, the following
-  statement should be added to hadoop-env.sh:
-
-----
-  export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
-----
-
-  See <<<etc/hadoop/hadoop-env.sh>>> for other examples.
-
-  Other useful configuration parameters that you can customize include:
-
-    * <<<HADOOP_PID_DIR>>> - The directory where the
-      daemons' process id files are stored.
-
-    * <<<HADOOP_LOG_DIR>>> - The directory where the
-      daemons' log files are stored. Log files are automatically created
-      if they don't exist.
-
-    * <<<HADOOP_HEAPSIZE_MAX>>> - The maximum amount of
-      memory to use for the Java heapsize.  Units supported by the JVM
-      are also supported here.  If no unit is present, it will be assumed
-      the number is in megabytes. By default, Hadoop will let the JVM
-      determine how much to use. This value can be overridden on
-      a per-daemon basis using the appropriate <<<_OPTS>>> variable listed above.
-      For example, setting <<<HADOOP_HEAPSIZE_MAX=1g>>> and
-      <<<HADOOP_NAMENODE_OPTS="-Xmx5g">>>  will configure the NameNode with 5GB heap.
-
-  In most cases, you should specify the <<<HADOOP_PID_DIR>>> and
-  <<<HADOOP_LOG_DIR>>> directories such that they can only be
-  written to by the users that are going to run the hadoop daemons.
-  Otherwise there is the potential for a symlink attack.
-
-  It is also traditional to configure <<<HADOOP_PREFIX>>> in the system-wide
-  shell environment configuration.  For example, a simple script inside
-  <<</etc/profile.d>>>:
-
----
-  HADOOP_PREFIX=/path/to/hadoop
-  export HADOOP_PREFIX
----
-
-*--------------------------------------+--------------------------------------+
-|| Daemon                              || Environment Variable                |
-*--------------------------------------+--------------------------------------+
-| ResourceManager                      | YARN_RESOURCEMANAGER_HEAPSIZE        |
-*--------------------------------------+--------------------------------------+
-| NodeManager                          | YARN_NODEMANAGER_HEAPSIZE            |
-*--------------------------------------+--------------------------------------+
-| WebAppProxy                          | YARN_PROXYSERVER_HEAPSIZE            |
-*--------------------------------------+--------------------------------------+
-| Map Reduce Job History Server        | HADOOP_JOB_HISTORYSERVER_HEAPSIZE    |
-*--------------------------------------+--------------------------------------+
-
-** {Configuring the Hadoop Daemons}
-
-    This section deals with important parameters to be specified in
-    the given configuration files:
-
-    * <<<etc/hadoop/core-site.xml>>>
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<fs.defaultFS>>>      | NameNode URI            | <hdfs://host:port/>    |
-*-------------------------+-------------------------+------------------------+
-| <<<io.file.buffer.size>>> | 131072 |  |
-| | | Size of read/write buffer used in SequenceFiles. |
-*-------------------------+-------------------------+------------------------+
-
-    * <<<etc/hadoop/hdfs-site.xml>>>
-
-      * Configurations for NameNode:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.name.dir>>> | | |
-| | Path on the local filesystem where the NameNode stores the namespace | |
-| | and transactions logs persistently. | |
-| | | If this is a comma-delimited list of directories then the name table is  |
-| | | replicated in all of the directories, for redundancy. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.hosts>>> / <<<dfs.namenode.hosts.exclude>>> | | |
-| | List of permitted/excluded DataNodes. | |
-| | | If necessary, use these files to control the list of allowable |
-| | | datanodes. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.blocksize>>> | 268435456 | |
-| | | HDFS blocksize of 256MB for large file-systems. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.handler.count>>> | 100 | |
-| | | More NameNode server threads to handle RPCs from large number of |
-| | | DataNodes. |
-*-------------------------+-------------------------+------------------------+
-
-      * Configurations for DataNode:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.data.dir>>> | | |
-| | Comma separated list of paths on the local filesystem of a | |
-| | <<<DataNode>>> where it should store its blocks. | |
-| | | If this is a comma-delimited list of directories, then data will be |
-| | | stored in all named directories, typically on different devices. |
-*-------------------------+-------------------------+------------------------+
-
-    * <<<etc/hadoop/yarn-site.xml>>>
-
-      * Configurations for ResourceManager and NodeManager:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.acl.enable>>> | | |
-| | <<<true>>> / <<<false>>> | |
-| | | Enable ACLs? Defaults to <false>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.admin.acl>>> | | |
-| | Admin ACL | |
-| | | ACL to set admins on the cluster. |
-| | | ACLs are of for <comma-separated-users><space><comma-separated-groups>. |
-| | | Defaults to special value of <<*>> which means <anyone>. |
-| | | Special value of just <space> means no one has access. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.log-aggregation-enable>>> | | |
-| | <false> | |
-| | | Configuration to enable or disable log aggregation |
-*-------------------------+-------------------------+------------------------+
-
-
-      * Configurations for ResourceManager:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.address>>> | | |
-| | <<<ResourceManager>>> host:port for clients to submit jobs. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.scheduler.address>>> | | |
-| | <<<ResourceManager>>> host:port for ApplicationMasters to talk to | |
-| | Scheduler to obtain resources. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.resource-tracker.address>>> | | |
-| | <<<ResourceManager>>> host:port for NodeManagers. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.admin.address>>> | | |
-| | <<<ResourceManager>>> host:port for administrative commands. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.webapp.address>>> | | |
-| | <<<ResourceManager>>> web-ui host:port. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.hostname>>> | | |
-| | <<<ResourceManager>>> host. | |
-| | | <host>\ |
-| | | Single hostname that can be set in place of setting all <<<yarn.resourcemanager*address>>> resources.  Results in default ports for ResourceManager components. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.scheduler.class>>> | | |
-| | <<<ResourceManager>>> Scheduler class. | |
-| | | <<<CapacityScheduler>>> (recommended), <<<FairScheduler>>> (also recommended), or <<<FifoScheduler>>> |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.scheduler.minimum-allocation-mb>>> | | |
-| | Minimum limit of memory to allocate to each container request at the <<<Resource Manager>>>. | |
-| | | In MBs |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.scheduler.maximum-allocation-mb>>> | | |
-| | Maximum limit of memory to allocate to each container request at the <<<Resource Manager>>>. | |
-| | | In MBs |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.nodes.include-path>>> / | | |
-| <<<yarn.resourcemanager.nodes.exclude-path>>> | | |
-| | List of permitted/excluded NodeManagers. | |
-| | | If necessary, use these files to control the list of allowable |
-| | | NodeManagers. |
-*-------------------------+-------------------------+------------------------+
-
-      * Configurations for NodeManager:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.resource.memory-mb>>> | | |
-| | Resource i.e. available physical memory, in MB, for given <<<NodeManager>>> | |
-| | | Defines total available resources on the <<<NodeManager>>> to be made |
-| | | available to running containers |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.vmem-pmem-ratio>>> | | |
-| | Maximum ratio by which virtual memory usage of tasks may exceed |
-| | physical memory | |
-| | | The virtual memory usage of each task may exceed its physical memory |
-| | | limit by this ratio. The total amount of virtual memory used by tasks |
-| | | on the NodeManager may exceed its physical memory usage by this ratio. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.local-dirs>>> | | |
-| | Comma-separated list of paths on the local filesystem where | |
-| | intermediate data is written. ||
-| | | Multiple paths help spread disk i/o. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.log-dirs>>> | | |
-| | Comma-separated list of paths on the local filesystem where logs  | |
-| | are written. | |
-| | | Multiple paths help spread disk i/o. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.log.retain-seconds>>> | | |
-| | <10800> | |
-| | | Default time (in seconds) to retain log files on the NodeManager |
-| | | Only applicable if log-aggregation is disabled. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.remote-app-log-dir>>> | | |
-| | </logs> | |
-| | | HDFS directory where the application logs are moved on application |
-| | | completion. Need to set appropriate permissions. |
-| | | Only applicable if log-aggregation is enabled. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.remote-app-log-dir-suffix>>> | | |
-| | <logs> | |
-| | | Suffix appended to the remote log dir. Logs will be aggregated to  |
-| | | $\{yarn.nodemanager.remote-app-log-dir\}/$\{user\}/$\{thisParam\} |
-| | | Only applicable if log-aggregation is enabled. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.aux-services>>> | | |
-| | mapreduce_shuffle  | |
-| | | Shuffle service that needs to be set for Map Reduce applications. |
-*-------------------------+-------------------------+------------------------+
-
-      * Configurations for History Server (Needs to be moved elsewhere):
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.log-aggregation.retain-seconds>>> | | |
-| | <-1> | |
-| | | How long to keep aggregation logs before deleting them. -1 disables. |
-| | | Be careful, set this too small and you will spam the name node. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.log-aggregation.retain-check-interval-seconds>>> | | |
-| | <-1> | |
-| | | Time between checks for aggregated log retention. If set to 0 or a |
-| | | negative value then the value is computed as one-tenth of the |
-| | | aggregated log retention time. |
-| | | Be careful, set this too small and you will spam the name node. |
-*-------------------------+-------------------------+------------------------+
-
-    * <<<etc/hadoop/mapred-site.xml>>>
-
-      * Configurations for MapReduce Applications:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.framework.name>>> | | |
-| | yarn | |
-| | | Execution framework set to Hadoop YARN. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.map.memory.mb>>> | 1536 | |
-| | | Larger resource limit for maps. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.map.java.opts>>> | -Xmx1024M | |
-| | | Larger heap-size for child jvms of maps. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.reduce.memory.mb>>> | 3072 | |
-| | | Larger resource limit for reduces. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.reduce.java.opts>>> | -Xmx2560M | |
-| | | Larger heap-size for child jvms of reduces. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.task.io.sort.mb>>> | 512 | |
-| | | Higher memory-limit while sorting data for efficiency. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.task.io.sort.factor>>> | 100 | |
-| | | More streams merged at once while sorting files. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.reduce.shuffle.parallelcopies>>> | 50 | |
-| | | Higher number of parallel copies run by reduces to fetch outputs |
-| | | from very large number of maps. |
-*-------------------------+-------------------------+------------------------+
-
-      * Configurations for MapReduce JobHistory Server:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.address>>> | | |
-| | MapReduce JobHistory Server <host:port> | Default port is 10020. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.webapp.address>>> | | |
-| | MapReduce JobHistory Server Web UI <host:port> | Default port is 19888. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.intermediate-done-dir>>> | /mr-history/tmp | |
-|  | | Directory where history files are written by MapReduce jobs. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.done-dir>>> | /mr-history/done| |
-| | | Directory where history files are managed by the MR JobHistory Server. |
-*-------------------------+-------------------------+------------------------+
-
-* {Monitoring Health of NodeManagers}
-
-    Hadoop provides a mechanism by which administrators can configure the
-    NodeManager to run an administrator supplied script periodically to
-    determine if a node is healthy or not.
-
-    Administrators can determine if the node is in a healthy state by
-    performing any checks of their choice in the script. If the script
-    detects the node to be in an unhealthy state, it must print a line to
-    standard output beginning with the string ERROR. The NodeManager spawns
-    the script periodically and checks its output. If the script's output
-    contains the string ERROR, as described above, the node's status is
-    reported as <<<unhealthy>>> and the node is black-listed by the
-    ResourceManager. No further tasks will be assigned to this node.
-    However, the NodeManager continues to run the script, so that if the
-    node becomes healthy again, it will be removed from the blacklisted nodes
-    on the ResourceManager automatically. The node's health along with the
-    output of the script, if it is unhealthy, is available to the
-    administrator in the ResourceManager web interface. The time since the
-    node was healthy is also displayed on the web interface.
-
-    The following parameters can be used to control the node health
-    monitoring script in <<<etc/hadoop/yarn-site.xml>>>.
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.health-checker.script.path>>> | | |
-| | Node health script  | |
-| | | Script to check for node's health status. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.health-checker.script.opts>>> | | |
-| | Node health script options  | |
-| | | Options for script to check for node's health status. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.health-checker.script.interval-ms>>> | | |
-| | Node health script interval  | |
-| | | Time interval for running health script. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.health-checker.script.timeout-ms>>> | | |
-| | Node health script timeout interval  | |
-| | | Timeout for health script execution. |
-*-------------------------+-------------------------+------------------------+
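The parameters in the table above map onto a `yarn-site.xml` fragment like the following. The script path and the interval/timeout values are illustrative examples, not shipped defaults:

```xml
<!-- Example etc/hadoop/yarn-site.xml fragment; the script path and the
     millisecond values below are illustrative, not Hadoop defaults. -->
<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/etc/hadoop/conf/health-check.sh</value>
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.interval-ms</name>
  <value>600000</value>  <!-- run every 10 minutes -->
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
  <value>120000</value>  <!-- kill the script after 2 minutes -->
</property>
```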
-
-  The health checker script is not supposed to report ERROR merely because
-  some of the local disks have gone bad. The NodeManager is able to
-  periodically check the health of the local disks itself (specifically, it
-  checks nodemanager-local-dirs and nodemanager-log-dirs); once the number of
-  bad directories reaches the threshold set by the config property
-  yarn.nodemanager.disk-health-checker.min-healthy-disks, the whole node is
-  marked unhealthy and this information is also sent to the ResourceManager.
-  The boot disk is either RAIDed, or a failure in the boot disk must be
-  detected by the health checker script.
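A minimal health script might look like the sketch below. Nothing in it ships with Hadoop; the threshold and the choice to watch the root filesystem are assumptions. The only contract the NodeManager enforces is that an unhealthy node prints a line beginning with the string ERROR:

```shell
#!/usr/bin/env bash
# Sketch of an administrator-supplied node health script (hypothetical).
# The NodeManager's only contract: print a line starting with "ERROR"
# when the node is unhealthy; any other output means healthy.

# check_root_disk <used-percent>: print an ERROR line above the threshold.
check_root_disk() {
  used_pct=$1
  threshold=90   # example threshold; tune per cluster
  if [ "$used_pct" -gt "$threshold" ]; then
    echo "ERROR root filesystem is ${used_pct}% full"
  fi
}

# Feed the check with the live utilization of / as reported by df.
used=$(df -P / | awk 'NR==2 { gsub(/%/, "", $5); print $5 }')
check_root_disk "$used"
```

Note that the script should always exit successfully; health is signalled through the output, not the exit code.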
-
-* {Slaves File}
-
-  List all slave hostnames or IP addresses in your <<<etc/hadoop/slaves>>>
-  file, one per line.  Helper scripts (described below) will use the
-  <<<etc/hadoop/slaves>>> file to run commands on many hosts at once.  It is not
-  used for any of the Java-based Hadoop configuration.  In order
-  to use this functionality, ssh trusts (via either passphraseless ssh or
-  some other means, such as Kerberos) must be established for the accounts
-  used to run Hadoop.
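For example, a slaves file might look like the following (the hostnames and address are placeholders):

```text
node1.example.com
node2.example.com
192.168.0.103
```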
-
-* {Hadoop Rack Awareness}
-
-  Many Hadoop components are rack-aware and take advantage of the
-  network topology for performance and safety. Hadoop daemons obtain the
-  rack information of the slaves in the cluster by invoking an administrator
-  configured module.  See the {{{./RackAwareness.html}Rack Awareness}}
-  documentation for more specific information.
-
-  It is highly recommended to configure rack awareness prior to starting HDFS.
-
-* {Logging}
-
-  Hadoop uses {{{http://logging.apache.org/log4j/2.x/}Apache log4j}} via the
-  Apache Commons Logging framework for logging. Edit the
-  <<<etc/hadoop/log4j.properties>>> file to customize the Hadoop daemons'
-  logging configuration (log formats and so on).
-
-* {Operating the Hadoop Cluster}
-
-  Once all the necessary configuration is complete, distribute the files to the
-  <<<HADOOP_CONF_DIR>>> directory on all the machines.  This should be the
-  same directory on all machines.
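One way to do that distribution is sketched below; the use of rsync and the example paths are assumptions, and any mechanism that leaves <<<HADOOP_CONF_DIR>>> identical on every machine will do:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of pushing etc/hadoop to every host in the slaves file.
# rsync and the example paths are assumptions, not part of Hadoop itself.

# build_sync_cmds <slaves-file> <conf-dir>: print one rsync command per host.
build_sync_cmds() {
  slaves_file=$1
  conf_dir=$2
  while IFS= read -r host; do
    if [ -n "$host" ]; then
      echo "rsync -a ${conf_dir}/ ${host}:${conf_dir}/"
    fi
  done < "$slaves_file"
}

# Print (rather than run) the commands when a slaves file is present.
if [ -f etc/hadoop/slaves ]; then
  build_sync_cmds etc/hadoop/slaves "${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
fi
```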
-
-  In general, it is recommended that HDFS and YARN run as separate users.
-  In the majority of installations, HDFS processes execute as 'hdfs', while
-  YARN typically uses the 'yarn' account.
-
-** Hadoop Startup
-
-    To start a Hadoop cluster you will need to start both the HDFS and
-    YARN clusters.
-
-    The first time you bring up HDFS, it must be formatted.  Format a new
-    distributed filesystem as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
-----
-
-    Start the HDFS NameNode with the following command on the
-    designated node as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start namenode
-----
-
-    Start an HDFS DataNode with the following command on each
-    designated node as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start datanode
-----
-
-    If <<<etc/hadoop/slaves>>> and ssh trusted access are configured
-    (see {{{./SingleCluster.html}Single Node Setup}}), all of the
-    HDFS processes can be started with a utility script.  As <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
-----
-
-    Start YARN with the following command, run on the designated
-    ResourceManager as <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start resourcemanager
-----
-
-    Start a NodeManager with the following command on each designated
-    host as <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start nodemanager
-----
-
-    Start a standalone WebAppProxy server. Run on the WebAppProxy
-    server as <yarn>.  If multiple servers are used with load balancing
-    it should be run on each of them:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start proxyserver
-----
-
-    If <<<etc/hadoop/slaves>>> and ssh trusted access are configured
-    (see {{{./SingleCluster.html}Single Node Setup}}), all of the
-    YARN processes can be started with a utility script.  As <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh
-----
-
-    Start the MapReduce JobHistory Server with the following command, run
-    on the designated server as <mapred>:
-
-----
-[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon start historyserver
-----
-
-** Hadoop Shutdown
-
-  Stop the NameNode with the following command, run on the designated NameNode
-  as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop namenode
-----
-
-  Stop a DataNode with the following command, run on each DataNode as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop datanode
-----
-
-    If <<<etc/hadoop/slaves>>> and ssh trusted access are configured
-    (see {{{./SingleCluster.html}Single Node Setup}}), all of the
-    HDFS processes may be stopped with a utility script.  As <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
-----
-
-  Stop the ResourceManager with the following command, run on the designated
-  ResourceManager as <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop resourcemanager
-----
-
-  Stop a NodeManager with the following command, run on the slave as <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop nodemanager
-----
-
-    If <<<etc/hadoop/slaves>>> and ssh trusted access are configured
-    (see {{{./SingleCluster.html}Single Node Setup}}), all of the
-    YARN processes can be stopped with a utility script.  As <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh
-----
-
-  Stop the WebAppProxy server. Run on the WebAppProxy server as
-  <yarn>.  If multiple servers are used with load balancing, it
-  should be run on each of them:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop proxyserver
-----
-
-  Stop the MapReduce JobHistory Server with the following command, run on the
-  designated server as <mapred>:
-
-----
-[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon stop historyserver
-----
-
-* {Web Interfaces}
-
-  Once the Hadoop cluster is up and running, check the web UI of the
-  components as described below:
-
-*-------------------------+-------------------------+------------------------+
-|| Daemon                 || Web Interface          || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| NameNode | http://<nn_host:port>/ | Default HTTP port is 50070. |
-*-------------------------+-------------------------+------------------------+
-| ResourceManager | http://<rm_host:port>/ | Default HTTP port is 8088. |
-*-------------------------+-------------------------+------------------------+
-| MapReduce JobHistory Server | http://<jhs_host:port>/ | |
-| | | Default HTTP port is 19888. |
-*-------------------------+-------------------------+------------------------+
-
-

http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
deleted file mode 100644
index 67c8bc3..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
+++ /dev/null
@@ -1,327 +0,0 @@
-~~ Licensed to the Apache Software Foundation (ASF) under one or more
-~~ contributor license agreements.  See the NOTICE file distributed with
-~~ this work for additional information regarding copyright ownership.
-~~ The ASF licenses this file to You under the Apache License, Version 2.0
-~~ (the "License"); you may not use this file except in compliance with
-~~ the License.  You may obtain a copy of the License at
-~~
-~~     http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License.
-
-  ---
-  Hadoop Commands Guide
-  ---
-  ---
-  ${maven.build.timestamp}
-
-%{toc}
-
-Hadoop Commands Guide
-
-* Overview
-
-   All of the Hadoop commands and subprojects follow the same basic structure:
-
-   Usage: <<<shellcommand [SHELL_OPTIONS] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
-
-*--------+---------+
-|| FIELD || Description 
-*-----------------------+---------------+
-| shellcommand | The command of the project being invoked.  For example,
-               | Hadoop common uses <<<hadoop>>>, HDFS uses <<<hdfs>>>, 
-               | and YARN uses <<<yarn>>>.
-*---------------+-------------------+
-| SHELL_OPTIONS | Options that the shell processes prior to executing Java.
-*-----------------------+---------------+
-| COMMAND | Action to perform.
-*-----------------------+---------------+
-| GENERIC_OPTIONS       | The common set of options supported by 
-                        | multiple commands.
-*-----------------------+---------------+
-| COMMAND_OPTIONS       | Various commands with their options are 
-                        | described in this documentation for the 
-                        | Hadoop common sub-project.  HDFS and YARN are
-                        | covered in other documents.
-*-----------------------+---------------+
-
-** {Shell Options}
-
-   All of the shell commands will accept a common set of options.  For some commands,
-   these options are ignored. For example, passing <<<--hostnames>>> to a
-   command that only executes on a single host has no effect.
-
-*-----------------------+---------------+
-|| SHELL_OPTION       || Description
-*-----------------------+---------------+
-| <<<--buildpaths>>>    | Enables developer versions of jars.
-*-----------------------+---------------+
-| <<<--config confdir>>> | Overrides the default configuration 
-                         | directory.  Default is <<<${HADOOP_PREFIX}/conf>>>.
-*-----------------------+----------------+
-| <<<--daemon mode>>>   | If the command supports daemonization (e.g.,
-                        | <<<hdfs namenode>>>), execute in the appropriate
-                        | mode. Supported modes are <<<start>>> to start the
-                        | process in daemon mode, <<<stop>>> to stop the
-                        | process, and <<<status>>> to determine the active
-                        | status of the process.  <<<status>>> will return
-                        | an {{{http://refspecs.linuxbase.org/LSB_3.0.0/LSB-generic/LSB-generic/iniscrptact.html}LSB-compliant}} result code. 
-                        | If no option is provided, commands that support
-                        | daemonization will run in the foreground.   
-*-----------------------+---------------+
-| <<<--debug>>>         | Enables shell-level configuration debugging information.
-*-----------------------+---------------+
-| <<<--help>>>          | Shell script usage information.
-*-----------------------+---------------+
-| <<<--hostnames>>> | A space-delimited list of hostnames on which to
-                    | execute a multi-host subcommand. By default, the
-                    | content of the <<<slaves>>> file is used.  
-*-----------------------+----------------+
-| <<<--hosts>>> | A file that contains a list of hostnames on which to
-                | execute a multi-host subcommand. By default, the content
-                | of the <<<slaves>>> file is used.  
-*-----------------------+----------------+
-| <<<--loglevel loglevel>>> | Overrides the log level. Valid log levels are
-|                           | FATAL, ERROR, WARN, INFO, DEBUG, and TRACE.
-|                           | Default is INFO.
-*-----------------------+---------------+
-
-** {Generic Options}
-
-   Many subcommands honor a common set of configuration options to alter their behavior:
-
-*------------------------------------------------+-----------------------------+
-||            GENERIC_OPTION                     ||            Description
-*------------------------------------------------+-----------------------------+
-|<<<-archives \<comma separated list of archives\> >>> | Specify comma separated
-                                                 | archives to be unarchived on
-                                                 | the compute machines. Applies
-                                                 | only to job.
-*------------------------------------------------+-----------------------------+
-|<<<-conf \<configuration file\> >>>             | Specify an application
-                                                 | configuration file.
-*------------------------------------------------+-----------------------------+
-|<<<-D \<property\>=\<value\> >>>                | Use value for given property.
-*------------------------------------------------+-----------------------------+
-|<<<-files \<comma separated list of files\> >>> | Specify comma separated files
-                                                 | to be copied to the map
-                                                 | reduce cluster.  Applies only
-                                                 | to job.
-*------------------------------------------------+-----------------------------+
-|<<<-jt \<local\> or \<resourcemanager:port\>>>> | Specify a ResourceManager.
-                                                 | Applies only to job.
-*------------------------------------------------+-----------------------------+
-|<<<-libjars \<comma separated list of jars\> >>>| Specify comma separated jar
-                                                 | files to include in the
-                                                 | classpath. Applies only to
-                                                 | job.
-*------------------------------------------------+-----------------------------+
-
-Hadoop Common Commands
-
-  All of these commands are executed from the <<<hadoop>>> shell command.  They
-  have been broken up into {{User Commands}} and 
-  {{Administration Commands}}.
-
-* User Commands
-
-   Commands useful for users of a hadoop cluster.
-
-** <<<archive>>>
-    
-   Creates a hadoop archive. More information can be found at
-  {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html}
-   Hadoop Archives Guide}}.
-
-** <<<checknative>>>
-
-    Usage: <<<hadoop checknative [-a] [-h] >>>
-
-*-----------------+-----------------------------------------------------------+
-|| COMMAND_OPTION || Description
-*-----------------+-----------------------------------------------------------+
-| -a              | Check that all libraries are available.
-*-----------------+-----------------------------------------------------------+
-| -h              | Print help.
-*-----------------+-----------------------------------------------------------+
-
-    This command checks the availability of the Hadoop native code.  See the
-    {{{./NativeLibraries.html}Native Libraries Guide}} for more information.
-    By default, this command only checks the availability of libhadoop.
-
-** <<<classpath>>>
-
-   Usage: <<<hadoop classpath [--glob|--jar <path>|-h|--help]>>>
-
-*-----------------+-----------------------------------------------------------+
-|| COMMAND_OPTION || Description
-*-----------------+-----------------------------------------------------------+
-| --glob          | Expand wildcards.
-*-----------------+-----------------------------------------------------------+
-| --jar <path>    | Write classpath as manifest in jar named <path>.
-*-----------------+-----------------------------------------------------------+
-| -h, --help      | Print help.
-*-----------------+-----------------------------------------------------------+
-
-   Prints the class path needed to get the Hadoop jar and the required
-   libraries.  If called without arguments, then prints the classpath set up by
-   the command scripts, which is likely to contain wildcards in the classpath
-   entries.  Additional options print the classpath after wildcard expansion or
-   write the classpath into the manifest of a jar file.  The latter is useful in
-   environments where wildcards cannot be used and the expanded classpath exceeds
-   the maximum supported command line length.
-
-** <<<credential>>>
-
-   Usage: <<<hadoop credential <subcommand> [options]>>>
-
-*-------------------+-------------------------------------------------------+
-||COMMAND_OPTION    ||                   Description
-*-------------------+-------------------------------------------------------+
-| create <alias> [-v <value>][-provider <provider-path>]| Prompts the user for
-                    | a credential to be stored as the given alias when a value
-                    | is not provided via <<<-v>>>. The
-                    | <hadoop.security.credential.provider.path> within the
-                    | core-site.xml file will be used unless a <<<-provider>>> is
-                    | indicated.
-*-------------------+-------------------------------------------------------+
-| delete <alias> [-i][-provider <provider-path>] | Deletes the credential with
-                    | the provided alias and optionally prompts the user for
-                    | confirmation when <<<-i>>> (interactive mode) is used.
-                    | The <hadoop.security.credential.provider.path> within the
-                    | core-site.xml file will be used unless a <<<-provider>>> is
-                    | indicated.
-*-------------------+-------------------------------------------------------+
-| list [-provider <provider-path>] | Lists all of the credential aliases.
-                    | The <hadoop.security.credential.provider.path> within the
-                    | core-site.xml file will be used unless a <<<-provider>>> is
-                    | indicated.
-*-------------------+-------------------------------------------------------+
-
-   Command to manage credentials, passwords and secrets within credential providers.
-
-   The CredentialProvider API in Hadoop allows for the separation of applications
-   and how they store their required passwords/secrets. In order to indicate
-   a particular provider type and location, the user must provide the
-   <hadoop.security.credential.provider.path> configuration element in core-site.xml
-   or use the command line option <<<-provider>>> on each of the following commands.
-   This provider path is a comma-separated list of URLs that indicates the type and
-   location of a list of providers that should be consulted. For example, the following path:
-   <<<user:///,jceks://file/tmp/test.jceks,jceks://hdfs@nn1.example.com/my/path/test.jceks>>>
-
-   indicates that the current user's credentials file should be consulted through
-   the User Provider, that the local file located at <<</tmp/test.jceks>>> is a Java Keystore
-   Provider and that the file located within HDFS at <<<nn1.example.com/my/path/test.jceks>>>
-   is also a store for a Java Keystore Provider.
-
-   The credential command is often used to provision a password or secret
-   to a particular credential store provider. To explicitly indicate which
-   provider store to use, the <<<-provider>>> option should be given. Otherwise,
-   given a path of multiple providers, the first non-transient provider will
-   be used. This may or may not be the one that you intended.
-
-   Example: <<<-provider jceks://file/tmp/test.jceks>>>
-
-** <<<distch>>>
-
-  Usage: <<<hadoop distch [-f urilist_url] [-i] [-log logdir] path:owner:group:permissions>>>
-  
-*-------------------+-------------------------------------------------------+
-||COMMAND_OPTION    ||                   Description
-*-------------------+-------------------------------------------------------+
-| -f | List of objects to change
-*----+------------+
-| -i | Ignore failures
-*----+------------+
-| -log | Directory to log output
-*-----+---------+
-
-  Change the ownership and permissions on many files at once.
-
-** <<<distcp>>>
-
-   Copy file or directories recursively. More information can be found at
-   {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistCp.html}
-   Hadoop DistCp Guide}}.
-
-** <<<fs>>>
-
-   This command is documented in the {{{./FileSystemShell.html}File System Shell Guide}}.  It is a synonym for <<<hdfs dfs>>> when HDFS is in use.
-
-** <<<jar>>>
-
-  Usage: <<<hadoop jar <jar> [mainClass] args...>>>
-
-  Runs a jar file. 
-  
-  Use {{{../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar}<<<yarn jar>>>}}
-  to launch YARN applications instead.
-
-** <<<jnipath>>>
-
-    Usage: <<<hadoop jnipath>>>
-
-    Print the computed java.library.path.
-
-** <<<key>>>
-
-    Manage keys via the KeyProvider.
-
-** <<<trace>>>
-
-    View and modify Hadoop tracing settings.   See the {{{./Tracing.html}Tracing Guide}}.
-
-** <<<version>>>
-
-   Usage: <<<hadoop version>>>
-
-   Prints the version.
-
-** <<<CLASSNAME>>>
-
-   Usage: <<<hadoop CLASSNAME>>>
-
-   Runs the class named <<<CLASSNAME>>>.  The class must be part of a package.
-
-* {Administration Commands}
-
-   Commands useful for administrators of a hadoop cluster.
-
-** <<<daemonlog>>>
-
-   Usage: <<<hadoop daemonlog -getlevel <host:port> <name> >>>
-   Usage: <<<hadoop daemonlog -setlevel <host:port> <name> <level> >>>
-
-*------------------------------+-----------------------------------------------------------+
-|| COMMAND_OPTION              || Description
-*------------------------------+-----------------------------------------------------------+
-| -getlevel <host:port> <name> | Prints the log level of the daemon running at
-                               | <host:port>. This command internally connects
-                               | to http://<host:port>/logLevel?log=<name>
-*------------------------------+-----------------------------------------------------------+
-|   -setlevel <host:port> <name> <level> | Sets the log level of the daemon
-                               | running at <host:port>. This command internally
-                               | connects to http://<host:port>/logLevel?log=<name>
-*------------------------------+-----------------------------------------------------------+
-
-   Get/Set the log level for each daemon.
-
-* Files
-
-** <<etc/hadoop/hadoop-env.sh>>
-
-    This file stores the global settings used by all Hadoop shell commands.
-
-** <<etc/hadoop/hadoop-user-functions.sh>>
-
-    This file allows for advanced users to override some shell functionality.
-
-** <<~/.hadooprc>>
-
-    This stores the personal environment for an individual user.  It is
-    processed after the hadoop-env.sh and hadoop-user-functions.sh files
-    and can contain the same settings.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
deleted file mode 100644
index 98d1f57..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
+++ /dev/null
@@ -1,541 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-Apache Hadoop Compatibility
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Apache Hadoop Compatibility
-
-%{toc|section=1|fromDepth=0}
-
-* Purpose
-
-  This document captures the compatibility goals of the Apache Hadoop
-  project. The different types of compatibility between Hadoop
-  releases that affect Hadoop developers, downstream projects, and
-  end-users are enumerated. For each type of compatibility we:
-  
-  * describe the impact on downstream projects or end-users
- 
-  * where applicable, call out the policy adopted by the Hadoop
-   developers when incompatible changes are permitted.
-
-* Compatibility types
-
-** Java API
-
-   Hadoop interfaces and classes are annotated to describe the intended
-   audience and stability in order to maintain compatibility with previous
-   releases. See {{{./InterfaceClassification.html}Hadoop Interface
-   Classification}}
-   for details.
-
-   * InterfaceAudience: captures the intended audience, possible
-   values are Public (for end users and external projects),
-   LimitedPrivate (for other Hadoop components, and closely related
-   projects like YARN, MapReduce, HBase etc.), and Private (for intra component 
-   use).
- 
-   * InterfaceStability: describes what types of interface changes are
-   permitted. Possible values are Stable, Evolving, Unstable, and Deprecated.
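As a sketch, a public, stable class would be annotated as follows. The class name is hypothetical; the annotation classes live in the `org.apache.hadoop.classification` package:

```java
// Illustrative only: a hypothetical end-user-facing API class.
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

@InterfaceAudience.Public    // intended for end users and external projects
@InterfaceStability.Stable   // only compatible changes are permitted
public class ExampleClientApi {
  // Members inherit the enclosing class's annotations unless re-annotated.
}
```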
-
-*** Use Cases
-
-    * Public-Stable API compatibility is required to ensure end-user programs
-     and downstream projects continue to work without modification.
-
-    * LimitedPrivate-Stable API compatibility is required to allow upgrade of
-     individual components across minor releases.
-
-    * Private-Stable API compatibility is required for rolling upgrades.
-
-*** Policy
-
-    * Public-Stable APIs must be deprecated for at least one major release
-    prior to their removal in a major release.
-
-    * LimitedPrivate-Stable APIs can change across major releases,
-    but not within a major release.
-
-    * Private-Stable APIs can change across major releases,
-    but not within a major release.
-
-    * Classes not annotated are implicitly "Private". Class members not
-    annotated inherit the annotations of the enclosing class.
-
-    * Note: APIs generated from the proto files need to be compatible for
-    rolling-upgrades. See the section on wire-compatibility for more details.
-    The compatibility policies for APIs and wire-communication need to go
-    hand-in-hand to address this.
-
-** Semantic compatibility
-
-   Apache Hadoop strives to ensure that the behavior of APIs remains
-   consistent over versions, though changes for correctness may result in
-   changes in behavior. Tests and javadocs specify the API's behavior.
-   The community is in the process of specifying some APIs more rigorously,
-   and enhancing test suites to verify compliance with the specification,
-   effectively creating a formal specification for the subset of behaviors
-   that can be easily tested.
-
-*** Policy
-
-   The behavior of an API may be changed to fix incorrect behavior; such a
-   change must be accompanied by updating existing buggy tests or adding
-   tests in cases where none existed prior to the change.
-
-** Wire compatibility
-
-   Wire compatibility concerns data being transmitted over the wire
-   between Hadoop processes. Hadoop uses Protocol Buffers for most RPC
-   communication. Preserving compatibility requires prohibiting
-   modification as described below.
-   Non-RPC communication should be considered as well,
-   for example using HTTP to transfer an HDFS image as part of
-   snapshotting or transferring MapTask output. The potential
-   communications can be categorized as follows:
- 
-   * Client-Server: communication between Hadoop clients and servers (e.g.,
-   the HDFS client to NameNode protocol, or the YARN client to
-   ResourceManager protocol).
-
-   * Client-Server (Admin): It is worth distinguishing a subset of the
-   Client-Server protocols used solely by administrative commands (e.g.,
-   the HAAdmin protocol), as these protocols only impact administrators,
-   who can tolerate changes that end users (who use the general
-   Client-Server protocols) cannot.
-
-   * Server-Server: communication between servers (e.g., the protocol between
-   the DataNode and NameNode, or NodeManager and ResourceManager)
-
-*** Use Cases
-    
-    * Client-Server compatibility is required to allow users to
-    continue using the old clients even after upgrading the server
-    (cluster) to a later version (or vice versa).  For example, a
-    Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.
-
-    * Client-Server compatibility is also required to allow users to upgrade the
-    client before upgrading the server (cluster).  For example, a Hadoop 2.4.0
-    client talking to a Hadoop 2.3.0 cluster.  This allows deployment of
-    client-side bug fixes ahead of full cluster upgrades.  Note that new cluster
-    features invoked by new client APIs or shell commands will not be usable.
-    YARN applications that attempt to use new APIs (including new fields in data
-    structures) that have not yet deployed to the cluster can expect link
-    exceptions.
-
-    * Client-Server compatibility is also required to allow upgrading
-    individual components without upgrading others. For example,
-    upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
-
-    * Server-Server compatibility is required to allow mixed versions
-    within an active cluster so the cluster may be upgraded without
-    downtime in a rolling fashion.
-
-*** Policy
-
-    * Both Client-Server and Server-Server compatibility is preserved within a
-    major release. (Different policies for different categories are yet to be
-    considered.)
-
-    * Compatibility can be broken only at a major release, though breaking compatibility
-    even at major releases has grave consequences and should be discussed in the Hadoop community.
-
-    * Hadoop protocols are defined in .proto (ProtocolBuffers) files.
-    Client-Server and Server-Server protocol .proto files are marked as
-    stable. When a .proto file is marked as stable it means that changes
-    should be made in a compatible fashion as described below:
-
-      * The following changes are compatible and are allowed at any time:
-
-        * Add an optional field, with the expectation that the code deals with the field missing due to communication with an older version of the code.
-
-        * Add a new rpc/method to the service
-
-        * Add a new optional request to a Message
-
-        * Rename a field
-
-        * Rename a .proto file
-
-        * Change .proto annotations that affect code generation (e.g. name of java package)
-
-      * The following changes are incompatible but can be considered only at a major release 
-
-        * Change the rpc/method name
-
-        * Change the rpc/method parameter type or return type
-
-        * Remove an rpc/method
-
-        * Change the service name
-
-        * Change the name of a Message
-
-        * Modify a field type in an incompatible way (as defined recursively)
-
-        * Change an optional field to required
-
-        * Add or delete a required field
-
-        * Delete an optional field, as long as the field has a reasonable default so that the deletion can be tolerated
-
-      * The following changes are incompatible and hence never allowed
-
-        * Change a field id
-
-        * Reuse an old field id that was previously deleted.
-
-        (Field numbers are cheap; changing or reusing them is never a good idea.)
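-
-      As an illustration of the rules above, here is a sketch of a compatible
-      evolution of a hypothetical request message. The field needBlockToken and
-      its default are invented for this example and are not part of any actual
-      Hadoop protocol definition:

```proto
// v1 of a hypothetical Client-Server request message (proto2 syntax).
message GetFileInfoRequestProto {
  required string src = 1;
}

// v2: a compatible change -- a new optional field with a fresh field id.
// Old servers simply ignore field 2; new code must handle its absence
// (here via a default value).
message GetFileInfoRequestProto {
  required string src = 1;
  optional bool needBlockToken = 2 [default = false];
}

// Never allowed: changing "src = 1" to "src = 3", or reusing id 2 for a
// different field after deleting needBlockToken.
```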
-
-
-** Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI
-
-  As Apache Hadoop revisions are upgraded, end-users reasonably expect that
-  their applications should continue to work without any modifications.
-  This is fulfilled as a result of supporting API compatibility, semantic
-  compatibility and wire compatibility.
-
-  However, Apache Hadoop is a very complex distributed system that services a
-  wide variety of use-cases. In particular, Apache Hadoop MapReduce exposes a
-  very wide API, in the sense that end-users may make far-reaching
-  assumptions such as the layout of the local disk while their map/reduce
-  tasks are executing, the environment variables of their tasks, etc. In such
-  cases, it becomes very hard to fully specify, and support, absolute compatibility.
- 
-*** Use cases
-
-    * Existing MapReduce applications, including jars of existing packaged 
-      end-user applications and projects such as Apache Pig, Apache Hive, 
-      Cascading etc. should work unmodified when pointed to an upgraded Apache 
-      Hadoop cluster within a major release. 
-
-    * Existing YARN applications, including jars of existing packaged 
-      end-user applications and projects such as Apache Tez etc. should work 
-      unmodified when pointed to an upgraded Apache Hadoop cluster within a 
-      major release. 
-
-    * Existing applications which transfer data in/out of HDFS, including jars 
-      of existing packaged end-user applications and frameworks such as Apache 
-      Flume, should work unmodified when pointed to an upgraded Apache Hadoop 
-      cluster within a major release. 
-
-*** Policy
-
-    * Existing MapReduce, YARN & HDFS applications and frameworks should work 
-      unmodified within a major release i.e. Apache Hadoop ABI is supported.
-
-    * A very minor fraction of applications may be affected by changes to disk
-      layouts etc.; the developer community will strive to minimize these
-      changes and will not make them within a minor version. In more egregious
-      cases, we will strongly consider reverting these breaking changes and
-      invalidating offending releases if necessary.
-
-    * In particular for MapReduce applications, the developer community will
-      try its best to provide binary compatibility across major
-      releases, e.g. for applications using org.apache.hadoop.mapred APIs.
-      
-    * APIs are supported compatibly across hadoop-1.x and hadoop-2.x. See 
-      {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html}
-      Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x}} 
-      for more details.
-
-** REST APIs
-
-  REST API compatibility corresponds to both the request (URLs) and responses
-   to each request (content, which may contain other URLs). Hadoop REST APIs
-   are specifically meant for stable use by clients across releases,
-   even major releases. The following are the exposed REST APIs:
-
-  * {{{../hadoop-hdfs/WebHDFS.html}WebHDFS}} - Stable
-
-  * {{{../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html}ResourceManager}}
-
-  * {{{../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html}NodeManager}}
-
-  * {{{../../hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html}MR Application Master}}
-
-  * {{{../../hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html}History Server}}
-  
-*** Policy
-    
-    The APIs annotated stable in the text above preserve compatibility
-    across at least one major release, and may be deprecated by a newer
-    version of the REST API in a major release.
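-
-    For instance, the stable WebHDFS contract means the request URL shape
-    /webhdfs/v1/\<PATH\>?op=... does not change across releases. The helper
-    below is a hypothetical sketch (not a Hadoop API) that builds such a URL:

```java
// Sketch (hypothetical helper, not part of Hadoop): building a WebHDFS
// request URL. The "/webhdfs/v1" prefix and the "op" query parameter are
// the stable parts of the REST contract.
public class WebHdfsUrl {
    public static String forOp(String host, int port, String path, String op) {
        // path is expected to be absolute, e.g. "/user/alice"
        return "http://" + host + ":" + port + "/webhdfs/v1" + path + "?op=" + op;
    }
}
```

Because clients can rely on this shape across releases, scripts built on such URLs keep working after a cluster upgrade.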
-
-** Metrics/JMX
-
-   While Metrics API compatibility is governed by Java API compatibility,
-   the actual metrics exposed by Hadoop need to be compatible for users to
-   be able to automate using them (scripts etc.). Adding additional metrics
-   is compatible. Modifying (e.g. changing the unit of measurement) or removing
-   existing metrics breaks compatibility. Similarly, changes to JMX MBean
-   object names also break compatibility.
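-
-   For example, monitoring scripts commonly select Hadoop MBeans by the key
-   properties of the object name; the sketch below (the helper is illustrative,
-   not a Hadoop API) shows why a renamed object name silently breaks such
-   automation:

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Sketch: automation keyed on a JMX object name such as
// "Hadoop:service=NameNode,name=FSNamesystem". If a release renames the
// "service" key or the domain, lookups like this return nothing.
public class MBeanNameCheck {
    public static String serviceOf(String name) {
        try {
            return new ObjectName(name).getKeyProperty("service");
        } catch (MalformedObjectNameException e) {
            return null; // not a valid JMX object name
        }
    }
}
```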
-
-*** Policy 
-
-    Metrics should preserve compatibility within the major release.
-
-** File formats & Metadata
-
-   User and system level data (including metadata) is stored in files of
-   different formats. Changes to the metadata or the file formats used to
-   store data/metadata can lead to incompatibilities between versions.
-
-*** User-level file formats
-
-    Changes to formats that end-users use to store their data can prevent
-    them from accessing the data in later releases, and hence it is highly
-    important to keep those file formats compatible. One can always add a
-    "new" format improving upon an existing format. Examples of these formats
-    include har, war, SequenceFileFormat, etc.
-
-**** Policy
-
-     * Non-forward-compatible user-file format changes are
-     restricted to major releases. When user-file formats change, new
-     releases are expected to read existing formats, but may write data
-     in formats incompatible with prior releases. Also, the community
-     shall prefer to create a new format that programs must opt in to
-     instead of making incompatible changes to existing formats.
-
-*** System-internal file formats
-
-    Hadoop internal data is also stored in files and again changing these
-    formats can lead to incompatibilities. While such changes are not as
-    devastating as the user-level file formats, a policy on when the
-    compatibility can be broken is important.
-
-**** MapReduce
-
-     MapReduce uses formats like IFile to store MapReduce-specific data.
-
-
-***** Policy
-
-     MapReduce-internal formats like IFile maintain compatibility within a
-     major release. Changes to these formats can cause in-flight jobs to fail 
-     and hence we should ensure newer clients can fetch shuffle-data from old 
-     servers in a compatible manner.
-
-**** HDFS Metadata
-
-    HDFS persists metadata (the image and edit logs) in a particular format.
-    Incompatible changes to either the format or the metadata prevent
-    subsequent releases from reading older metadata. Such incompatible
-    changes might require an HDFS "upgrade" to convert the metadata to make
-    it accessible. Some changes can require more than one such "upgrade".
-
-    Depending on the degree of incompatibility in the changes, the following
-    potential scenarios can arise:
-
-    * Automatic: The image upgrades automatically, no need for an explicit
-    "upgrade".
-
-    * Direct: The image is upgradable, but might require one explicit release
-    "upgrade".
-
-    * Indirect: The image is upgradable, but might require upgrading to
-    intermediate release(s) first.
-
-    * Not upgradeable: The image is not upgradeable.
-
-***** Policy
-
-    * A release upgrade must allow a cluster to roll back to the older
-    version and its older disk format. The rollback needs to restore the
-    original data, but is not required to restore the updated data.
-
-    * HDFS metadata changes must be upgradeable via any of the upgrade
-    paths - automatic, direct or indirect.
-
-    * More detailed policies based on the kind of upgrade are yet to be
-    considered.
-
-** Command Line Interface (CLI)
-
-   The Hadoop command line programs may be used either directly via the
-   system shell or via shell scripts. Changing the path of a command,
-   removing or renaming command line options, changing the order of arguments,
-   or changing a command's return code or output breaks compatibility and
-   may adversely affect users.
-   
-*** Policy 
-
-    CLI commands are to be deprecated (warning when used) for one
-    major release before they are removed or incompatibly modified in
-    a subsequent major release.
-
-** Web UI
-
-   Changes to the Web UI, particularly to the content and layout of web
-   pages, could potentially interfere with attempts to screen-scrape the
-   pages for information.
-
-*** Policy
-
-    Web pages are not meant to be scraped and hence incompatible
-    changes to them are allowed at any time. Users are expected to use
-    REST APIs to get any information.
-
-** Hadoop Configuration Files
-
-   Users use (1) Hadoop-defined properties to configure and provide hints to
-   Hadoop and (2) custom properties to pass information to jobs. Hence,
-   compatibility of config properties is two-fold:
-
-   * Modifying key names, units of values, or default values of Hadoop-defined
-     properties breaks compatibility.
-
-   * Custom configuration property keys should not conflict with the
-     namespace of Hadoop-defined properties. Typically, users should
-     avoid using prefixes used by Hadoop: hadoop, io, ipc, fs, net,
-     file, ftp, s3, kfs, ha, dfs, mapred, mapreduce, yarn.
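-
-   A simple way to honor the second point is to check a custom key's first
-   segment against the reserved prefixes. The helper below is a hypothetical
-   sketch, not a Hadoop API:

```java
import java.util.Arrays;
import java.util.List;

// Sketch (hypothetical helper): checking that a custom configuration key
// does not collide with the property namespaces reserved by Hadoop.
public class ConfKeyCheck {
    static final List<String> RESERVED = Arrays.asList(
        "hadoop", "io", "ipc", "fs", "net", "file", "ftp",
        "s3", "kfs", "ha", "dfs", "mapred", "mapreduce", "yarn");

    public static boolean isSafeCustomKey(String key) {
        // Compare only the first dot-separated segment of the key.
        String prefix = key.split("\\.", 2)[0];
        return !RESERVED.contains(prefix);
    }
}
```

Using a reverse-domain prefix such as com.example for custom keys avoids the reserved namespaces entirely.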
-
-*** Policy 
-
-    * Hadoop-defined properties are to be deprecated at least for one
-      major release before being removed. Modifying units for existing
-      properties is not allowed.
-
-    * The default values of Hadoop-defined properties can
-      be changed across minor/major releases, but will remain the same
-      across point releases within a minor release.
-
-    * Currently, there is NO explicit policy regarding when new
-      prefixes can be added/removed, or regarding the list of prefixes to be
-      avoided for custom configuration properties. However, as noted above,
-      users should avoid using prefixes used by Hadoop: hadoop, io, ipc, fs,
-      net, file, ftp, s3, kfs, ha, dfs, mapred, mapreduce, yarn.
-           
-** Directory Structure 
-
-   Source code, artifacts (source and tests), user logs, configuration files,
-   output and job history are all stored on disk, either on the local file
-   system or in HDFS. Changing the directory structure of these user-accessible
-   files breaks compatibility, even in cases where the original path is
-   preserved via symbolic links (for example, if the path is accessed
-   by a servlet that is configured to not follow symbolic links).
-
-*** Policy
-
-    * The layout of source code and build artifacts can change
-      anytime, particularly so across major versions. Within a major
-      version, the developers will attempt (no guarantees) to preserve
-      the directory structure; however, individual files can be
-      added/moved/deleted. The best way to ensure patches stay in sync
-      with the code is to get them committed to the Apache source tree.
-
-    * The directory structure of configuration files, user logs, and
-      job history will be preserved across minor and point releases
-      within a major release.
-
-** Java Classpath
-
-   User applications built against Hadoop might add all Hadoop jars
-   (including Hadoop's library dependencies) to the application's
-   classpath. Adding new dependencies or updating the version of
-   existing dependencies may interfere with those in applications'
-   classpaths.
-
-*** Policy
-
-    Currently, there is NO policy on when Hadoop's dependencies can
-    change.
-
-** Environment variables
-
-   Users and related projects often utilize the environment variables
-   exported by Hadoop (e.g. HADOOP_CONF_DIR); therefore, removing or renaming
-   environment variables is an incompatible change.
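-
-   Defensive consumers read such variables with a fallback so that an unset
-   variable degrades gracefully. The sketch below is hypothetical; the
-   fallback path is illustrative, not a Hadoop-defined default:

```java
import java.util.Map;

// Sketch: consuming an exported variable such as HADOOP_CONF_DIR without
// assuming it is set. Taking the environment as a parameter keeps the
// helper testable.
public class ConfDir {
    public static String resolve(Map<String, String> env) {
        String dir = env.get("HADOOP_CONF_DIR");
        // Fall back to an illustrative conventional location when unset.
        return (dir != null && !dir.isEmpty()) ? dir : "/etc/hadoop/conf";
    }
}
```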
-
-*** Policy
-
-    Currently, there is NO policy on when the environment variables
-    can change. Developers try to limit changes to major releases.
-
-** Build artifacts
-
-   Hadoop uses Maven for project management, and changing the artifacts
-   can affect existing user workflows.
-
-*** Policy
-
-   * Test artifacts: The test jars generated are strictly for internal
-     use and are not expected to be used outside of Hadoop, similar to
-     APIs annotated @Private, @Unstable.
-
-   * Built artifacts: The hadoop-client artifact (maven
-     groupId:artifactId) stays compatible within a major release,
-     while the other artifacts can change in incompatible ways.
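-
-   In practice this means client applications should depend on the
-   hadoop-client artifact rather than on server-side jars. A sketch of such a
-   Maven dependency follows; the version shown is only an example of the
-   release you might build against:

```xml
<!-- Depending on the compatibility-guaranteed client artifact. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.0</version>
</dependency>
```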
-
-** Hardware/Software Requirements
-
-   To keep up with the latest advances in hardware, operating systems,
-   JVMs, and other software, new Hadoop releases or some of their
-   features might require higher versions of the same. For a specific
-   environment, upgrading Hadoop might require upgrading other
-   dependent software components.
-
-*** Policies
-
-    * Hardware
-
-      * Architecture: The community has no plans to restrict Hadoop to
-        specific architectures, but can have family-specific
-        optimizations.
-
-      * Minimum resources: While there are no guarantees on the
-        minimum resources required by Hadoop daemons, the community
-        attempts to not increase requirements within a minor release.
-
-    * Operating Systems: The community will attempt to maintain the
-      same OS requirements (OS kernel versions) within a minor
-      release. Currently GNU/Linux and Microsoft Windows are the OSes officially
-      supported by the community, while Apache Hadoop is known to work reasonably
-      well on other OSes such as Apple Mac OS X, Solaris, etc.
-
-    * The JVM requirements will not change across point releases
-      within the same minor release except if the JVM version under
-      question becomes unsupported. Minor/major releases might require
-      later versions of JVM for some/all of the supported operating
-      systems.
-
-    * Other software: The community tries to maintain the minimum
-      required versions of additional software on which Hadoop depends,
-      for example ssh, Kerberos, etc.
-  
-* References
-  
-  Here are some relevant JIRAs and pages related to the topic:
-
-  * The evolution of this document -
-    {{{https://issues.apache.org/jira/browse/HADOOP-9517}HADOOP-9517}}
-
-  * Binary compatibility for MapReduce end-user applications between hadoop-1.x and hadoop-2.x -
-    {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html}
-    MapReduce Compatibility between hadoop-1.x and hadoop-2.x}}
-
-  * Annotations for interfaces as per interface classification
-    schedule -
-    {{{https://issues.apache.org/jira/browse/HADOOP-7391}HADOOP-7391}}
-    {{{./InterfaceClassification.html}Hadoop Interface Classification}}
-
-  * Compatibility for Hadoop 1.x releases -
-    {{{https://issues.apache.org/jira/browse/HADOOP-5071}HADOOP-5071}}
-
-  * The {{{http://wiki.apache.org/hadoop/Roadmap}Hadoop Roadmap}} page
-    that captures other release policies
-