hadoop-common-commits mailing list archives

From cmcc...@apache.org
Subject [4/6] hadoop git commit: HADOOP-11495. Backport "convert site documentation from apt to markdown" to branch-2 (Masatake Iwasaki via Colin P. McCabe)
Date Wed, 25 Feb 2015 00:03:51 GMT
http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm
deleted file mode 100644
index 0235219..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm
+++ /dev/null
@@ -1,689 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop in Secure Mode
-  ---
-  ---
-  ${maven.build.timestamp}
-
-%{toc|section=0|fromDepth=0|toDepth=3}
-
-Hadoop in Secure Mode
-
-* Introduction
-
-  This document describes how to configure authentication for Hadoop in
-  secure mode.
-
-  By default Hadoop runs in non-secure mode in which no actual
-  authentication is required.
-  When Hadoop is configured to run in secure mode,
-  each user and service needs to be authenticated by Kerberos
-  in order to use Hadoop services.
-
-  Security features of Hadoop consist of
-  {{{Authentication}authentication}},
-  {{{./ServiceLevelAuth.html}service level authorization}},
-  {{{./HttpAuthentication.html}authentication for Web consoles}}
-  and {{{Data confidentiality}data confidentiality}}.
-
-
-* Authentication
-
-** End User Accounts
-
-  When service level authentication is turned on,
-  end users of Hadoop in secure mode need to be authenticated by Kerberos.
-  The simplest way to authenticate is to use the <<<kinit>>> command of Kerberos.
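-
-  For example, a user with the (hypothetical) principal
-  <<<alice@REALM.TLD>>> could obtain a ticket-granting ticket and then
-  inspect it as follows:
-
------
-$ kinit alice@REALM.TLD
-Password for alice@REALM.TLD:
-$ klist
------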
-
-** User Accounts for Hadoop Daemons
-
-  Ensure that HDFS and YARN daemons run as different Unix users,
-  e.g. <<<hdfs>>> and <<<yarn>>>.
-  Also, ensure that the MapReduce JobHistory server runs as a
-  different user, such as <<<mapred>>>.
-
-  It's recommended to have them share a Unix group, e.g. <<<hadoop>>>.
-  See also "{{Mapping from user to group}}" for group management.
-
-*---------------+----------------------------------------------------------------------+
-|| User:Group   || Daemons                                                             |
-*---------------+----------------------------------------------------------------------+
-| hdfs:hadoop   | NameNode, Secondary NameNode, JournalNode, DataNode                  |
-*---------------+----------------------------------------------------------------------+
-| yarn:hadoop   | ResourceManager, NodeManager                                         |
-*---------------+----------------------------------------------------------------------+
-| mapred:hadoop | MapReduce JobHistory Server                                          |
-*---------------+----------------------------------------------------------------------+
-
-** Kerberos principals for Hadoop Daemons and Users
-
-  To run Hadoop service daemons in secure mode,
-  Kerberos principals are required.
-  Each service reads authentication information saved in a keytab file
-  with appropriate permissions.
-
-  HTTP web-consoles should be served by a principal different from that of RPC.
-
-  The subsections below show examples of credentials for Hadoop services.
-
-*** HDFS
-
-    The NameNode keytab file, on the NameNode host, should look like the
-    following:
-
-----
-$ klist -e -k -t /etc/security/keytab/nn.service.keytab
-Keytab name: FILE:/etc/security/keytab/nn.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-    The Secondary NameNode keytab file, on that host, should look like the
-    following:
-
-----
-$ klist -e -k -t /etc/security/keytab/sn.service.keytab
-Keytab name: FILE:/etc/security/keytab/sn.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-    The DataNode keytab file, on each host, should look like the following:
-
-----
-$ klist -e -k -t /etc/security/keytab/dn.service.keytab
-Keytab name: FILE:/etc/security/keytab/dn.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-*** YARN
-
-    The ResourceManager keytab file, on the ResourceManager host, should look
-    like the following:
-
-----
-$ klist -e -k -t /etc/security/keytab/rm.service.keytab
-Keytab name: FILE:/etc/security/keytab/rm.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-    The NodeManager keytab file, on each host, should look like the following:
-
-----
-$ klist -e -k -t /etc/security/keytab/nm.service.keytab
-Keytab name: FILE:/etc/security/keytab/nm.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-*** MapReduce JobHistory Server
-
-    The MapReduce JobHistory Server keytab file, on that host, should look
-    like the following:
-
-----
-$ klist -e -k -t /etc/security/keytab/jhs.service.keytab
-Keytab name: FILE:/etc/security/keytab/jhs.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-** Mapping from Kerberos principal to OS user account
-
-  Hadoop maps a Kerberos principal to an OS user account using
-  the rule specified by <<<hadoop.security.auth_to_local>>>,
-  which works in the same way as the <<<auth_to_local>>> in
-  {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos configuration file (krb5.conf)}}.
-  In addition, Hadoop <<<auth_to_local>>> mapping supports the <</L>> flag that
-  lowercases the returned name.
-
-  By default, it picks the first component of the principal name as the user name
-  if the realm matches the <<<default_realm>>> (usually defined in /etc/krb5.conf).
-  For example, <<<host/full.qualified.domain.name@REALM.TLD>>> is mapped to <<<host>>>
-  by the default rule.
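-
-  As an illustrative sketch (the realm and rule are placeholders), a custom
-  rule that maps the <<<nn>>> and <<<dn>>> service principals to the
-  <<<hdfs>>> account, falling back to the default rule, could be set in
-  core-site.xml as:
-
------
-<property>
-  <name>hadoop.security.auth_to_local</name>
-  <value>
-    RULE:[2:$1@$0]([nd]n@REALM\.TLD)s/.*/hdfs/
-    DEFAULT
-  </value>
-</property>
------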
-
-** Mapping from user to group
-
-  Though files on HDFS are associated with an owner and a group,
-  Hadoop itself has no notion of groups.
-  Mapping from user to group is done by the OS or LDAP.
-
-  You can change the mapping method by specifying the name of the
-  mapping provider as the value of <<<hadoop.security.group.mapping>>>.
-  See {{{../hadoop-hdfs/HdfsPermissionsGuide.html}HDFS Permissions Guide}} for details.
-
-  In practice, you need to manage an SSO environment using Kerberos with LDAP
-  for Hadoop in secure mode.
-
-** Proxy user
-
-  Some products, such as Apache Oozie, which access the services of Hadoop
-  on behalf of end users, need to be able to impersonate end users.
-  See {{{./Superusers.html}the proxy user documentation}} for details.
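-
-  For illustration (the host and group names are placeholders), allowing
-  a superuser <<<oozie>>> to impersonate members of the group <<<users>>>
-  from a single host could be configured in core-site.xml as:
-
------
-<property>
-  <name>hadoop.proxyuser.oozie.hosts</name>
-  <value>oozie-host.example.com</value>
-</property>
-<property>
-  <name>hadoop.proxyuser.oozie.groups</name>
-  <value>users</value>
-</property>
------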
-
-** Secure DataNode
-
-  Because the data transfer protocol of the DataNode
-  does not use the RPC framework of Hadoop,
-  the DataNode must authenticate itself by
-  using privileged ports which are specified by
-  <<<dfs.datanode.address>>> and <<<dfs.datanode.http.address>>>.
-  This authentication is based on the assumption
-  that the attacker won't be able to get root privileges.
-
-  When you execute the <<<hdfs datanode>>> command as root,
-  the server process binds the privileged port first,
-  then drops privileges and runs as the user account specified by
-  <<<HADOOP_SECURE_DN_USER>>>.
-  This startup process uses jsvc installed in <<<JSVC_HOME>>>.
-  You must specify <<<HADOOP_SECURE_DN_USER>>> and <<<JSVC_HOME>>>
-  as environment variables on startup (in hadoop-env.sh).
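-
-  A minimal sketch of the corresponding hadoop-env.sh entries
-  (the jsvc path is a placeholder; the <<<hdfs>>> user follows the
-  account table above):
-
------
-export HADOOP_SECURE_DN_USER=hdfs
-export JSVC_HOME=/usr/lib/jsvc
------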
-
-  As of version 2.6.0, SASL can be used to authenticate the data transfer
-  protocol.  In this configuration, it is no longer required for secured clusters
-  to start the DataNode as root using jsvc and bind to privileged ports.  To
-  enable SASL on data transfer protocol, set <<<dfs.data.transfer.protection>>>
-  in hdfs-site.xml, set a non-privileged port for <<<dfs.datanode.address>>>, set
-  <<<dfs.http.policy>>> to <HTTPS_ONLY> and make sure the
-  <<<HADOOP_SECURE_DN_USER>>> environment variable is not defined.  Note that it
-  is not possible to use SASL on data transfer protocol if
-  <<<dfs.datanode.address>>> is set to a privileged port.  This is required for
-  backwards-compatibility reasons.
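-
-  Putting these requirements together, a sketch of the hdfs-site.xml
-  settings for SASL on the data transfer protocol (the port number is an
-  arbitrary non-privileged example) might be:
-
------
-<property>
-  <name>dfs.data.transfer.protection</name>
-  <value>integrity</value>
-</property>
-<property>
-  <name>dfs.datanode.address</name>
-  <value>0.0.0.0:10019</value>
-</property>
-<property>
-  <name>dfs.http.policy</name>
-  <value>HTTPS_ONLY</value>
-</property>
------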
-
-  In order to migrate an existing cluster that used root authentication to start
-  using SASL instead, first ensure that version 2.6.0 or later has been deployed
-  to all cluster nodes as well as any external applications that need to connect
-  to the cluster.  Only versions 2.6.0 and later of the HDFS client can connect
-  to a DataNode that uses SASL for authentication of data transfer protocol, so
-  it is vital that all callers have the correct version before migrating.  After
-  version 2.6.0 or later has been deployed everywhere, update configuration of
-  any external applications to enable SASL.  If an HDFS client is enabled for
-  SASL, then it can connect successfully to a DataNode running with either root
-  authentication or SASL authentication.  Changing configuration for all clients
-  guarantees that subsequent configuration changes on DataNodes will not disrupt
-  the applications.  Finally, each individual DataNode can be migrated by
-  changing its configuration and restarting.  It is acceptable to have a mix of
-  some DataNodes running with root authentication and some DataNodes running with
-  SASL authentication temporarily during this migration period, because an HDFS
-  client enabled for SASL can connect to both.
-
-* Data confidentiality
-
-** Data Encryption on RPC
-
-  Data transferred between Hadoop services and clients can be encrypted
-  on the wire. Setting <<<hadoop.rpc.protection>>> to <<<"privacy">>> in core-site.xml
-  activates data encryption.
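-
-  For example, in core-site.xml:
-
------
-<property>
-  <name>hadoop.rpc.protection</name>
-  <value>privacy</value>
-</property>
------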
-
-** Data Encryption on Block Data Transfer
-
-  You need to set <<<dfs.encrypt.data.transfer>>> to <<<"true">>> in the hdfs-site.xml
-  in order to activate data encryption for data transfer protocol of DataNode.
-
-  Optionally, you may set <<<dfs.encrypt.data.transfer.algorithm>>> to either
-  "3des" or "rc4" to choose the specific encryption algorithm.  If unspecified,
-  then the configured JCE default on the system is used, which is usually 3DES.
-
-  Setting <<<dfs.encrypt.data.transfer.cipher.suites>>> to
-  <<<AES/CTR/NoPadding>>> activates AES encryption.  By default, this is
-  unspecified, so AES is not used.  When AES is used, the algorithm specified in
-  <<<dfs.encrypt.data.transfer.algorithm>>> is still used during an initial key
-  exchange.  The AES key bit length can be configured by setting
-  <<<dfs.encrypt.data.transfer.cipher.key.bitlength>>> to 128, 192 or 256.  The
-  default is 128.
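-
-  A sketch of an hdfs-site.xml fragment enabling AES with a 256-bit key
-  (values taken from the options described above):
-
------
-<property>
-  <name>dfs.encrypt.data.transfer</name>
-  <value>true</value>
-</property>
-<property>
-  <name>dfs.encrypt.data.transfer.cipher.suites</name>
-  <value>AES/CTR/NoPadding</value>
-</property>
-<property>
-  <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
-  <value>256</value>
-</property>
------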
-
-  AES offers the greatest cryptographic strength and the best performance.  At
-  this time, 3DES and RC4 have been used more often in Hadoop clusters.
-
-** Data Encryption on HTTP
-
-  Data transfer between the web console and clients is protected by SSL (HTTPS).
-
-
-* Configuration
-
-** Permissions for both HDFS and local fileSystem paths
-
-  The following table lists various paths on HDFS and local filesystems (on
-  all nodes) and recommended permissions:
-
-*-------------------+-------------------+------------------+------------------+
-|| Filesystem       || Path             || User:Group      || Permissions     |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<dfs.namenode.name.dir>>> | hdfs:hadoop | drwx------ |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<dfs.datanode.data.dir>>> | hdfs:hadoop | drwx------ |
-*-------------------+-------------------+------------------+------------------+
-| local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | container-executor | root:hadoop | --Sr-s--- |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | / | hdfs:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | /user | hdfs:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | <<<yarn.nodemanager.remote-app-log-dir>>> | yarn:hadoop | drwxrwxrwxt |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | <<<mapreduce.jobhistory.intermediate-done-dir>>> | mapred:hadoop | |
-| | | | drwxrwxrwxt |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | <<<mapreduce.jobhistory.done-dir>>> | mapred:hadoop | |
-| | | | drwxr-x--- |
-*-------------------+-------------------+------------------+------------------+
-
-** Common Configurations
-
-  In order to turn on RPC authentication in Hadoop,
-  set the value of the <<<hadoop.security.authentication>>> property to
-  <<<"kerberos">>>, and set the security-related settings listed below appropriately.
-
-  The following properties should be in the <<<core-site.xml>>> of all the
-  nodes in the cluster.
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<hadoop.security.authentication>>> | <kerberos> | |
-| | | <<<simple>>> : No authentication. (default) \
-| | | <<<kerberos>>> : Enable authentication by Kerberos. |
-*-------------------------+-------------------------+------------------------+
-| <<<hadoop.security.authorization>>> | <true> | |
-| | | Enable {{{./ServiceLevelAuth.html}RPC service-level authorization}}. |
-*-------------------------+-------------------------+------------------------+
-| <<<hadoop.rpc.protection>>> | <authentication> |
-| | | <authentication> : authentication only (default) \
-| | | <integrity> : integrity check in addition to authentication \
-| | | <privacy> : data encryption in addition to integrity |
-*-------------------------+-------------------------+------------------------+
-| <<<hadoop.security.auth_to_local>>> | | |
-| |  <<<RULE:>>><exp1>\
-| |  <<<RULE:>>><exp2>\
-| |  <...>\
-| |  DEFAULT |
-| | | The value is a string containing newline characters.
-| | | See
-| | | {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos documentation}}
-| | | for the format of <exp>.
-*-------------------------+-------------------------+------------------------+
-| <<<hadoop.proxyuser.>>><superuser><<<.hosts>>> | | |
-| | | comma-separated list of hosts from which <superuser> is allowed to impersonate users. |
-| | | <<<*>>> means wildcard. |
-*-------------------------+-------------------------+------------------------+
-| <<<hadoop.proxyuser.>>><superuser><<<.groups>>> | | |
-| | | comma-separated list of groups to which users impersonated by <superuser> belong. |
-| | | <<<*>>> means wildcard. |
-*-------------------------+-------------------------+------------------------+
-Configuration for <<<conf/core-site.xml>>>
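-
-  For example, a minimal core-site.xml fragment enabling Kerberos
-  authentication and service-level authorization, per the table above, is:
-
------
-<property>
-  <name>hadoop.security.authentication</name>
-  <value>kerberos</value>
-</property>
-<property>
-  <name>hadoop.security.authorization</name>
-  <value>true</value>
-</property>
------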
-
-**  NameNode
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.block.access.token.enable>>> | <true> |  |
-| | | Enable HDFS block access tokens for secure operations. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.https.enable>>> | <true> | |
-| | | This value is deprecated. Use dfs.http.policy |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.http.policy>>> | <HTTP_ONLY> or <HTTPS_ONLY> or <HTTP_AND_HTTPS> | |
-| | | HTTPS_ONLY turns off http access. This option takes precedence over |
-| | | the deprecated configuration dfs.https.enable and hadoop.ssl.enabled. |
-| | | If using SASL to authenticate data transfer protocol instead of |
-| | | running DataNode as root and using privileged ports, then this property |
-| | | must be set to <HTTPS_ONLY> to guarantee authentication of HTTP servers. |
-| | | (See <<<dfs.data.transfer.protection>>>.)  |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.https-address>>> | <nn_host_fqdn:50470> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.https.port>>> | <50470> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.keytab.file>>> | </etc/security/keytab/nn.service.keytab> | |
-| | | Kerberos keytab file for the NameNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.kerberos.principal>>> | nn/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the NameNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.kerberos.internal.spnego.principal>>> | HTTP/_HOST@REALM.TLD | |
-| | | HTTP Kerberos principal name for the NameNode. |
-*-------------------------+-------------------------+------------------------+
-Configuration for <<<conf/hdfs-site.xml>>>
-
-**  Secondary NameNode
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.secondary.http-address>>> | <c_nn_host_fqdn:50090> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.secondary.https-port>>> | <50470> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.secondary.namenode.keytab.file>>> | | |
-| | </etc/security/keytab/sn.service.keytab> | |
-| | | Kerberos keytab file for the Secondary NameNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.secondary.namenode.kerberos.principal>>> | sn/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the Secondary NameNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.secondary.namenode.kerberos.internal.spnego.principal>>> | | |
-| | HTTP/_HOST@REALM.TLD | |
-| | | HTTP Kerberos principal name for the Secondary NameNode. |
-*-------------------------+-------------------------+------------------------+
-Configuration for <<<conf/hdfs-site.xml>>>
-
-** DataNode
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.data.dir.perm>>> | 700 | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.address>>> | <0.0.0.0:1004> | |
-| | | A secure DataNode must use a privileged port |
-| | | in order to ensure that the server was started securely. |
-| | | This means that the server must be started via jsvc. |
-| | | Alternatively, this must be set to a non-privileged port if using SASL |
-| | | to authenticate data transfer protocol. |
-| | | (See <<<dfs.data.transfer.protection>>>.)  |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.http.address>>> | <0.0.0.0:1006> | |
-| | | A secure DataNode must use a privileged port |
-| | | in order to ensure that the server was started securely. |
-| | | This means that the server must be started via jsvc. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.https.address>>> | <0.0.0.0:50470> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.keytab.file>>> | </etc/security/keytab/dn.service.keytab> | |
-| | | Kerberos keytab file for the DataNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.kerberos.principal>>> | dn/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the DataNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.encrypt.data.transfer>>> | <false> | |
-| | | set to <<<true>>> when using data encryption |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.encrypt.data.transfer.algorithm>>> | | |
-| | | optionally set to <<<3des>>> or <<<rc4>>> when using data encryption to |
-| | | control encryption algorithm |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.encrypt.data.transfer.cipher.suites>>> | | |
-| | | optionally set to <<<AES/CTR/NoPadding>>> to activate AES encryption    |
-| | | when using data encryption |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.encrypt.data.transfer.cipher.key.bitlength>>> | | |
-| | | optionally set to <<<128>>>, <<<192>>> or <<<256>>> to control key bit  |
-| | | length when using AES with data encryption |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.data.transfer.protection>>> | | |
-| | | <authentication> : authentication only \
-| | | <integrity> : integrity check in addition to authentication \
-| | | <privacy> : data encryption in addition to integrity |
-| | | This property is unspecified by default.  Setting this property enables |
-| | | SASL for authentication of data transfer protocol.  If this is enabled, |
-| | | then <<<dfs.datanode.address>>> must use a non-privileged port, |
-| | | <<<dfs.http.policy>>> must be set to <HTTPS_ONLY> and the |
-| | | <<<HADOOP_SECURE_DN_USER>>> environment variable must be undefined when |
-| | | starting the DataNode process. |
-*-------------------------+-------------------------+------------------------+
-Configuration for <<<conf/hdfs-site.xml>>>
-
-
-** WebHDFS
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.web.authentication.kerberos.principal>>> | http/_HOST@REALM.TLD | |
-| | | Kerberos principal name for WebHDFS. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.web.authentication.kerberos.keytab>>> | </etc/security/keytab/http.service.keytab> | |
-| | | Kerberos keytab file for WebHDFS. |
-*-------------------------+-------------------------+------------------------+
-Configuration for <<<conf/hdfs-site.xml>>>
-
-
-** ResourceManager
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.keytab>>> | | |
-| | </etc/security/keytab/rm.service.keytab> | |
-| | | Kerberos keytab file for the ResourceManager. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.principal>>> | rm/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the ResourceManager. |
-*-------------------------+-------------------------+------------------------+
-Configuration for  <<<conf/yarn-site.xml>>>
-
-**  NodeManager
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.keytab>>> | </etc/security/keytab/nm.service.keytab> | |
-| | | Kerberos keytab file for the NodeManager. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.principal>>> | nm/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the NodeManager. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.container-executor.class>>> | | |
-| | <<<org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor>>> |
-| | | Use LinuxContainerExecutor. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | |
-| | | Unix group of the NodeManager. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.linux-container-executor.path>>> | </path/to/bin/container-executor> | |
-| | | The path to the executable of Linux container executor. |
-*-------------------------+-------------------------+------------------------+
-Configuration for  <<<conf/yarn-site.xml>>>
-
-** Configuration for WebAppProxy
-
-    The <<<WebAppProxy>>> provides a proxy between the web applications
-    exported by an application and an end user.  If security is enabled
-    it will warn users before accessing a potentially unsafe web application.
-    Authentication and authorization using the proxy is handled just like
-    any other privileged web application.
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.web-proxy.address>>> | | |
-| | <<<WebAppProxy>>> host:port for proxy to AM web apps. | |
-| | | <host:port>. If this is the same as <<<yarn.resourcemanager.webapp.address>>> |
-| | | or it is not defined, then the <<<ResourceManager>>> will run the proxy; |
-| | | otherwise a standalone proxy server will need to be launched. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.web-proxy.keytab>>> | | |
-| | </etc/security/keytab/web-app.service.keytab> | |
-| | | Kerberos keytab file for the WebAppProxy. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.web-proxy.principal>>> | wap/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the WebAppProxy. |
-*-------------------------+-------------------------+------------------------+
-Configuration for  <<<conf/yarn-site.xml>>>
-
-** LinuxContainerExecutor
-
-    A <<<ContainerExecutor>>> is used by the YARN framework to define how
-    containers are launched and controlled.
-
-    The following container executors are available in Hadoop YARN:
-
-*--------------------------------------+--------------------------------------+
-|| ContainerExecutor                   || Description                         |
-*--------------------------------------+--------------------------------------+
-| <<<DefaultContainerExecutor>>>             | |
-| | The default executor which YARN uses to manage container execution. |
-| | The container process has the same Unix user as the NodeManager.  |
-*--------------------------------------+--------------------------------------+
-| <<<LinuxContainerExecutor>>>               | |
-| | Supported only on GNU/Linux, this executor runs the containers as either the |
-| | YARN user who submitted the application (when full security is enabled) or |
-| | as a dedicated user (defaults to nobody) when full security is not enabled. |
-| | When full security is enabled, this executor requires all user accounts to be |
-| | created on the cluster nodes where the containers are launched. It uses |
-| | a <setuid> executable that is included in the Hadoop distribution. |
-| | The NodeManager uses this executable to launch and kill containers. |
-| | The setuid executable switches to the user who has submitted the |
-| | application and launches or kills the containers. For maximum security, |
-| | this executor sets up restricted permissions and user/group ownership of |
-| | local files and directories used by the containers such as the shared |
-| | objects, jars, intermediate files, log files etc. Particularly note that, |
-| | because of this, except the application owner and NodeManager, no other |
-| | user can access any of the local files/directories including those |
-| | localized as part of the distributed cache. |
-*--------------------------------------+--------------------------------------+
-
-    To build the LinuxContainerExecutor executable run:
-
-----
- $ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/
-----
-
-    The path passed in <<<-Dcontainer-executor.conf.dir>>> should be the
-    path on the cluster nodes where a configuration file for the setuid
-    executable should be located. The executable should be installed in
-    $HADOOP_YARN_HOME/bin.
-
-    The executable must have specific permissions: 6050 or --Sr-s---
-    permissions user-owned by <root> (super-user) and group-owned by a
-    special group (e.g. <<<hadoop>>>) of which the NodeManager Unix user is
-    the group member and no ordinary application user is. If any application
-    user belongs to this special group, security will be compromised. This
-    special group name should be specified for the configuration property
-    <<<yarn.nodemanager.linux-container-executor.group>>> in both
-    <<<conf/yarn-site.xml>>> and <<<conf/container-executor.cfg>>>.
-
-    For example, let's say that the NodeManager is run as user <yarn> who is
-    part of the groups <users> and <hadoop>, either of them being the primary group.
-    Let it also be that <users> has both <yarn> and another user
-    (application submitter) <alice> as its members, and <alice> does not
-    belong to <hadoop>. Going by the above description, the setuid/setgid
-    executable should be set 6050 or --Sr-s--- with user-owner as <root> and
-    group-owner as <hadoop> which has <yarn> as its member (and not <users>
-    which has <alice> also as its member besides <yarn>).
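A quick way to sanity-check the mode bits is with <<<stat>>>. The sketch below demonstrates the required 6050 (--Sr-s---) bits on a scratch file; on a real cluster the target would instead be the <<<container-executor>>> binary under $HADOOP_YARN_HOME/bin, owned root:hadoop (the scratch-file path here is purely illustrative, and GNU stat is assumed):

```shell
# Demonstrate the 6050 mode bits on a scratch file; on a real node the
# target is the container-executor binary, owned root:hadoop.
f=$(mktemp)
chmod 6050 "$f"
stat -c '%a' "$f"    # setuid + setgid, group read/execute only
rm -f "$f"
```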
-
-    The <<<LinuxContainerExecutor>>> requires that paths including and leading up to
-    the directories specified in <<<yarn.nodemanager.local-dirs>>> and
-    <<<yarn.nodemanager.log-dirs>>> be set to 755 permissions as described
-    above in the table on permissions on directories.
-
-      * <<<conf/container-executor.cfg>>>
-
-    The executable requires a configuration file called
-    <<<container-executor.cfg>>> to be present in the configuration
-    directory passed to the mvn target mentioned above.
-
-    The configuration file must be owned by the user running NodeManager
-    (user <<<yarn>>> in the above example), group-owned by anyone and
-    should have the permissions 0400 or r--------.
-
-    The executable requires the following configuration items to be present
-    in the <<<conf/container-executor.cfg>>> file. The items should be
-    specified as simple key=value pairs, one per line:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | |
-| | | Unix group of the NodeManager. The group owner of the |
-| | |<container-executor> binary should be this group. Should be same as the |
-| | | value with which the NodeManager is configured. This configuration is |
-| | | required for validating the secure access of the <container-executor> |
-| | | binary. |
-*-------------------------+-------------------------+------------------------+
-| <<<banned.users>>> | hdfs,yarn,mapred,bin | Banned users. |
-*-------------------------+-------------------------+------------------------+
-| <<<allowed.system.users>>> | foo,bar | Allowed system users. |
-*-------------------------+-------------------------+------------------------+
-| <<<min.user.id>>> | 1000 | Prevent other super-users. |
-*-------------------------+-------------------------+------------------------+
-Configuration for  <<<conf/container-executor.cfg>>>
-
-      To recap, here are the local file-system permissions required for the
-      various paths related to the <<<LinuxContainerExecutor>>>:
-
-*-------------------+-------------------+------------------+------------------+
-|| Filesystem       || Path             || User:Group      || Permissions     |
-*-------------------+-------------------+------------------+------------------+
-| local | container-executor | root:hadoop | --Sr-s--- |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-
-**  MapReduce JobHistory Server
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.address>>> | | |
-| | MapReduce JobHistory Server <host:port> | Default port is 10020. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.keytab>>> | |
-| | </etc/security/keytab/jhs.service.keytab> | |
-| | | Kerberos keytab file for the MapReduce JobHistory Server. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.principal>>> | jhs/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the MapReduce JobHistory Server. |
-*-------------------------+-------------------------+------------------------+
-Configuration for  <<<conf/mapred-site.xml>>>

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm
deleted file mode 100644
index 86fb3d6..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm
+++ /dev/null
@@ -1,216 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Service Level Authorization Guide
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Service Level Authorization Guide
-
-%{toc|section=1|fromDepth=0}
-
-* Purpose
-
-   This document describes how to configure and manage Service Level
-   Authorization for Hadoop.
-
-* Prerequisites
-
-   Make sure Hadoop is installed, configured and setup correctly. For more
-   information see:
-
-     * {{{./SingleCluster.html}Single Node Setup}} for first-time users.
-
-     * {{{./ClusterSetup.html}Cluster Setup}} for large, distributed clusters.
-
-* Overview
-
-   Service Level Authorization is the initial authorization mechanism to
-   ensure clients connecting to a particular Hadoop service have the
-   necessary, pre-configured, permissions and are authorized to access the
-   given service. For example, a MapReduce cluster can use this mechanism
-   to allow a configured list of users/groups to submit jobs.
-
-   The <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> configuration file is used to
-   define the access control lists for various Hadoop services.
-
-   Service Level Authorization is performed well before other access
-   control checks such as file-permission checks, access control on job
-   queues, etc.
-
-* Configuration
-
-   This section describes how to configure service-level authorization via
-   the configuration file <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>>.
-
-** Enable Service Level Authorization
-
-   By default, service-level authorization is disabled for Hadoop. To
-   enable it set the configuration property hadoop.security.authorization
-   to true in <<<${HADOOP_CONF_DIR}/core-site.xml>>>.
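For reference, the corresponding <<<${HADOOP_CONF_DIR}/core-site.xml>>> entry looks like this (the property name is the one stated above):

```xml
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```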
-
-** Hadoop Services and Configuration Properties
-
-   This section lists the various Hadoop services and their configuration
-   knobs:
-
-*-------------------------------------+--------------------------------------+
-|| Property                           || Service
-*-------------------------------------+--------------------------------------+
-security.client.protocol.acl          | ACL for ClientProtocol, which is used by user code via the DistributedFileSystem.
-*-------------------------------------+--------------------------------------+
-security.client.datanode.protocol.acl | ACL for ClientDatanodeProtocol, the client-to-datanode protocol for block recovery.
-*-------------------------------------+--------------------------------------+
-security.datanode.protocol.acl        | ACL for DatanodeProtocol, which is used by datanodes to communicate with the namenode.
-*-------------------------------------+--------------------------------------+
-security.inter.datanode.protocol.acl  | ACL for InterDatanodeProtocol, the inter-datanode protocol for updating generation timestamp.
-*-------------------------------------+--------------------------------------+
-security.namenode.protocol.acl        | ACL for NamenodeProtocol, the protocol used by the secondary namenode to communicate with the namenode.
-*-------------------------------------+--------------------------------------+
-security.inter.tracker.protocol.acl   | ACL for InterTrackerProtocol, used by the tasktrackers to communicate with the jobtracker.
-*-------------------------------------+--------------------------------------+
-security.job.submission.protocol.acl  | ACL for JobSubmissionProtocol, used by job clients to communicate with the jobtracker for job submission, querying job status etc.
-*-------------------------------------+--------------------------------------+
-security.task.umbilical.protocol.acl  | ACL for TaskUmbilicalProtocol, used by the map and reduce tasks to communicate with the parent tasktracker.
-*-------------------------------------+--------------------------------------+
-security.refresh.policy.protocol.acl  | ACL for RefreshAuthorizationPolicyProtocol, used by the dfsadmin and mradmin commands to refresh the security policy in-effect.
-*-------------------------------------+--------------------------------------+
-security.ha.service.protocol.acl      | ACL for HAService protocol used by HAAdmin to manage the active and stand-by states of namenode.
-*-------------------------------------+--------------------------------------+
-
-** Access Control Lists
-
-   <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> defines an access control list for
-   each Hadoop service. Every access control list has a simple format:
-
-   The list of users and the list of groups are both comma-separated
-   lists of names. The two lists are separated by a space.
-
-   Example: <<<user1,user2 group1,group2>>>.
-
-   Add a blank at the beginning of the line if only a list of groups is
-   to be provided; equivalently, a comma-separated list of users followed
-   by a space or nothing implies only a set of given users.
-
-   A special value of <<<*>>> implies that all users are allowed to access the
-   service. 
-   
-   If an access control list is not defined for a service, the value of
-   <<<security.service.authorization.default.acl>>> is applied. If
-   <<<security.service.authorization.default.acl>>> is not defined, <<<*>>> is applied.
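As a sketch, a restrictive cluster-wide default can be set in <<<hadoop-policy.xml>>>; the group name <<<hadoop>>> here is an assumption, and the leading space in the value marks a groups-only list as described above:

```xml
<property>
  <name>security.service.authorization.default.acl</name>
  <value> hadoop</value>
</property>
```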
-
-** Blocked Access Control Lists
-
-   In some cases, it is necessary to specify a blocked access control list for a service. This specifies
-   the list of users and groups who are not authorized to access the service. The format of
-   the blocked access control list is the same as that of the access control list. The blocked access
-   control list can be specified via <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>>. The property name
-   is derived by suffixing with ".blocked".
-
-   Example: The property name of the blocked access control list for <<<security.client.protocol.acl>>>
-   will be <<<security.client.protocol.acl.blocked>>>
-
-   For a service, it is possible to specify both an access control list and a blocked access control
-   list. A user is authorized to access the service if the user is in the access control list and not in
-   the blocked access control list.
-
-   If a blocked access control list is not defined for a service, the value of
-   <<<security.service.authorization.default.acl.blocked>>> is applied. If
-   <<<security.service.authorization.default.acl.blocked>>> is not defined, an
-   empty blocked access control list is applied.
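As a hypothetical illustration (the user and group names below are invented, not part of any default configuration), a blocked ACL entry in <<<hadoop-policy.xml>>> might look like:

```xml
<property>
  <name>security.client.protocol.acl.blocked</name>
  <value>blockeduser blockedgroup</value>
</property>
```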
-
-
-** Refreshing Service Level Authorization Configuration
-
-   The service-level authorization configuration for the NameNode and
-   JobTracker can be changed without restarting either of the Hadoop
-   master daemons. The cluster administrator can change
-   <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> on the master nodes and instruct
-   the NameNode and JobTracker to reload their respective configurations
-   via the <<<-refreshServiceAcl>>> switch to <<<dfsadmin>>> and <<<mradmin>>> commands
-   respectively.
-
-   Refresh the service-level authorization configuration for the NameNode:
-
-----
-   $ bin/hadoop dfsadmin -refreshServiceAcl
-----
-
-   Refresh the service-level authorization configuration for the
-   JobTracker:
-
-----
-   $ bin/hadoop mradmin -refreshServiceAcl
-----
-
-   Of course, one can use the <<<security.refresh.policy.protocol.acl>>>
-   property in <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> to restrict access to
-   the ability to refresh the service-level authorization configuration to
-   certain users/groups.
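For example, the following <<<hadoop-policy.xml>>> fragment restricts refresh rights to two hypothetical administrative users (the user names are placeholders chosen for illustration):

```xml
<property>
  <name>security.refresh.policy.protocol.acl</name>
  <value>hdfsadmin,mradmin</value>
</property>
```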
-
-** Access Control using Lists of IP Addresses, Host Names and IP Ranges
-
-   Access to a service can be controlled based on the ip address of the client accessing
-   the service. It is possible to restrict access to a service from a set of machines by
-   specifying a list of ip addresses, host names and ip ranges. The property name for each service
-   is derived from the corresponding acl's property name. If the property name of the acl is
-   <<<security.client.protocol.acl>>>, the property name for the hosts list will be
-   <<<security.client.protocol.hosts>>>.
-
-   If a hosts list is not defined for a service, the value of
-   <<<security.service.authorization.default.hosts>>> is applied. If
-   <<<security.service.authorization.default.hosts>>> is not defined, <<<*>>> is applied.
-
-   It is possible to specify a blocked list of hosts. Only those machines which are in the
-   hosts list, but not in the blocked hosts list will be granted access to the service. The property
-   name is derived by suffixing with ".blocked".
-
-   Example: The property name of the blocked hosts list for <<<security.client.protocol.hosts>>>
-   will be <<<security.client.protocol.hosts.blocked>>>
-
-   If a blocked hosts list is not defined for a service, the value of
-   <<<security.service.authorization.default.hosts.blocked>>> is applied. If
-   <<<security.service.authorization.default.hosts.blocked>>> is not defined, an
-   empty blocked hosts list is applied.
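Combining the two, a hypothetical <<<hadoop-policy.xml>>> fragment (the addresses below are invented for illustration) admits a subnet while excluding one machine in it:

```xml
<property>
  <name>security.client.protocol.hosts</name>
  <value>192.168.1.0/24</value>
</property>
<property>
  <name>security.client.protocol.hosts.blocked</name>
  <value>192.168.1.15</value>
</property>
```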
-
-** Examples
-
-   Allow only users <<<alice>>>, <<<bob>>> and users in the <<<mapreduce>>> group to submit
-   jobs to the MapReduce cluster:
-
-----
-<property>
-     <name>security.job.submission.protocol.acl</name>
-     <value>alice,bob mapreduce</value>
-</property>
-----
-
-   Allow only DataNodes running as the users who belong to the group
-   datanodes to communicate with the NameNode:
-
-----
-<property>
-     <name>security.datanode.protocol.acl</name>
-     <value>datanodes</value>
-</property>
-----
-
-   Allow any user to talk to the HDFS cluster as a DFSClient:
-
-----
-<property>
-     <name>security.client.protocol.acl</name>
-     <value>*</value>
-</property>
-----

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm
deleted file mode 100644
index ef7532a..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm
+++ /dev/null
@@ -1,286 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster.
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
-
-%{toc|section=1|fromDepth=0}
-
-* Purpose
-
-  This document describes how to set up and configure a single-node Hadoop
-  installation so that you can quickly perform simple operations using Hadoop
-  MapReduce and the Hadoop Distributed File System (HDFS).
-
-* Prerequisites
-
-** Supported Platforms
-
-   * GNU/Linux is supported as a development and production platform.
-     Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
-
-   * Windows is also a supported platform but the following steps
-     are for Linux only. To set up Hadoop on Windows, see the
-     {{{http://wiki.apache.org/hadoop/Hadoop2OnWindows}wiki page}}.
-
-** Required Software
-
-   Required software for Linux includes:
-
-   [[1]] Java™ must be installed. Recommended Java versions are described
-         at {{{http://wiki.apache.org/hadoop/HadoopJavaVersions}
-         HadoopJavaVersions}}.
-
-   [[2]] ssh must be installed and sshd must be running to use the Hadoop
-         scripts that manage remote Hadoop daemons.
-
-** Installing Software
-
-  If your cluster doesn't have the requisite software you will need to install
-  it.
-
-  For example on Ubuntu Linux:
-
-----
-  $ sudo apt-get install ssh
-  $ sudo apt-get install rsync
-----
-
-* Download
-
-  To get a Hadoop distribution, download a recent stable release from one of
-  the {{{http://www.apache.org/dyn/closer.cgi/hadoop/common/}
-  Apache Download Mirrors}}.
-
-* Prepare to Start the Hadoop Cluster
-
-  Unpack the downloaded Hadoop distribution. In the distribution, edit
-  the file <<<etc/hadoop/hadoop-env.sh>>> to define some parameters as
-  follows:
-
-----
-  # set to the root of your Java installation
-  export JAVA_HOME=/usr/java/latest
-
-  # Assuming your installation directory is /usr/local/hadoop
-  export HADOOP_PREFIX=/usr/local/hadoop
-----
-
-  Try the following command:
-
-----
-  $ bin/hadoop
-----
-
-  This will display the usage documentation for the hadoop script.
-
-  Now you are ready to start your Hadoop cluster in one of the three supported
-  modes:
-
-   * {{{Standalone Operation}Local (Standalone) Mode}}
-
-   * {{{Pseudo-Distributed Operation}Pseudo-Distributed Mode}}
-
-   * {{{Fully-Distributed Operation}Fully-Distributed Mode}}
-
-* Standalone Operation
-
-  By default, Hadoop is configured to run in a non-distributed mode, as a
-  single Java process. This is useful for debugging.
-
-  The following example copies the unpacked conf directory to use as input
-  and then finds and displays every match of the given regular expression.
-  Output is written to the given output directory.
-
-----
-  $ mkdir input
-  $ cp etc/hadoop/*.xml input
-  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-${project.version}.jar grep input output 'dfs[a-z.]+'
-  $ cat output/*
-----
-
-* Pseudo-Distributed Operation
-
-  Hadoop can also be run on a single node in pseudo-distributed mode, where
-  each Hadoop daemon runs in a separate Java process.
-
-** Configuration
-
-  Use the following:
-
-  etc/hadoop/core-site.xml:
-
-+---+
-<configuration>
-    <property>
-        <name>fs.defaultFS</name>
-        <value>hdfs://localhost:9000</value>
-    </property>
-</configuration>
-+---+
-
-  etc/hadoop/hdfs-site.xml:
-  
-+---+
-<configuration>
-    <property>
-        <name>dfs.replication</name>
-        <value>1</value>
-    </property>
-</configuration>
-+---+
-
-** Setup passphraseless ssh
-
-  Now check that you can ssh to the localhost without a passphrase:
-
-----
-  $ ssh localhost
-----
-
-  If you cannot ssh to localhost without a passphrase, execute the
-  following commands:
-
-----
-  $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
-  $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
-  $ chmod 0600 ~/.ssh/authorized_keys
-----
-
-** Execution
-
-  The following instructions are to run a MapReduce job locally.
-  If you want to execute a job on YARN, see {{YARN on Single Node}}.
-
-  [[1]] Format the filesystem:
-
-----
-  $ bin/hdfs namenode -format
-----
-
-  [[2]] Start NameNode daemon and DataNode daemon:
-
-----
-  $ sbin/start-dfs.sh
-----
-
-        The hadoop daemon log output is written to the <<<${HADOOP_LOG_DIR}>>>
-        directory (defaults to <<<${HADOOP_HOME}/logs>>>).
-
-  [[3]] Browse the web interface for the NameNode; by default it is
-        available at:
-
-        * NameNode - <<<http://localhost:50070/>>>
-
-  [[4]] Make the HDFS directories required to execute MapReduce jobs:
-
-----
-  $ bin/hdfs dfs -mkdir /user
-  $ bin/hdfs dfs -mkdir /user/<username>
-----
-
-  [[5]] Copy the input files into the distributed filesystem:
-
-----
-  $ bin/hdfs dfs -put etc/hadoop input
-----
-
-  [[6]] Run some of the examples provided:
-
-----
-  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-${project.version}.jar grep input output 'dfs[a-z.]+'
-----
-
-  [[7]] Examine the output files:
-
-        Copy the output files from the distributed filesystem to the local
-        filesystem and examine them:
-
-----
-  $ bin/hdfs dfs -get output output
-  $ cat output/*
-----
-
-        or
-
-        View the output files on the distributed filesystem:
-
-----
-  $ bin/hdfs dfs -cat output/*
-----
-
-  [[8]] When you're done, stop the daemons with:
-
-----
-  $ sbin/stop-dfs.sh
-----
-
-** YARN on Single Node
-
-  You can run a MapReduce job on YARN in pseudo-distributed mode by setting
-  a few parameters and additionally running the ResourceManager daemon and
-  the NodeManager daemon.
-
-  The following instructions assume that steps 1 through 4 of
-  {{{Execution}the above instructions}} have already been executed.
-
-  [[1]] Configure parameters as follows:
-
-        etc/hadoop/mapred-site.xml:
-
-+---+
-<configuration>
-    <property>
-        <name>mapreduce.framework.name</name>
-        <value>yarn</value>
-    </property>
-</configuration>
-+---+
-
-        etc/hadoop/yarn-site.xml:
-
-+---+
-<configuration>
-    <property>
-        <name>yarn.nodemanager.aux-services</name>
-        <value>mapreduce_shuffle</value>
-    </property>
-</configuration>
-+---+
-
-  [[2]] Start ResourceManager daemon and NodeManager daemon:
-
-----
-  $ sbin/start-yarn.sh
-----
-
-  [[3]] Browse the web interface for the ResourceManager; by default it is
-        available at:
-
-        * ResourceManager - <<<http://localhost:8088/>>>
-
-  [[4]] Run a MapReduce job.
-
-  [[5]] When you're done, stop the daemons with:
-
-----
-  $ sbin/stop-yarn.sh
-----
-
-* Fully-Distributed Operation
-
-  For information on setting up fully-distributed, non-trivial clusters
-  see {{{./ClusterSetup.html}Cluster Setup}}.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm
deleted file mode 100644
index eb0c801..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm
+++ /dev/null
@@ -1,24 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Single Node Setup
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Single Node Setup
-
-  This page will be removed in the next major release.
-
-  See {{{./SingleCluster.html}Single Cluster Setup}} to set up and configure a
-  single-node Hadoop installation.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm
deleted file mode 100644
index 78ed9a4..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm
+++ /dev/null
@@ -1,144 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Proxy user - Superusers Acting On Behalf Of Other Users
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Proxy user - Superusers Acting On Behalf Of Other Users
-
-%{toc|section=1|fromDepth=0}
-
-* Introduction
-
-   This document describes how a superuser can submit jobs or access hdfs
-   on behalf of another user.
-
-* Use Case
-
-   The code example described in the next section is applicable for the
-   following use case.
-
-   A superuser with username 'super' wants to submit jobs and access hdfs
-   on behalf of a user joe. The superuser has kerberos credentials but
-   user joe doesn't have any. The tasks are required to run as user joe
-   and any file accesses on the namenode are required to be done as user joe.
-   It is required that user joe can connect to the namenode or job tracker
-   on a connection authenticated with super's kerberos credentials. In
-   other words, super is impersonating the user joe.
-
-   Some products such as Apache Oozie need this.
-
-
-* Code example
-
-   In this example super's credentials are used for login and a
-   proxy user ugi object is created for joe. The operations are performed
-   within the doAs method of this proxy user ugi object.
-
-----
-    ...
-    //Create ugi for joe. The login user is 'super'.
-    UserGroupInformation ugi =
-            UserGroupInformation.createProxyUser("joe", UserGroupInformation.getLoginUser());
-    ugi.doAs(new PrivilegedExceptionAction<Void>() {
-      public Void run() throws Exception {
-        //Submit a job
-        JobClient jc = new JobClient(conf);
-        jc.submitJob(conf);
-        //OR access hdfs
-        FileSystem fs = FileSystem.get(conf);
-        fs.mkdirs(someFilePath);
-        return null;
-      }
-    });
-----
-
-* Configurations
-
-   You can configure a proxy user using the properties
-   <<<hadoop.proxyuser.${superuser}.hosts>>> along with either or both of
-   <<<hadoop.proxyuser.${superuser}.groups>>>
-   and <<<hadoop.proxyuser.${superuser}.users>>>.
-
-   By specifying as below in core-site.xml,
-   the superuser named <<<super>>> can connect
-   only from <<<host1>>> and <<<host2>>>
-   to impersonate a user belonging to <<<group1>>> and <<<group2>>>.
-
-----
-   <property>
-     <name>hadoop.proxyuser.super.hosts</name>
-     <value>host1,host2</value>
-   </property>
-   <property>
-     <name>hadoop.proxyuser.super.groups</name>
-     <value>group1,group2</value>
-   </property>
-
-----
-
-   If these configurations are not present, impersonation will not be
-   allowed and the connection will fail.
-
-   If more lax security is preferred, the wildcard value * may be used to
-   allow impersonation from any host or of any user.
-   For example, by specifying as below in core-site.xml,
-   user named <<<oozie>>> accessing from any host
-   can impersonate any user belonging to any group.
-
-----
-  <property>
-    <name>hadoop.proxyuser.oozie.hosts</name>
-    <value>*</value>
-  </property>
-  <property>
-    <name>hadoop.proxyuser.oozie.groups</name>
-    <value>*</value>
-  </property>
-----
-
-   The <<<hadoop.proxyuser.${superuser}.hosts>>> property accepts a list of
-   ip addresses, ip address ranges in CIDR format and/or host names.
-   For example, by specifying as below,
-   a user named <<<super>>> accessing from hosts in the
-   <<<10.222.0.0/16>>> range or from <<<10.113.221.221>>> can impersonate
-   <<<user1>>> and <<<user2>>>.
-      
-----
-   <property>
-     <name>hadoop.proxyuser.super.hosts</name>
-     <value>10.222.0.0/16,10.113.221.221</value>
-   </property>
-   <property>
-     <name>hadoop.proxyuser.super.users</name>
-     <value>user1,user2</value>
-   </property>
-----
-
-
-* Caveats
-
-   If the cluster is running in {{{./SecureMode.html}Secure Mode}},
-   the superuser must have kerberos credentials to be able to impersonate
-   another user.
-
-   Delegation tokens cannot be used for this feature. It
-   would be wrong if the superuser added its own delegation token to the proxy
-   user ugi, as that would allow the proxy user to connect to the service
-   with the privileges of the superuser.
-
-   However, if the superuser does want to give a delegation token to joe,
-   it must first impersonate joe and get a delegation token for joe, in
-   the same way as the code example above, and add it to joe's ugi.
-   In this way the delegation token will have joe as its owner.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm
deleted file mode 100644
index c51037b..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm
+++ /dev/null
@@ -1,233 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop Distributed File System-${project.version} - Enabling Dapper-like Tracing
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Enabling Dapper-like Tracing in Hadoop
-
-%{toc|section=1|fromDepth=0}
-
-* {Dapper-like Tracing in Hadoop}
-
-** HTrace
-
-  {{{https://issues.apache.org/jira/browse/HDFS-5274}HDFS-5274}}
-  added support for tracing requests through HDFS,
-  using the open source tracing library, {{{https://git-wip-us.apache.org/repos/asf/incubator-htrace.git}Apache HTrace}}.
-  Setting up tracing is quite simple; however, it requires some very minor changes to your client code.
-
-** Samplers
-
-  Configure the sampler in the <<<core-site.xml>>> property <<<hadoop.htrace.sampler>>>.
-  The value can be NeverSampler, AlwaysSampler or ProbabilitySampler.
-  NeverSampler: HTrace is OFF for all spans;
-  AlwaysSampler: HTrace is ON for all spans;
-  ProbabilitySampler: HTrace is ON for a configurable percentage of top-level spans.
-
-+----
-  <property>
-    <name>hadoop.htrace.sampler</name>
-    <value>NeverSampler</value>
-  </property>
-+----
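The three sampler policies can be sketched conceptually as follows. This is an illustration only; the <<<Sampler>>> interface and names below are hypothetical and do not reproduce HTrace's actual classes.

```java
import java.util.Random;

// Conceptual sketch of the three sampler policies described above.
public class Samplers {
    interface Sampler { boolean next(); }

    // HTrace is OFF for all spans.
    static final Sampler NEVER = () -> false;
    // HTrace is ON for all spans.
    static final Sampler ALWAYS = () -> true;

    // ON for roughly 'fraction' of top-level spans (seeded for repeatability).
    static Sampler probability(double fraction, long seed) {
        Random rng = new Random(seed);
        return () -> rng.nextDouble() < fraction;
    }

    public static void main(String[] args) {
        System.out.println(NEVER.next());                  // never samples
        System.out.println(ALWAYS.next());                 // always samples
        System.out.println(probability(1.0, 42L).next());  // fraction 1.0 always samples
    }
}
```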
-
-** SpanReceivers
-
-  The tracing system works by collecting information in structs called 'Spans'.
-  It is up to you to choose how you want to receive this information
-  by implementing the SpanReceiver interface, which defines one method:
-
-+----
-public void receiveSpan(Span span);
-+----
-
-  Configure which SpanReceivers you'd like to use
-  by putting a comma-separated list of the fully-qualified class names of
-  classes implementing SpanReceiver
-  in the <<<core-site.xml>>> property <<<hadoop.htrace.spanreceiver.classes>>>.
-
-+----
-  <property>
-    <name>hadoop.htrace.spanreceiver.classes</name>
-    <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
-  </property>
-  <property>
-    <name>hadoop.htrace.local-file-span-receiver.path</name>
-    <value>/var/log/hadoop/htrace.out</value>
-  </property>
-+----
-
-  You can omit the package name prefix if you use a span receiver bundled with HTrace.
-
-+----
-  <property>
-    <name>hadoop.htrace.spanreceiver.classes</name>
-    <value>LocalFileSpanReceiver</value>
-  </property>
-+----
-
-** Setting up ZipkinSpanReceiver
-
-  Instead of implementing SpanReceiver by yourself,
-  you can use <<<ZipkinSpanReceiver>>> which uses
-  {{{https://github.com/twitter/zipkin}Zipkin}}
-  for collecting and displaying tracing data.
-
-  In order to use <<<ZipkinSpanReceiver>>>,
-  you need to download and set up {{{https://github.com/twitter/zipkin}Zipkin}} first.
-
-  You also need to add the <<<htrace-zipkin>>> jar to the classpath of Hadoop on each node.
-  Here is an example setup procedure.
-
-+----
-  $ git clone https://github.com/cloudera/htrace
-  $ cd htrace/htrace-zipkin
-  $ mvn compile assembly:single
-  $ cp target/htrace-zipkin-*-jar-with-dependencies.jar $HADOOP_HOME/share/hadoop/common/lib/
-+----
-
-  A sample configuration for <<<ZipkinSpanReceiver>>> is shown below.
-  By adding these properties to the <<<core-site.xml>>> of the NameNode and DataNodes,
-  <<<ZipkinSpanReceiver>>> is initialized on startup.
-  You also need this configuration on the client node in addition to the servers.
-
-+----
-  <property>
-    <name>hadoop.htrace.spanreceiver.classes</name>
-    <value>ZipkinSpanReceiver</value>
-  </property>
-  <property>
-    <name>hadoop.htrace.zipkin.collector-hostname</name>
-    <value>192.168.1.2</value>
-  </property>
-  <property>
-    <name>hadoop.htrace.zipkin.collector-port</name>
-    <value>9410</value>
-  </property>
-+----
-
-
-** Dynamic update of tracing configuration
-
-  You can use the <<<hadoop trace>>> command to see and update the tracing configuration of each server.
-  You must specify the IPC server address of a namenode or datanode with the <<<-host>>> option.
-  You need to run the command against all servers if you want to update the configuration of all of them.
-
-  <<<hadoop trace -list>>> shows the list of loaded span receivers together with their ids.
-
-+----
-  $ hadoop trace -list -host 192.168.56.2:9000
-  ID  CLASS
-  1   org.apache.htrace.impl.LocalFileSpanReceiver
-
-  $ hadoop trace -list -host 192.168.56.2:50020
-  ID  CLASS
-  1   org.apache.htrace.impl.LocalFileSpanReceiver
-+----
-
-  <<<hadoop trace -remove>>> removes a span receiver from a server.
-  The <<<-remove>>> option takes the id of a span receiver as its argument.
-
-+----
-  $ hadoop trace -remove 1 -host 192.168.56.2:9000
-  Removed trace span receiver 1
-+----
-
-  <<<hadoop trace -add>>> adds a span receiver to a server.
-  You need to specify the class name of the span receiver as the argument of the <<<-class>>> option.
-  You can specify configuration associated with the span receiver via <<<-Ckey=value>>> options.
-
-+----
-  $ hadoop trace -add -class LocalFileSpanReceiver -Chadoop.htrace.local-file-span-receiver.path=/tmp/htrace.out -host 192.168.56.2:9000
-  Added trace span receiver 2 with configuration hadoop.htrace.local-file-span-receiver.path = /tmp/htrace.out
-
-  $ hadoop trace -list -host 192.168.56.2:9000
-  ID  CLASS
-  2   org.apache.htrace.impl.LocalFileSpanReceiver
-+----
-
-
-** Starting tracing spans by HTrace API
-
-  In order to trace,
-  you will need to wrap the traced logic with a <<tracing span>> as shown below.
-  When there are running tracing spans,
-  the tracing information is propagated to servers along with RPC requests.
-
-  In addition, you need to initialize <<<SpanReceiver>>> once per process.
-
-+----
-import org.apache.hadoop.hdfs.HdfsConfiguration;
-import org.apache.hadoop.tracing.SpanReceiverHost;
-import org.apache.htrace.Sampler;
-import org.apache.htrace.Trace;
-import org.apache.htrace.TraceScope;
-
-...
-
-    SpanReceiverHost.getInstance(new HdfsConfiguration());
-
-...
-
-    TraceScope ts = Trace.startSpan("Gets", Sampler.ALWAYS);
-    try {
-      ... // traced logic
-    } finally {
-      if (ts != null) ts.close();
-    }
-+----
-
-** Sample code for tracing
-
-  The <<<TracingFsShell.java>>> shown below is a wrapper around FsShell
-  which starts a tracing span before invoking an HDFS shell command.
-
-+----
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FsShell;
-import org.apache.hadoop.tracing.SpanReceiverHost;
-import org.apache.hadoop.util.ToolRunner;
-import org.apache.htrace.Sampler;
-import org.apache.htrace.Trace;
-import org.apache.htrace.TraceScope;
-
-public class TracingFsShell {
-  public static void main(String argv[]) throws Exception {
-    Configuration conf = new Configuration();
-    FsShell shell = new FsShell();
-    conf.setQuietMode(false);
-    shell.setConf(conf);
-    SpanReceiverHost.getInstance(conf);
-    int res = 0;
-    TraceScope ts = null;
-    try {
-      ts = Trace.startSpan("FsShell", Sampler.ALWAYS);
-      res = ToolRunner.run(shell, argv);
-    } finally {
-      shell.close();
-      if (ts != null) ts.close();
-    }
-    System.exit(res);
-  }
-}
-+----
-
-  You can compile and execute this code as shown below.
-
-+----
-$ javac -cp `hadoop classpath` TracingFsShell.java
-$ java -cp .:`hadoop classpath` TracingFsShell -ls /
-+----

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/markdown/CLIMiniCluster.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/CLIMiniCluster.md.vm b/hadoop-common-project/hadoop-common/src/site/markdown/CLIMiniCluster.md.vm
new file mode 100644
index 0000000..25fecda
--- /dev/null
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/CLIMiniCluster.md.vm
@@ -0,0 +1,68 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Hadoop: CLI MiniCluster.
+========================
+
+* [Hadoop: CLI MiniCluster.](#Hadoop:_CLI_MiniCluster.)
+    * [Purpose](#Purpose)
+    * [Hadoop Tarball](#Hadoop_Tarball)
+    * [Running the MiniCluster](#Running_the_MiniCluster)
+
+Purpose
+-------
+
+Using the CLI MiniCluster, users can simply start and stop a single-node Hadoop cluster with a single command, and without the need to set any environment variables or manage configuration files. The CLI MiniCluster starts both a `YARN`/`MapReduce` and an `HDFS` cluster.
+
+This is useful for cases where users want to quickly experiment with a real Hadoop cluster or test non-Java programs that rely on significant Hadoop functionality.
+
+Hadoop Tarball
+--------------
+
+You should be able to obtain the Hadoop tarball from the release. Also, you can directly create a tarball from the source:
+
+    $ mvn clean install -DskipTests
+    $ mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip
+
+**NOTE:** You will need [protoc 2.5.0](http://code.google.com/p/protobuf/) installed.
+
+The tarball should be available in the `hadoop-dist/target/` directory.
+
+Running the MiniCluster
+-----------------------
+
+From inside the root directory of the extracted tarball, you can start the CLI MiniCluster using the following command:
+
+    $ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${project.version}-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT
+
+In the example command above, `RM_PORT` and `JHS_PORT` should be replaced by the user's choice of these port numbers. If not specified, random free ports will be used.
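The "random free port" behavior mentioned above follows a common convention: binding to port 0 asks the operating system to assign an unused ephemeral port. The sketch below illustrates that convention in plain Java; it is a hypothetical illustration, not the MiniCluster's actual implementation.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical illustration of the "port 0 means the system chooses"
// convention used when -rmport/-jhsport/-nnport are left unset.
public class EphemeralPort {
    static int pickFreePort() throws IOException {
        // Binding to port 0 lets the OS assign a free ephemeral port.
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OS chose port " + pickFreePort());
    }
}
```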
+
+There are a number of command line arguments that users can use to control which services to start and to pass other configuration properties. The available command line arguments are:
+
+    $ -D <property=value>    Options to pass into configuration object
+    $ -datanodes <arg>       How many datanodes to start (default 1)
+    $ -format                Format the DFS (default false)
+    $ -help                  Prints option help.
+    $ -jhsport <arg>         JobHistoryServer port (default 0--we choose)
+    $ -namenode <arg>        URL of the namenode (default is either the DFS
+    $                        cluster or a temporary dir)
+    $ -nnport <arg>          NameNode port (default 0--we choose)
+    $ -nodemanagers <arg>    How many nodemanagers to start (default 1)
+    $ -nodfs                 Don't start a mini DFS cluster
+    $ -nomr                  Don't start a mini MR cluster
+    $ -rmport <arg>          ResourceManager port (default 0--we choose)
+    $ -writeConfig <path>    Save configuration to this XML file.
+    $ -writeDetails <path>   Write basic information to this JSON file.
+
+To display this full list of available arguments, the user can pass the `-help` argument to the above command.

