hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From whe...@apache.org
Subject [10/12] hadoop git commit: HADOOP-11633. Convert remaining branch-2 .apt.vm files to markdown. Contributed by Masatake Iwasaki.
Date Wed, 11 Mar 2015 21:31:29 GMT
http://git-wip-us.apache.org/repos/asf/hadoop/blob/245f7b2a/hadoop-common-project/hadoop-kms/src/site/markdown/index.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-kms/src/site/markdown/index.md.vm b/hadoop-common-project/hadoop-kms/src/site/markdown/index.md.vm
new file mode 100644
index 0000000..44b5bfb
--- /dev/null
+++ b/hadoop-common-project/hadoop-kms/src/site/markdown/index.md.vm
@@ -0,0 +1,864 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+#set ( $H3 = '###' )
+#set ( $H4 = '####' )
+#set ( $H5 = '#####' )
+
+Hadoop Key Management Server (KMS) - Documentation Sets
+=======================================================
+
+Hadoop KMS is a cryptographic key management server based on Hadoop's **KeyProvider** API.
+
+It provides a client and a server components which communicate over HTTP using a REST API.
+
+The client is a KeyProvider implementation interacts with the KMS using the KMS HTTP REST API.
+
+KMS and its client have built-in security and they support HTTP SPNEGO Kerberos authentication and HTTPS secure transport.
+
+KMS is a Java web-application and it runs using a pre-configured Tomcat bundled with the Hadoop distribution.
+
+KMS Client Configuration
+------------------------
+
+The KMS client `KeyProvider` uses the **kms** scheme, and the embedded URL must be the URL of the KMS. For example, for a KMS running on `http://localhost:16000/kms`, the KeyProvider URI is `kms://http@localhost:16000/kms`. And, for a KMS running on `https://localhost:16000/kms`, the KeyProvider URI is `kms://https@localhost:16000/kms`
+
+KMS
+---
+
+$H3 KMS Configuration
+
+Configure the KMS backing KeyProvider properties in the `etc/hadoop/kms-site.xml` configuration file:
+
+```xml
+  <property>
+     <name>hadoop.kms.key.provider.uri</name>
+     <value>jceks://file@/${user.home}/kms.keystore</value>
+  </property>
+
+  <property>
+    <name>hadoop.security.keystore.java-keystore-provider.password-file</name>
+    <value>kms.keystore.password</value>
+  </property>
+```
+
+The password file is looked up in the Hadoop's configuration directory via the classpath.
+
+NOTE: You need to restart the KMS for the configuration changes to take effect.
+
+$H3 KMS Cache
+
+KMS caches keys for short period of time to avoid excessive hits to the underlying key provider.
+
+The Cache is enabled by default (can be dissabled by setting the `hadoop.kms.cache.enable` boolean property to false)
+
+The cache is used with the following 3 methods only, `getCurrentKey()` and `getKeyVersion()` and `getMetadata()`.
+
+For the `getCurrentKey()` method, cached entries are kept for a maximum of 30000 millisecond regardless the number of times the key is being access (to avoid stale keys to be considered current).
+
+For the `getKeyVersion()` method, cached entries are kept with a default inactivity timeout of 600000 milliseconds (10 mins). This time out is configurable via the following property in the `etc/hadoop/kms-site.xml` configuration file:
+
+```xml
+   <property>
+     <name>hadoop.kms.cache.enable</name>
+     <value>true</value>
+   </property>
+
+   <property>
+     <name>hadoop.kms.cache.timeout.ms</name>
+     <value>600000</value>
+   </property>
+
+   <property>
+     <name>hadoop.kms.current.key.cache.timeout.ms</name>
+     <value>30000</value>
+   </property>
+```
+
+$H3 KMS Aggregated Audit logs
+
+Audit logs are aggregated for API accesses to the GET\_KEY\_VERSION, GET\_CURRENT\_KEY, DECRYPT\_EEK, GENERATE\_EEK operations.
+
+Entries are grouped by the (user,key,operation) combined key for a configurable aggregation interval after which the number of accesses to the specified end-point by the user for a given key is flushed to the audit log.
+
+The Aggregation interval is configured via the property :
+
+      <property>
+        <name>hadoop.kms.aggregation.delay.ms</name>
+        <value>10000</value>
+      </property>
+
+$H3 Start/Stop the KMS
+
+To start/stop KMS use KMS's bin/kms.sh script. For example:
+
+    hadoop-${project.version} $ sbin/kms.sh start
+
+NOTE: Invoking the script without any parameters list all possible parameters (start, stop, run, etc.). The `kms.sh` script is a wrapper for Tomcat's `catalina.sh` script that sets the environment variables and Java System properties required to run KMS.
+
+$H3 Embedded Tomcat Configuration
+
+To configure the embedded Tomcat go to the `share/hadoop/kms/tomcat/conf`.
+
+KMS pre-configures the HTTP and Admin ports in Tomcat's `server.xml` to 16000 and 16001.
+
+Tomcat logs are also preconfigured to go to Hadoop's `logs/` directory.
+
+The following environment variables (which can be set in KMS's `etc/hadoop/kms-env.sh` script) can be used to alter those values:
+
+* KMS_HTTP_PORT
+* KMS_ADMIN_PORT
+* KMS_MAX_THREADS
+* KMS_LOGNOTE: You need to restart the KMS for the configuration changes to take effect.
+
+$H3 Loading native libraries
+
+The following environment variable (which can be set in KMS's `etc/hadoop/kms-env.sh` script) can be used to specify the location of any required native libraries. For eg. Tomact native Apache Portable Runtime (APR) libraries:
+
+* JAVA_LIBRARY_PATH
+
+$H3 KMS Security Configuration
+
+$H4 Enabling Kerberos HTTP SPNEGO Authentication
+
+Configure the Kerberos `etc/krb5.conf` file with the information of your KDC server.
+
+Create a service principal and its keytab for the KMS, it must be an `HTTP` service principal.
+
+Configure KMS `etc/hadoop/kms-site.xml` with the correct security values, for example:
+
+```xml
+   <property>
+     <name>hadoop.kms.authentication.type</name>
+     <value>kerberos</value>
+   </property>
+
+   <property>
+     <name>hadoop.kms.authentication.kerberos.keytab</name>
+     <value>${user.home}/kms.keytab</value>
+   </property>
+
+   <property>
+     <name>hadoop.kms.authentication.kerberos.principal</name>
+     <value>HTTP/localhost</value>
+   </property>
+
+   <property>
+     <name>hadoop.kms.authentication.kerberos.name.rules</name>
+     <value>DEFAULT</value>
+   </property>
+```
+
+NOTE: You need to restart the KMS for the configuration changes to take effect.
+
+$H4 KMS Proxyuser Configuration
+
+Each proxyuser must be configured in `etc/hadoop/kms-site.xml` using the following properties:
+
+```xml
+  <property>
+    <name>hadoop.kms.proxyuser.#USER#.users</name>
+    <value>*</value>
+  </property>
+
+  <property>
+    <name>hadoop.kms.proxyuser.#USER#.groups</name>
+    <value>*</value>
+  </property>
+
+  <property>
+    <name>hadoop.kms.proxyuser.#USER#.hosts</name>
+    <value>*</value>
+  </property>
+```
+
+`#USER#` is the username of the proxyuser to configure.
+
+The `users` property indicates the users that can be impersonated.
+
+The `groups` property indicates the groups users being impersonated must belong to.
+
+At least one of the `users` or `groups` properties must be defined. If both are specified, then the configured proxyuser will be able to impersonate and user in the `users` list and any user belonging to one of the groups in the `groups` list.
+
+The `hosts` property indicates from which host the proxyuser can make impersonation requests.
+
+If `users`, `groups` or `hosts` has a `*`, it means there are no restrictions for the proxyuser regarding users, groups or hosts.
+
+$H4 KMS over HTTPS (SSL)
+
+To configure KMS to work over HTTPS the following 2 properties must be set in the `etc/hadoop/kms_env.sh` script (shown with default values):
+
+* KMS_SSL_KEYSTORE_FILE=$HOME/.keystore
+* KMS_SSL_KEYSTORE_PASS=password
+
+In the KMS `tomcat/conf` directory, replace the `server.xml` file with the provided `ssl-server.xml` file.
+
+You need to create an SSL certificate for the KMS. As the `kms` Unix user, using the Java `keytool` command to create the SSL certificate:
+
+    $ keytool -genkey -alias tomcat -keyalg RSA
+
+You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named **.keystore** and located in the `kms` user home directory.
+
+The password you enter for "keystore password" must match the value of the `KMS_SSL_KEYSTORE_PASS` environment variable set in the `kms-env.sh` script in the configuration directory.
+
+The answer to "What is your first and last name?" (i.e. "CN") must be the hostname of the machine where the KMS will be running.
+
+NOTE: You need to restart the KMS for the configuration changes to take effect.
+
+$H4 KMS Access Control
+
+KMS ACLs configuration are defined in the KMS `etc/hadoop/kms-acls.xml` configuration file. This file is hot-reloaded when it changes.
+
+KMS supports both fine grained access control as well as blacklist for kms operations via a set ACL configuration properties.
+
+A user accessing KMS is first checked for inclusion in the Access Control List for the requested operation and then checked for exclusion in the Black list for the operation before access is granted.
+
+```xml
+<configuration>
+  <property>
+    <name>hadoop.kms.acl.CREATE</name>
+    <value>*</value>
+    <description>
+          ACL for create-key operations.
+          If the user is not in the GET ACL, the key material is not returned
+          as part of the response.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.blacklist.CREATE</name>
+    <value>hdfs,foo</value>
+    <description>
+          Blacklist for create-key operations.
+          If the user is in the Blacklist, the key material is not returned
+          as part of the response.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.acl.DELETE</name>
+    <value>*</value>
+    <description>
+          ACL for delete-key operations.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.blacklist.DELETE</name>
+    <value>hdfs,foo</value>
+    <description>
+          Blacklist for delete-key operations.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.acl.ROLLOVER</name>
+    <value>*</value>
+    <description>
+          ACL for rollover-key operations.
+          If the user is not in the GET ACL, the key material is not returned
+          as part of the response.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.blacklist.ROLLOVER</name>
+    <value>hdfs,foo</value>
+    <description>
+          Blacklist for rollover-key operations.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.acl.GET</name>
+    <value>*</value>
+    <description>
+          ACL for get-key-version and get-current-key operations.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.blacklist.GET</name>
+    <value>hdfs,foo</value>
+    <description>
+          ACL for get-key-version and get-current-key operations.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.acl.GET_KEYS</name>
+    <value>*</value>
+    <description>
+         ACL for get-keys operation.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.blacklist.GET_KEYS</name>
+    <value>hdfs,foo</value>
+    <description>
+          Blacklist for get-keys operation.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.acl.GET_METADATA</name>
+    <value>*</value>
+    <description>
+        ACL for get-key-metadata and get-keys-metadata operations.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.blacklist.GET_METADATA</name>
+    <value>hdfs,foo</value>
+    <description>
+         Blacklist for get-key-metadata and get-keys-metadata operations.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.acl.SET_KEY_MATERIAL</name>
+    <value>*</value>
+    <description>
+            Complimentary ACL for CREATE and ROLLOVER operation to allow the client
+            to provide the key material when creating or rolling a key.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.blacklist.SET_KEY_MATERIAL</name>
+    <value>hdfs,foo</value>
+    <description>
+            Complimentary Blacklist for CREATE and ROLLOVER operation to allow the client
+            to provide the key material when creating or rolling a key.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.acl.GENERATE_EEK</name>
+    <value>*</value>
+    <description>
+          ACL for generateEncryptedKey
+          CryptoExtension operations
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.blacklist.GENERATE_EEK</name>
+    <value>hdfs,foo</value>
+    <description>
+          Blacklist for generateEncryptedKey
+          CryptoExtension operations
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.acl.DECRYPT_EEK</name>
+    <value>*</value>
+    <description>
+          ACL for decrypt EncryptedKey
+          CryptoExtension operations
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.blacklist.DECRYPT_EEK</name>
+    <value>hdfs,foo</value>
+    <description>
+          Blacklist for decrypt EncryptedKey
+          CryptoExtension operations
+    </description>
+  </property>
+</configuration>
+```
+
+$H4 Key Access Control
+
+KMS supports access control for all non-read operations at the Key level. All Key Access operations are classified as :
+
+* MANAGEMENT - createKey, deleteKey, rolloverNewVersion
+* GENERATE_EEK - generateEncryptedKey, warmUpEncryptedKeys
+* DECRYPT_EEK - decryptEncryptedKey
+* READ - getKeyVersion, getKeyVersions, getMetadata, getKeysMetadata, getCurrentKey
+* ALL - all of the above
+
+These can be defined in the KMS `etc/hadoop/kms-acls.xml` as follows
+
+For all keys for which a key access has not been explicitly configured, It is possible to configure a default key access control for a subset of the operation types.
+
+It is also possible to configure a "whitelist" key ACL for a subset of the operation types. The whitelist key ACL is a whitelist in addition to the explicit or default per-key ACL. That is, if no per-key ACL is explicitly set, a user will be granted access if they are present in the default per-key ACL or the whitelist key ACL. If a per-key ACL is explicitly set, a user will be granted access if they are present in the per-key ACL or the whitelist key ACL.
+
+If no ACL is configured for a specific key AND no default ACL is configured AND no root key ACL is configured for the requested operation, then access will be DENIED.
+
+**NOTE:** The default and whitelist key ACL does not support `ALL` operation qualifier.
+
+```xml
+  <property>
+    <name>key.acl.testKey1.MANAGEMENT</name>
+    <value>*</value>
+    <description>
+      ACL for create-key, deleteKey and rolloverNewVersion operations.
+    </description>
+  </property>
+
+  <property>
+    <name>key.acl.testKey2.GENERATE_EEK</name>
+    <value>*</value>
+    <description>
+      ACL for generateEncryptedKey operations.
+    </description>
+  </property>
+
+  <property>
+    <name>key.acl.testKey3.DECRYPT_EEK</name>
+    <value>admink3</value>
+    <description>
+      ACL for decryptEncryptedKey operations.
+    </description>
+  </property>
+
+  <property>
+    <name>key.acl.testKey4.READ</name>
+    <value>*</value>
+    <description>
+      ACL for getKeyVersion, getKeyVersions, getMetadata, getKeysMetadata,
+      getCurrentKey operations
+    </description>
+  </property>
+
+  <property>
+    <name>key.acl.testKey5.ALL</name>
+    <value>*</value>
+    <description>
+      ACL for ALL operations.
+    </description>
+  </property>
+
+  <property>
+    <name>whitelist.key.acl.MANAGEMENT</name>
+    <value>admin1</value>
+    <description>
+      whitelist ACL for MANAGEMENT operations for all keys.
+    </description>
+  </property>
+
+  <!--
+  'testKey3' key ACL is defined. Since a 'whitelist'
+  key is also defined for DECRYPT_EEK, in addition to
+  admink3, admin1 can also perform DECRYPT_EEK operations
+  on 'testKey3'
+-->
+  <property>
+    <name>whitelist.key.acl.DECRYPT_EEK</name>
+    <value>admin1</value>
+    <description>
+      whitelist ACL for DECRYPT_EEK operations for all keys.
+    </description>
+  </property>
+
+  <property>
+    <name>default.key.acl.MANAGEMENT</name>
+    <value>user1,user2</value>
+    <description>
+      default ACL for MANAGEMENT operations for all keys that are not
+      explicitly defined.
+    </description>
+  </property>
+
+  <property>
+    <name>default.key.acl.GENERATE_EEK</name>
+    <value>user1,user2</value>
+    <description>
+      default ACL for GENERATE_EEK operations for all keys that are not
+      explicitly defined.
+    </description>
+  </property>
+
+  <property>
+    <name>default.key.acl.DECRYPT_EEK</name>
+    <value>user1,user2</value>
+    <description>
+      default ACL for DECRYPT_EEK operations for all keys that are not
+      explicitly defined.
+    </description>
+  </property>
+
+  <property>
+    <name>default.key.acl.READ</name>
+    <value>user1,user2</value>
+    <description>
+      default ACL for READ operations for all keys that are not
+      explicitly defined.
+    </description>
+  </property>
+```
+
+$H3 KMS Delegation Token Configuration
+
+KMS delegation token secret manager can be configured with the following properties:
+
+```xml
+  <property>
+    <name>hadoop.kms.authentication.delegation-token.update-interval.sec</name>
+    <value>86400</value>
+    <description>
+      How often the master key is rotated, in seconds. Default value 1 day.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.authentication.delegation-token.max-lifetime.sec</name>
+    <value>604800</value>
+    <description>
+      Maximum lifetime of a delagation token, in seconds. Default value 7 days.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.authentication.delegation-token.renew-interval.sec</name>
+    <value>86400</value>
+    <description>
+      Renewal interval of a delagation token, in seconds. Default value 1 day.
+    </description>
+  </property>
+
+  <property>
+    <name>hadoop.kms.authentication.delegation-token.removal-scan-interval.sec</name>
+    <value>3600</value>
+    <description>
+      Scan interval to remove expired delegation tokens.
+    </description>
+  </property>
+```
+
+$H3 Using Multiple Instances of KMS Behind a Load-Balancer or VIP
+
+KMS supports multiple KMS instances behind a load-balancer or VIP for scalability and for HA purposes.
+
+When using multiple KMS instances behind a load-balancer or VIP, requests from the same user may be handled by different KMS instances.
+
+KMS instances behind a load-balancer or VIP must be specially configured to work properly as a single logical service.
+
+$H4 HTTP Kerberos Principals Configuration
+
+When KMS instances are behind a load-balancer or VIP, clients will use the hostname of the VIP. For Kerberos SPNEGO authentication, the hostname of the URL is used to construct the Kerberos service name of the server, `HTTP/#HOSTNAME#`. This means that all KMS instances must have a Kerberos service name with the load-balancer or VIP hostname.
+
+In order to be able to access directly a specific KMS instance, the KMS instance must also have Keberos service name with its own hostname. This is required for monitoring and admin purposes.
+
+Both Kerberos service principal credentials (for the load-balancer/VIP hostname and for the actual KMS instance hostname) must be in the keytab file configured for authentication. And the principal name specified in the configuration must be '\*'. For example:
+
+```xml
+  <property>
+    <name>hadoop.kms.authentication.kerberos.principal</name>
+    <value>*</value>
+  </property>
+```
+
+**NOTE:** If using HTTPS, the SSL certificate used by the KMS instance must be configured to support multiple hostnames (see Java 7 `keytool` SAN extension support for details on how to do this).
+
+$H4 HTTP Authentication Signature
+
+KMS uses Hadoop Authentication for HTTP authentication. Hadoop Authentication issues a signed HTTP Cookie once the client has authenticated successfully. This HTTP Cookie has an expiration time, after which it will trigger a new authentication sequence. This is done to avoid triggering the authentication on every HTTP request of a client.
+
+A KMS instance must verify the HTTP Cookie signatures signed by other KMS instances. To do this all KMS instances must share the signing secret.
+
+This secret sharing can be done using a Zookeeper service which is configured in KMS with the following properties in the `kms-site.xml`:
+
+```xml
+  <property>
+    <name>hadoop.kms.authentication.signer.secret.provider</name>
+    <value>zookeeper</value>
+    <description>
+      Indicates how the secret to sign the authentication cookies will be
+      stored. Options are 'random' (default), 'string' and 'zookeeper'.
+      If using a setup with multiple KMS instances, 'zookeeper' should be used.
+    </description>
+  </property>
+  <property>
+    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.path</name>
+    <value>/hadoop-kms/hadoop-auth-signature-secret</value>
+    <description>
+      The Zookeeper ZNode path where the KMS instances will store and retrieve
+      the secret from.
+    </description>
+  </property>
+  <property>
+    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.connection.string</name>
+    <value>#HOSTNAME#:#PORT#,...</value>
+    <description>
+      The Zookeeper connection string, a list of hostnames and port comma
+      separated.
+    </description>
+  </property>
+  <property>
+    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.auth.type</name>
+    <value>kerberos</value>
+    <description>
+      The Zookeeper authentication type, 'none' or 'sasl' (Kerberos).
+    </description>
+  </property>
+  <property>
+    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.kerberos.keytab</name>
+    <value>/etc/hadoop/conf/kms.keytab</value>
+    <description>
+      The absolute path for the Kerberos keytab with the credentials to
+      connect to Zookeeper.
+    </description>
+  </property>
+  <property>
+    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.kerberos.principal</name>
+    <value>kms/#HOSTNAME#</value>
+    <description>
+      The Kerberos service principal used to connect to Zookeeper.
+    </description>
+  </property>
+```
+
+$H4 Delegation Tokens
+
+TBD
+
+$H3 KMS HTTP REST API
+
+$H4 Create a Key
+
+*REQUEST:*
+
+    POST http://HOST:PORT/kms/v1/keys
+    Content-Type: application/json
+
+    {
+      "name"        : "<key-name>",
+      "cipher"      : "<cipher>",
+      "length"      : <length>,        //int
+      "material"    : "<material>",    //base64
+      "description" : "<description>"
+    }
+
+*RESPONSE:*
+
+    201 CREATED
+    LOCATION: http://HOST:PORT/kms/v1/key/<key-name>
+    Content-Type: application/json
+
+    {
+      "name"        : "versionName",
+      "material"    : "<material>",    //base64, not present without GET ACL
+    }
+
+$H4 Rollover Key
+
+*REQUEST:*
+
+    POST http://HOST:PORT/kms/v1/key/<key-name>
+    Content-Type: application/json
+
+    {
+      "material"    : "<material>",
+    }
+
+*RESPONSE:*
+
+    200 OK
+    Content-Type: application/json
+
+    {
+      "name"        : "versionName",
+      "material"    : "<material>",    //base64, not present without GET ACL
+    }
+
+$H4 Delete Key
+
+*REQUEST:*
+
+    DELETE http://HOST:PORT/kms/v1/key/<key-name>
+
+*RESPONSE:*
+
+    200 OK
+
+$H4 Get Key Metadata
+
+*REQUEST:*
+
+    GET http://HOST:PORT/kms/v1/key/<key-name>/_metadata
+
+*RESPONSE:*
+
+    200 OK
+    Content-Type: application/json
+
+    {
+      "name"        : "<key-name>",
+      "cipher"      : "<cipher>",
+      "length"      : <length>,        //int
+      "description" : "<description>",
+      "created"     : <millis-epoc>,   //long
+      "versions"    : <versions>       //int
+    }
+
+$H4 Get Current Key
+
+*REQUEST:*
+
+    GET http://HOST:PORT/kms/v1/key/<key-name>/_currentversion
+
+*RESPONSE:*
+
+    200 OK
+    Content-Type: application/json
+
+    {
+      "name"        : "versionName",
+      "material"    : "<material>",    //base64
+    }
+
+$H4 Generate Encrypted Key for Current KeyVersion
+
+*REQUEST:*
+
+    GET http://HOST:PORT/kms/v1/key/<key-name>/_eek?eek_op=generate&num_keys=<number-of-keys-to-generate>
+
+*RESPONSE:*
+
+    200 OK
+    Content-Type: application/json
+    [
+      {
+        "versionName"         : "encryptionVersionName",
+        "iv"                  : "<iv>",          //base64
+        "encryptedKeyVersion" : {
+            "versionName"       : "EEK",
+            "material"          : "<material>",    //base64
+        }
+      },
+      {
+        "versionName"         : "encryptionVersionName",
+        "iv"                  : "<iv>",          //base64
+        "encryptedKeyVersion" : {
+            "versionName"       : "EEK",
+            "material"          : "<material>",    //base64
+        }
+      },
+      ...
+    ]
+
+$H4 Decrypt Encrypted Key
+
+*REQUEST:*
+
+    POST http://HOST:PORT/kms/v1/keyversion/<version-name>/_eek?ee_op=decrypt
+    Content-Type: application/json
+
+    {
+      "name"        : "<key-name>",
+      "iv"          : "<iv>",          //base64
+      "material"    : "<material>",    //base64
+    }
+
+*RESPONSE:*
+
+    200 OK
+    Content-Type: application/json
+
+    {
+      "name"        : "EK",
+      "material"    : "<material>",    //base64
+    }
+
+$H4 Get Key Version
+
+*REQUEST:*
+
+    GET http://HOST:PORT/kms/v1/keyversion/<version-name>
+
+*RESPONSE:*
+
+    200 OK
+    Content-Type: application/json
+
+    {
+      "name"        : "versionName",
+      "material"    : "<material>",    //base64
+    }
+
+$H4 Get Key Versions
+
+*REQUEST:*
+
+    GET http://HOST:PORT/kms/v1/key/<key-name>/_versions
+
+*RESPONSE:*
+
+    200 OK
+    Content-Type: application/json
+
+    [
+      {
+        "name"        : "versionName",
+        "material"    : "<material>",    //base64
+      },
+      {
+        "name"        : "versionName",
+        "material"    : "<material>",    //base64
+      },
+      ...
+    ]
+
+$H4 Get Key Names
+
+*REQUEST:*
+
+    GET http://HOST:PORT/kms/v1/keys/names
+
+*RESPONSE:*
+
+    200 OK
+    Content-Type: application/json
+
+    [
+      "<key-name>",
+      "<key-name>",
+      ...
+    ]
+
+$H4 Get Keys Metadata
+
+    GET http://HOST:PORT/kms/v1/keys/metadata?key=<key-name>&key=<key-name>,...
+
+*RESPONSE:*
+
+    200 OK
+    Content-Type: application/json
+
+    [
+      {
+        "name"        : "<key-name>",
+        "cipher"      : "<cipher>",
+        "length"      : <length>,        //int
+        "description" : "<description>",
+        "created"     : <millis-epoc>,   //long
+        "versions"    : <versions>       //int
+      },
+      {
+        "name"        : "<key-name>",
+        "cipher"      : "<cipher>",
+        "length"      : <length>,        //int
+        "description" : "<description>",
+        "created"     : <millis-epoc>,   //long
+        "versions"    : <versions>       //int
+      },
+      ...
+    ]

http://git-wip-us.apache.org/repos/asf/hadoop/blob/245f7b2a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/ServerSetup.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/ServerSetup.apt.vm b/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/ServerSetup.apt.vm
deleted file mode 100644
index 878ab1f..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/ServerSetup.apt.vm
+++ /dev/null
@@ -1,159 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License.
-
-  ---
-  Hadoop HDFS over HTTP ${project.version} - Server Setup
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Hadoop HDFS over HTTP ${project.version} - Server Setup
-
-  This page explains how to quickly setup HttpFS with Pseudo authentication
-  against a Hadoop cluster with Pseudo authentication.
-
-* Requirements
-
-    * Java 6+
-
-    * Maven 3+
-
-* Install HttpFS
-
-+---+
-~ $ tar xzf  httpfs-${project.version}.tar.gz
-+---+
-
-* Configure HttpFS
-
-  By default, HttpFS assumes that Hadoop configuration files
-  (<<<core-site.xml & hdfs-site.xml>>>) are in the HttpFS
-  configuration directory.
-
-  If this is not the case, add to the <<<httpfs-site.xml>>> file the
-  <<<httpfs.hadoop.config.dir>>> property set to the location
-  of the Hadoop configuration directory.
-
-* Configure Hadoop
-
-  Edit Hadoop <<<core-site.xml>>> and defined the Unix user that will
-  run the HttpFS server as a proxyuser. For example:
-
-+---+
-  ...
-  <property>
-    <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name>
-    <value>httpfs-host.foo.com</value>
-  </property>
-  <property>
-    <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name>
-    <value>*</value>
-  </property>
-  ...
-+---+
-
-  IMPORTANT: Replace <<<#HTTPFSUSER#>>> with the Unix user that will
-  start the HttpFS server.
-
-* Restart Hadoop
-
-  You need to restart Hadoop for the proxyuser configuration ot become
-  active.
-
-* Start/Stop HttpFS
-
-  To start/stop HttpFS use HttpFS's bin/httpfs.sh script. For example:
-
-+---+
-httpfs-${project.version} $ bin/httpfs.sh start
-+---+
-
-  NOTE: Invoking the script without any parameters list all possible
-  parameters (start, stop, run, etc.). The <<<httpfs.sh>>> script is a wrapper
-  for Tomcat's <<<catalina.sh>>> script that sets the environment variables
-  and Java System properties required to run HttpFS server.
-
-* Test HttpFS is working
-
-+---+
-~ $ curl -i "http://<HTTPFSHOSTNAME>:14000?user.name=babu&op=homedir"
-HTTP/1.1 200 OK
-Content-Type: application/json
-Transfer-Encoding: chunked
-
-{"homeDir":"http:\/\/<HTTPFS_HOST>:14000\/user\/babu"}
-+---+
-
-* Embedded Tomcat Configuration
-
-  To configure the embedded Tomcat go to the <<<tomcat/conf>>>.
-
-  HttpFS preconfigures the HTTP and Admin ports in Tomcat's <<<server.xml>>> to
-  14000 and 14001.
-
-  Tomcat logs are also preconfigured to go to HttpFS's <<<logs/>>> directory.
-
-  The following environment variables (which can be set in HttpFS's
-  <<<conf/httpfs-env.sh>>> script) can be used to alter those values:
-
-  * HTTPFS_HTTP_PORT
-
-  * HTTPFS_ADMIN_PORT
-
-  * HTTPFS_LOG
-
-* HttpFS Configuration
-
-  HttpFS supports the following {{{./httpfs-default.html}configuration properties}}
-  in the HttpFS's <<<conf/httpfs-site.xml>>> configuration file.
-
-* HttpFS over HTTPS (SSL)
-
-  To configure HttpFS to work over SSL edit the {{httpfs-env.sh}} script in the
-  configuration directory setting the {{HTTPFS_SSL_ENABLED}} to {{true}}.
-
-  In addition, the following 2 properties may be defined (shown with default
-  values):
-
-    * HTTPFS_SSL_KEYSTORE_FILE=${HOME}/.keystore
-
-    * HTTPFS_SSL_KEYSTORE_PASS=password
-
-  In the HttpFS <<<tomcat/conf>>> directory, replace the <<<server.xml>>> file
-  with the  <<<ssl-server.xml>>> file.
-
-
-  You need to create an SSL certificate for the HttpFS server. As the
-  <<<httpfs>>> Unix user, using the Java <<<keytool>>> command to create the
-  SSL certificate:
-
-+---+
-$ keytool -genkey -alias tomcat -keyalg RSA
-+---+
-
-  You will be asked a series of questions in an interactive prompt.  It will
-  create the keystore file, which will be named <<.keystore>> and located in the
-  <<<httpfs>>> user home directory.
-
-  The password you enter for "keystore password" must match the  value of the
-  <<<HTTPFS_SSL_KEYSTORE_PASS>>> environment variable set in the
-  <<<httpfs-env.sh>>> script in the configuration directory.
-
-  The answer to "What is your first and last name?" (i.e. "CN") must be the
-  hostname of the machine where the HttpFS Server will be running.
-
-  Start HttpFS. It should work over HTTPS.
-
-  Using the Hadoop <<<FileSystem>>> API or the Hadoop FS shell, use the
-  <<<swebhdfs://>>> scheme. Make sure the JVM is picking up the truststore
-  containing the public key of the SSL certificate if using a self-signed
-  certificate.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/245f7b2a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/UsingHttpTools.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/UsingHttpTools.apt.vm b/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/UsingHttpTools.apt.vm
deleted file mode 100644
index c93e20b..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/UsingHttpTools.apt.vm
+++ /dev/null
@@ -1,87 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License.
-
-  ---
-  Hadoop HDFS over HTTP ${project.version} - Using HTTP Tools
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Hadoop HDFS over HTTP ${project.version} - Using HTTP Tools
-
-* Security
-
-  Out of the box HttpFS supports both pseudo authentication and Kerberos HTTP
-  SPNEGO authentication.
-
-** Pseudo Authentication
-
-  With pseudo authentication the user name must be specified in the
-  <<<user.name=\<USERNAME\>>>> query string parameter of a HttpFS URL.
-  For example:
-
-+---+
-$ curl "http://<HTTFS_HOST>:14000/webhdfs/v1?op=homedir&user.name=babu"
-+---+
-
-** Kerberos HTTP SPNEGO Authentication
-
-  Kerberos HTTP SPNEGO authentication requires a tool or library supporting
-  Kerberos HTTP SPNEGO protocol.
-
-  IMPORTANT: If using <<<curl>>>, the <<<curl>>> version being used must support
-  GSS (<<<curl -V>>> prints out 'GSS' if it supports it).
-
-  For example:
-
-+---+
-$ kinit
-Please enter the password for tucu@LOCALHOST:
-$ curl --negotiate -u foo "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir"
-Enter host password for user 'foo':
-+---+
-
-  NOTE: the <<<-u USER>>> option is required by the <<<--negotiate>>> but it is
-  not used. Use any value as <<<USER>>> and when asked for the password press
-  [ENTER] as the password value is ignored.
-
-** {Remembering Who I Am} (Establishing an Authenticated Session)
-
-  As most authentication mechanisms, Hadoop HTTP authentication authenticates
-  users once and issues a short-lived authentication token to be presented in
-  subsequent requests. This authentication token is a signed HTTP Cookie.
-
-  When using tools like <<<curl>>>, the authentication token must be stored on
-  the first request doing authentication, and submitted in subsequent requests.
-  To do this with curl the <<<-b>>> and <<<-c>>> options to save and send HTTP
-  Cookies must be used.
-
-  For example, the first request doing authentication should save the received
-  HTTP Cookies.
-
-    Using Pseudo Authentication:
-
-+---+
-$ curl -c ~/.httpfsauth "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir&user.name=babu"
-+---+
-
-    Using Kerberos HTTP SPNEGO authentication:
-
-+---+
-$ curl --negotiate -u foo -c ~/.httpfsauth "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir"
-+---+
-
-  Then, subsequent requests forward the previously received HTTP Cookie:
-
-+---+
-$ curl -b ~/.httpfsauth "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=liststatus"
-+---+

http://git-wip-us.apache.org/repos/asf/hadoop/blob/245f7b2a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/index.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/index.apt.vm b/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/index.apt.vm
deleted file mode 100644
index f51e743..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/index.apt.vm
+++ /dev/null
@@ -1,83 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License.
-
-  ---
-  Hadoop HDFS over HTTP - Documentation Sets ${project.version}
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Hadoop HDFS over HTTP - Documentation Sets ${project.version}
-
-  HttpFS is a server that provides a REST HTTP gateway supporting all HDFS
-  File System operations (read and write). And it is inteoperable with the
-  <<webhdfs>> REST HTTP API.
-
-  HttpFS can be used to transfer data between clusters running different
-  versions of Hadoop (overcoming RPC versioning issues), for example using
-  Hadoop DistCP.
-
-  HttpFS can be used to access data in HDFS on a cluster behind of a firewall
-  (the HttpFS server acts as a gateway and is the only system that is allowed
-  to cross the firewall into the cluster).
-
-  HttpFS can be used to access data in HDFS using HTTP utilities (such as curl
-  and wget) and HTTP libraries Perl from other languages than Java.
-
-  The <<webhdfs>> client FileSytem implementation can be used to access HttpFS
-  using the Hadoop filesystem command (<<<hadoop fs>>>) line tool as well as
-  from Java aplications using the Hadoop FileSystem Java API.
-
-  HttpFS has built-in security supporting Hadoop pseudo authentication and
-  HTTP SPNEGO Kerberos and other pluggable authentication mechanims. It also
-  provides Hadoop proxy user support.
-
-* How Does HttpFS Works?
-
-  HttpFS is a separate service from Hadoop NameNode.
-
-  HttpFS itself is Java web-application and it runs using a preconfigured Tomcat
-  bundled with HttpFS binary distribution.
-
-  HttpFS HTTP web-service API calls are HTTP REST calls that map to a HDFS file
-  system operation. For example, using the <<<curl>>> Unix command:
-
-  * <<<$ curl http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt>>> returns
-  the contents of the HDFS <<</user/foo/README.txt>>> file.
-
-  * <<<$ curl http://httpfs-host:14000/webhdfs/v1/user/foo?op=list>>> returns the
-  contents of the HDFS <<</user/foo>>> directory in JSON format.
-
-  * <<<$ curl -X POST http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=mkdirs>>>
-  creates the HDFS <<</user/foo.bar>>> directory.
-
-* How HttpFS and Hadoop HDFS Proxy differ?
-
-  HttpFS was inspired by Hadoop HDFS proxy.
-
-  HttpFS can be seen as a full rewrite of Hadoop HDFS proxy.
-
-  Hadoop HDFS proxy provides a subset of file system operations (read only),
-  HttpFS provides support for all file system operations.
-
-  HttpFS uses a clean HTTP REST API making its use with HTTP tools more
-  intuitive.
-
-  HttpFS supports Hadoop pseudo authentication, Kerberos SPNEGOS authentication
-  and Hadoop proxy users. Hadoop HDFS proxy did not.
-
-* User and Developer Documentation
-
-  * {{{./ServerSetup.html}HttpFS Server Setup}}
-
-  * {{{./UsingHttpTools.html}Using HTTP Tools}}
-

http://git-wip-us.apache.org/repos/asf/hadoop/blob/245f7b2a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/ServerSetup.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/ServerSetup.md.vm b/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/ServerSetup.md.vm
new file mode 100644
index 0000000..3c7f9d3
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/ServerSetup.md.vm
@@ -0,0 +1,121 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Hadoop HDFS over HTTP - Server Setup
+====================================
+
+This page explains how to quickly setup HttpFS with Pseudo authentication against a Hadoop cluster with Pseudo authentication.
+
+Install HttpFS
+--------------
+
+    ~ $ tar xzf  httpfs-${project.version}.tar.gz
+
+Configure HttpFS
+----------------
+
+By default, HttpFS assumes that Hadoop configuration files (`core-site.xml & hdfs-site.xml`) are in the HttpFS configuration directory.
+
+If this is not the case, add to the `httpfs-site.xml` file the `httpfs.hadoop.config.dir` property set to the location of the Hadoop configuration directory.
+
+Configure Hadoop
+----------------
+
+Edit Hadoop `core-site.xml` and defined the Unix user that will run the HttpFS server as a proxyuser. For example:
+
+```xml
+  <property>
+    <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name>
+    <value>httpfs-host.foo.com</value>
+  </property>
+  <property>
+    <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name>
+    <value>*</value>
+  </property>
+```
+
+IMPORTANT: Replace `#HTTPFSUSER#` with the Unix user that will start the HttpFS server.
+
+Restart Hadoop
+--------------
+
+You need to restart Hadoop for the proxyuser configuration ot become active.
+
+Start/Stop HttpFS
+-----------------
+
+To start/stop HttpFS use HttpFS's sbin/httpfs.sh script. For example:
+
+    $ sbin/httpfs.sh start
+
+NOTE: Invoking the script without any parameters list all possible parameters (start, stop, run, etc.). The `httpfs.sh` script is a wrapper for Tomcat's `catalina.sh` script that sets the environment variables and Java System properties required to run HttpFS server.
+
+Test HttpFS is working
+----------------------
+
+    ~ $ curl -i "http://<HTTPFSHOSTNAME>:14000?user.name=babu&op=homedir"
+    HTTP/1.1 200 OK
+    Content-Type: application/json
+    Transfer-Encoding: chunked
+
+    {"homeDir":"http:\/\/<HTTPFS_HOST>:14000\/user\/babu"}
+
+Embedded Tomcat Configuration
+-----------------------------
+
+To configure the embedded Tomcat go to the `tomcat/conf`.
+
+HttpFS preconfigures the HTTP and Admin ports in Tomcat's `server.xml` to 14000 and 14001.
+
+Tomcat logs are also preconfigured to go to HttpFS's `logs/` directory.
+
+The following environment variables (which can be set in HttpFS's `etc/hadoop/httpfs-env.sh` script) can be used to alter those values:
+
+* HTTPFS\_HTTP\_PORT
+
+* HTTPFS\_ADMIN\_PORT
+
+* HADOOP\_LOG\_DIR
+
+HttpFS Configuration
+--------------------
+
+HttpFS supports the following [configuration properties](./httpfs-default.html) in the HttpFS's `etc/hadoop/httpfs-site.xml` configuration file.
+
+HttpFS over HTTPS (SSL)
+-----------------------
+
+To configure HttpFS to work over SSL edit the [httpfs-env.sh](#httpfs-env.sh) script in the configuration directory setting the [HTTPFS\_SSL\_ENABLED](#HTTPFS_SSL_ENABLED) to [true](#true).
+
+In addition, the following 2 properties may be defined (shown with default values):
+
+* HTTPFS\_SSL\_KEYSTORE\_FILE=$HOME/.keystore
+
+* HTTPFS\_SSL\_KEYSTORE\_PASS=password
+
+In the HttpFS `tomcat/conf` directory, replace the `server.xml` file with the `ssl-server.xml` file.
+
+You need to create an SSL certificate for the HttpFS server. As the `httpfs` Unix user, using the Java `keytool` command to create the SSL certificate:
+
+    $ keytool -genkey -alias tomcat -keyalg RSA
+
+You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named **.keystore** and located in the `httpfs` user home directory.
+
+The password you enter for "keystore password" must match the value of the `HTTPFS_SSL_KEYSTORE_PASS` environment variable set in the `httpfs-env.sh` script in the configuration directory.
+
+The answer to "What is your first and last name?" (i.e. "CN") must be the hostname of the machine where the HttpFS Server will be running.
+
+Start HttpFS. It should work over HTTPS.
+
+Using the Hadoop `FileSystem` API or the Hadoop FS shell, use the `swebhdfs://` scheme. Make sure the JVM is picking up the truststore containing the public key of the SSL certificate if using a self-signed certificate.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/245f7b2a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/UsingHttpTools.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/UsingHttpTools.md b/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/UsingHttpTools.md
new file mode 100644
index 0000000..3045ad6
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/UsingHttpTools.md
@@ -0,0 +1,62 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Hadoop HDFS over HTTP - Using HTTP Tools
+========================================
+
+Security
+--------
+
+Out of the box HttpFS supports both pseudo authentication and Kerberos HTTP SPNEGO authentication.
+
+### Pseudo Authentication
+
+With pseudo authentication the user name must be specified in the `user.name=<USERNAME>` query string parameter of a HttpFS URL. For example:
+
+    $ curl "http://<HTTFS_HOST>:14000/webhdfs/v1?op=homedir&user.name=babu"
+
+### Kerberos HTTP SPNEGO Authentication
+
+Kerberos HTTP SPNEGO authentication requires a tool or library supporting Kerberos HTTP SPNEGO protocol.
+
+IMPORTANT: If using `curl`, the `curl` version being used must support GSS (`curl -V` prints out 'GSS' if it supports it).
+
+For example:
+
+    $ kinit
+    Please enter the password for user@LOCALHOST:
+    $ curl --negotiate -u foo "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir"
+    Enter host password for user 'foo':
+
+NOTE: the `-u USER` option is required by the `--negotiate` but it is not used. Use any value as `USER` and when asked for the password press [ENTER] as the password value is ignored.
+
+### Remembering Who I Am (Establishing an Authenticated Session)
+
+As most authentication mechanisms, Hadoop HTTP authentication authenticates users once and issues a short-lived authentication token to be presented in subsequent requests. This authentication token is a signed HTTP Cookie.
+
+When using tools like `curl`, the authentication token must be stored on the first request doing authentication, and submitted in subsequent requests. To do this with curl the `-b` and `-c` options to save and send HTTP Cookies must be used.
+
+For example, the first request doing authentication should save the received HTTP Cookies.
+
+Using Pseudo Authentication:
+
+    $ curl -c ~/.httpfsauth "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir&user.name=foo"
+
+Using Kerberos HTTP SPNEGO authentication:
+
+    $ curl --negotiate -u foo -c ~/.httpfsauth "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir"
+
+Then, subsequent requests forward the previously received HTTP Cookie:
+
+    $ curl -b ~/.httpfsauth "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=liststatus"

http://git-wip-us.apache.org/repos/asf/hadoop/blob/245f7b2a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/index.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/index.md b/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/index.md
new file mode 100644
index 0000000..091558b
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/index.md
@@ -0,0 +1,50 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Hadoop HDFS over HTTP - Documentation Sets
+==========================================
+
+HttpFS is a server that provides a REST HTTP gateway supporting all HDFS File System operations (read and write). And it is inteoperable with the **webhdfs** REST HTTP API.
+
+HttpFS can be used to transfer data between clusters running different versions of Hadoop (overcoming RPC versioning issues), for example using Hadoop DistCP.
+
+HttpFS can be used to access data in HDFS on a cluster behind of a firewall (the HttpFS server acts as a gateway and is the only system that is allowed to cross the firewall into the cluster).
+
+HttpFS can be used to access data in HDFS using HTTP utilities (such as curl and wget) and HTTP libraries Perl from other languages than Java.
+
+The **webhdfs** client FileSytem implementation can be used to access HttpFS using the Hadoop filesystem command (`hadoop fs`) line tool as well as from Java aplications using the Hadoop FileSystem Java API.
+
+HttpFS has built-in security supporting Hadoop pseudo authentication and HTTP SPNEGO Kerberos and other pluggable authentication mechanims. It also provides Hadoop proxy user support.
+
+How Does HttpFS Works?
+----------------------
+
+HttpFS is a separate service from Hadoop NameNode.
+
+HttpFS itself is Java web-application and it runs using a preconfigured Tomcat bundled with HttpFS binary distribution.
+
+HttpFS HTTP web-service API calls are HTTP REST calls that map to a HDFS file system operation. For example, using the `curl` Unix command:
+
+* `$ curl http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt` returns the contents of the HDFS `/user/foo/README.txt` file.
+
+* `$ curl http://httpfs-host:14000/webhdfs/v1/user/foo?op=list` returns the contents of the HDFS `/user/foo` directory in JSON format.
+
+* `$ curl -X POST http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=mkdirs` creates the HDFS `/user/foo.bar` directory.
+
+User and Developer Documentation
+--------------------------------
+
+* [HttpFS Server Setup](./ServerSetup.html)
+
+* [Using HTTP Tools](./UsingHttpTools.html)

http://git-wip-us.apache.org/repos/asf/hadoop/blob/245f7b2a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm
deleted file mode 100644
index 2195e10..0000000
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm
+++ /dev/null
@@ -1,151 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop Map Reduce Next Generation-${project.version} - Distributed Cache Deploy
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Hadoop MapReduce Next Generation - Distributed Cache Deploy
-
-* Introduction
-
-  The MapReduce application framework has rudimentary support for deploying a
-  new version of the MapReduce framework via the distributed cache. By setting
-  the appropriate configuration properties, users can run a different version
-  of MapReduce than the one initially deployed to the cluster. For example,
-  cluster administrators can place multiple versions of MapReduce in HDFS and
-  configure <<<mapred-site.xml>>> to specify which version jobs will use by
-  default. This allows the administrators to perform a rolling upgrade of the
-  MapReduce framework under certain conditions.
-
-* Preconditions and Limitations
-
-  The support for deploying the MapReduce framework via the distributed cache
-  currently does not address the job client code used to submit and query
-  jobs. It also does not address the <<<ShuffleHandler>>> code that runs as an
-  auxilliary service within each NodeManager. As a result the following
-  limitations apply to MapReduce versions that can be successfully deployed via
-  the distributed cache in a rolling upgrade fashion:
-
-  * The MapReduce version must be compatible with the job client code used to
-    submit and query jobs. If it is incompatible then the job client must be
-    upgraded separately on any node from which jobs using the new MapReduce
-    version will be submitted or queried.
-
-  * The MapReduce version must be compatible with the configuration files used
-    by the job client submitting the jobs. If it is incompatible with that
-    configuration (e.g.: a new property must be set or an existing property
-    value changed) then the configuration must be updated first.
-
-  * The MapReduce version must be compatible with the <<<ShuffleHandler>>>
-    version running on the nodes in the cluster. If it is incompatible then the
-    new <<<ShuffleHandler>>> code must be deployed to all the nodes in the
-    cluster, and the NodeManagers must be restarted to pick up the new
-    <<<ShuffleHandler>>> code.
-
-* Deploying a New MapReduce Version via the Distributed Cache
-
-  Deploying a new MapReduce version consists of three steps:
-
-  [[1]] Upload the MapReduce archive to a location that can be accessed by the
-  job submission client. Ideally the archive should be on the cluster's default
-  filesystem at a publicly-readable path. See the archive location discussion
-  below for more details.
-
-  [[2]] Configure <<<mapreduce.application.framework.path>>> to point to the
-  location where the archive is located. As when specifying distributed cache
-  files for a job, this is a URL that also supports creating an alias for the
-  archive if a URL fragment is specified. For example,
-  <<<hdfs:/mapred/framework/hadoop-mapreduce-${project.version}.tar.gz#mrframework>>>
-  will be localized as <<<mrframework>>> rather than
-  <<<hadoop-mapreduce-${project.version}.tar.gz>>>.
-
-  [[3]] Configure <<<mapreduce.application.classpath>>> to set the proper
-  classpath to use with the MapReduce archive configured above. NOTE: An error
-  occurs if <<<mapreduce.application.framework.path>>> is configured but
-  <<<mapreduce.application.classpath>>> does not reference the base name of the
-  archive path or the alias if an alias was specified.
-
-** Location of the MapReduce Archive and How It Affects Job Performance
-
-  Note that the location of the MapReduce archive can be critical to job
-  submission and job startup performance. If the archive is not located on the
-  cluster's default filesystem then it will be copied to the job staging
-  directory for each job and localized to each node where the job's tasks
-  run. This will slow down job submission and task startup performance.
-
-  If the archive is located on the default filesystem then the job client will
-  not upload the archive to the job staging directory for each job
-  submission. However if the archive path is not readable by all cluster users
-  then the archive will be localized separately for each user on each node
-  where tasks execute. This can cause unnecessary duplication in the
-  distributed cache.
-
-  When working with a large cluster it can be important to increase the
-  replication factor of the archive to increase its availability. This will
-  spread the load when the nodes in the cluster localize the archive for the
-  first time.
-
-* MapReduce Archives and Classpath Configuration
-
-  Setting a proper classpath for the MapReduce archive depends upon the
-  composition of the archive and whether it has any additional dependencies.
-  For example, the archive can contain not only the MapReduce jars but also the
-  necessary YARN, HDFS, and Hadoop Common jars and all other dependencies. In
-  that case, <<<mapreduce.application.classpath>>> would be configured to
-  something like the following example, where the archive basename is
-  hadoop-mapreduce-${project.version}.tar.gz and the archive is organized
-  internally similar to the standard Hadoop distribution archive:
-
-    <<<$HADOOP_CONF_DIR,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/mapreduce/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/mapreduce/lib/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/common/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/common/lib/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/yarn/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/yarn/lib/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/hdfs/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/hdfs/lib/*>>>
-
-  Another possible approach is to have the archive consist of just the
-  MapReduce jars and have the remaining dependencies picked up from the Hadoop
-  distribution installed on the nodes.  In that case, the above example would
-  change to something like the following:
-
-    <<<$HADOOP_CONF_DIR,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/mapreduce/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/mapreduce/lib/*,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*>>>
-
-** NOTE: 
-
-  If shuffle encryption is also enabled in the cluster, then we could meet the problem that MR job get failed with exception like below: 
-  
-+---+
-2014-10-10 02:17:16,600 WARN [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to junpingdu-centos5-3.cs1cloud.internal:13562 with 1 map outputs
-javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
-    at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174)
-    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1731)
-    at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:241)
-    at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:235)
-    at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1206)
-    at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136)
-    at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593)
-    at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529)
-    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:925)
-    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1170)
-    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1197)
-    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1181)
-    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:434)
-    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:81)
-    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:61)
-    at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:584)
-    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1193)
-    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
-    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
-    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:427)
-....
-
-+---+
-
-  This is because MR client (deployed from HDFS) cannot access ssl-client.xml in local FS under directory of $HADOOP_CONF_DIR. To fix the problem, we can add the directory with ssl-client.xml to the classpath of MR which is specified in "mapreduce.application.classpath" as mentioned above. To avoid MR application being affected by other local configurations, it is better to create a dedicated directory for putting ssl-client.xml, e.g. a sub-directory under $HADOOP_CONF_DIR, like: $HADOOP_CONF_DIR/security.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/245f7b2a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm
deleted file mode 100644
index 1761ad8..0000000
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm
+++ /dev/null
@@ -1,320 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop Map Reduce Next Generation-${project.version} - Encrypted Shuffle
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Hadoop MapReduce Next Generation - Encrypted Shuffle
-
-* {Introduction}
-
-  The Encrypted Shuffle capability allows encryption of the MapReduce shuffle
-  using HTTPS and with optional client authentication (also known as
-  bi-directional HTTPS, or HTTPS with client certificates). It comprises:
-
-  * A Hadoop configuration setting for toggling the shuffle between HTTP and
-    HTTPS.
-
-  * A Hadoop configuration settings for specifying the keystore and truststore
-   properties (location, type, passwords) used by the shuffle service and the
-   reducers tasks fetching shuffle data.
-
-  * A way to re-load truststores across the cluster (when a node is added or
-    removed).
-
-* {Configuration}
-
-**  <<core-site.xml>> Properties
-
-  To enable encrypted shuffle, set the following properties in core-site.xml of
-  all nodes in the cluster:
-
-*--------------------------------------+---------------------+-----------------+
-| <<Property>>                         | <<Default Value>>   | <<Explanation>> |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.require.client.cert>>> | <<<false>>>         | Whether client certificates are required |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.hostname.verifier>>>   | <<<DEFAULT>>>       | The hostname verifier to provide for HttpsURLConnections. Valid values are: <<DEFAULT>>, <<STRICT>>, <<STRICT_I6>>, <<DEFAULT_AND_LOCALHOST>> and <<ALLOW_ALL>> |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.keystores.factory.class>>> | <<<org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory>>> | The KeyStoresFactory implementation to use |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.server.conf>>>         | <<<ssl-server.xml>>> | Resource file from which ssl server keystore information will be extracted. This file is looked up in the classpath, typically it should be in Hadoop conf/ directory |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.client.conf>>>         | <<<ssl-client.xml>>> | Resource file from which ssl server keystore information will be extracted. This file is looked up in the classpath, typically it should be in Hadoop conf/ directory |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.enabled.protocols>>>   | <<<TLSv1>>>         | The supported SSL protocols (JDK6 can use <<TLSv1>>, JDK7+ can use <<TLSv1,TLSv1.1,TLSv1.2>>) |
-*--------------------------------------+---------------------+-----------------+
-
-  <<IMPORTANT:>> Currently requiring client certificates should be set to false.
-  Refer the {{{ClientCertificates}Client Certificates}} section for details.
-
-  <<IMPORTANT:>> All these properties should be marked as final in the cluster
-  configuration files.
-
-*** Example:
-
-------
-    ...
-    <property>
-      <name>hadoop.ssl.require.client.cert</name>
-      <value>false</value>
-      <final>true</final>
-    </property>
-
-    <property>
-      <name>hadoop.ssl.hostname.verifier</name>
-      <value>DEFAULT</value>
-      <final>true</final>
-    </property>
-
-    <property>
-      <name>hadoop.ssl.keystores.factory.class</name>
-      <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
-      <final>true</final>
-    </property>
-
-    <property>
-      <name>hadoop.ssl.server.conf</name>
-      <value>ssl-server.xml</value>
-      <final>true</final>
-    </property>
-
-    <property>
-      <name>hadoop.ssl.client.conf</name>
-      <value>ssl-client.xml</value>
-      <final>true</final>
-    </property>
-    ...
-------
-
-**  <<<mapred-site.xml>>> Properties
-
-  To enable encrypted shuffle, set the following property in mapred-site.xml
-  of all nodes in the cluster:
-
-*--------------------------------------+---------------------+-----------------+
-| <<Property>>                         | <<Default Value>>   | <<Explanation>> |
-*--------------------------------------+---------------------+-----------------+
-| <<<mapreduce.shuffle.ssl.enabled>>>  | <<<false>>>         | Whether encrypted shuffle is enabled |
-*--------------------------------------+---------------------+-----------------+
-
-  <<IMPORTANT:>> This property should be marked as final in the cluster
-  configuration files.
-
-*** Example:
-
-------
-    ...
-    <property>
-      <name>mapreduce.shuffle.ssl.enabled</name>
-      <value>true</value>
-      <final>true</final>
-    </property>
-    ...
-------
-
-  The Linux container executor should be set to prevent job tasks from
-  reading the server keystore information and gaining access to the shuffle
-  server certificates.
-
-  Refer to Hadoop Kerberos configuration for details on how to do this.
-
-* {Keystore and Truststore Settings}
-
-  Currently <<<FileBasedKeyStoresFactory>>> is the only <<<KeyStoresFactory>>>
-  implementation. The <<<FileBasedKeyStoresFactory>>> implementation uses the
-  following properties, in the <<ssl-server.xml>> and <<ssl-client.xml>> files,
-  to configure the keystores and truststores.
-
-** <<<ssl-server.xml>>> (Shuffle server) Configuration:
-
-  The mapred user should own the <<ssl-server.xml>> file and have exclusive
-  read access to it.
-
-*---------------------------------------------+---------------------+-----------------+
-| <<Property>>                                | <<Default Value>>   | <<Explanation>> |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.keystore.type>>>              | <<<jks>>>           | Keystore file type |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.keystore.location>>>          | NONE                | Keystore file location. The mapred user should own this file and have exclusive read access to it. |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.keystore.password>>>          | NONE                | Keystore file password |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.truststore.type>>>            | <<<jks>>>           | Truststore file type |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.truststore.location>>>        | NONE                | Truststore file location. The mapred user should own this file and have exclusive read access to it. |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.truststore.password>>>        | NONE                | Truststore file password |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.truststore.reload.interval>>> | 10000               | Truststore reload interval, in milliseconds |
-*--------------------------------------+----------------------------+-----------------+
-
-*** Example:
-
-------
-<configuration>
-
-  <!-- Server Certificate Store -->
-  <property>
-    <name>ssl.server.keystore.type</name>
-    <value>jks</value>
-  </property>
-  <property>
-    <name>ssl.server.keystore.location</name>
-    <value>${user.home}/keystores/server-keystore.jks</value>
-  </property>
-  <property>
-    <name>ssl.server.keystore.password</name>
-    <value>serverfoo</value>
-  </property>
-
-  <!-- Server Trust Store -->
-  <property>
-    <name>ssl.server.truststore.type</name>
-    <value>jks</value>
-  </property>
-  <property>
-    <name>ssl.server.truststore.location</name>
-    <value>${user.home}/keystores/truststore.jks</value>
-  </property>
-  <property>
-    <name>ssl.server.truststore.password</name>
-    <value>clientserverbar</value>
-  </property>
-  <property>
-    <name>ssl.server.truststore.reload.interval</name>
-    <value>10000</value>
-  </property>
-</configuration>
-------
-
-** <<<ssl-client.xml>>> (Reducer/Fetcher) Configuration:
-
-  The mapred user should own the <<ssl-client.xml>> file and it should have
-  default permissions.
-
-*---------------------------------------------+---------------------+-----------------+
-| <<Property>>                                | <<Default Value>>   | <<Explanation>> |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.keystore.type>>>              | <<<jks>>>           | Keystore file type |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.keystore.location>>>          | NONE                | Keystore file location. The mapred user should own this file and it should have default permissions. |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.keystore.password>>>          | NONE                | Keystore file password |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.truststore.type>>>            | <<<jks>>>           | Truststore file type |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.truststore.location>>>        | NONE                | Truststore file location. The mapred user should own this file and it should have default permissions. |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.truststore.password>>>        | NONE                | Truststore file password |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.truststore.reload.interval>>> | 10000                | Truststore reload interval, in milliseconds |
-*--------------------------------------+----------------------------+-----------------+
-
-*** Example:
-
-------
-<configuration>
-
-  <!-- Client certificate Store -->
-  <property>
-    <name>ssl.client.keystore.type</name>
-    <value>jks</value>
-  </property>
-  <property>
-    <name>ssl.client.keystore.location</name>
-    <value>${user.home}/keystores/client-keystore.jks</value>
-  </property>
-  <property>
-    <name>ssl.client.keystore.password</name>
-    <value>clientfoo</value>
-  </property>
-
-  <!-- Client Trust Store -->
-  <property>
-    <name>ssl.client.truststore.type</name>
-    <value>jks</value>
-  </property>
-  <property>
-    <name>ssl.client.truststore.location</name>
-    <value>${user.home}/keystores/truststore.jks</value>
-  </property>
-  <property>
-    <name>ssl.client.truststore.password</name>
-    <value>clientserverbar</value>
-  </property>
-  <property>
-    <name>ssl.client.truststore.reload.interval</name>
-    <value>10000</value>
-  </property>
-</configuration>
-------
-
-* Activating Encrypted Shuffle
-
-  When you have made the above configuration changes, activate Encrypted
-  Shuffle by re-starting all NodeManagers.
-
-  <<IMPORTANT:>> Using encrypted shuffle will incur in a significant
-  performance impact. Users should profile this and potentially reserve
-  1 or more cores for encrypted shuffle.
-
-* {ClientCertificates} Client Certificates
-
-  Using Client Certificates does not fully ensure that the client is a
-  reducer task for the job. Currently, Client Certificates (their private key)
-  keystore files must be readable by all users submitting jobs to the cluster.
-  This means that a rogue job could read such those keystore files and use
-  the client certificates in them to establish a secure connection with a
-  Shuffle server. However, unless the rogue job has a proper JobToken, it won't
-  be able to retrieve shuffle data from the Shuffle server. A job, using its
-  own JobToken, can only retrieve shuffle data that belongs to itself.
-
-* Reloading Truststores
-
-  By default the truststores will reload their configuration every 10 seconds.
-  If a new truststore file is copied over the old one, it will be re-read,
-  and its certificates will replace the old ones. This mechanism is useful for
-  adding or removing nodes from the cluster, or for adding or removing trusted
-  clients. In these cases, the client or NodeManager certificate is added to
-  (or removed from) all the truststore files in the system, and the new
-  configuration will be picked up without you having to restart the NodeManager
-  daemons.
-
-* Debugging
-
-  <<NOTE:>> Enable debugging only for troubleshooting, and then only for jobs
-  running on small amounts of data. It is very verbose and slows down jobs by
-  several orders of magnitude. (You might need to increase mapred.task.timeout
-  to prevent jobs from failing because tasks run so slowly.)
-
-  To enable SSL debugging in the reducers, set <<<-Djavax.net.debug=all>>> in
-  the <<<mapreduce.reduce.child.java.opts>>> property; for example:
-
-------
-  <property>
-    <name>mapred.reduce.child.java.opts</name>
-    <value>-Xmx-200m -Djavax.net.debug=all</value>
-  </property>
-------
-
-  You can do this on a per-job basis, or by means of a cluster-wide setting in
-  the <<<mapred-site.xml>>> file.
-
-  To set this property in NodeManager, set it in the <<<yarn-env.sh>>> file:
-
-------
-  YARN_NODEMANAGER_OPTS="-Djavax.net.debug=all $YARN_NODEMANAGER_OPTS"
-------


Mime
View raw message