hadoop-common-commits mailing list archives

From cmcc...@apache.org
Subject [05/11] hadoop git commit: HDFS-7668. Backport "Convert site documentation from apt to markdown" to branch-2 (Masatake Iwasaki via Colin P. McCabe)
Date Wed, 25 Feb 2015 00:34:35 GMT
http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ArchivalStorage.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ArchivalStorage.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ArchivalStorage.md
new file mode 100644
index 0000000..2038401
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ArchivalStorage.md
@@ -0,0 +1,160 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Archival Storage, SSD & Memory
+==============================
+
+* [Archival Storage, SSD & Memory](#Archival_Storage_SSD__Memory)
+    * [Introduction](#Introduction)
+    * [Storage Types and Storage Policies](#Storage_Types_and_Storage_Policies)
+        * [Storage Types: ARCHIVE, DISK, SSD and RAM\_DISK](#Storage_Types:_ARCHIVE_DISK_SSD_and_RAM_DISK)
+        * [Storage Policies: Hot, Warm, Cold, All\_SSD, One\_SSD and Lazy\_Persist](#Storage_Policies:_Hot_Warm_Cold_All_SSD_One_SSD_and_Lazy_Persist)
+        * [Storage Policy Resolution](#Storage_Policy_Resolution)
+        * [Configuration](#Configuration)
+    * [Mover - A New Data Migration Tool](#Mover_-_A_New_Data_Migration_Tool)
+    * [Storage Policy Commands](#Storage_Policy_Commands)
+        * [List Storage Policies](#List_Storage_Policies)
+        * [Set Storage Policy](#Set_Storage_Policy)
+        * [Get Storage Policy](#Get_Storage_Policy)
+
+Introduction
+------------
+
+*Archival Storage* is a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy, data can be moved from hot storage to cold storage. Adding more nodes to the cold storage grows the storage capacity independently of the compute capacity in the cluster.
+
+The frameworks provided by Heterogeneous Storage and Archival Storage generalize the HDFS architecture to include other kinds of storage media, including *SSD* and *memory*. Users may choose to store their data in SSD or memory for better performance.
+
+Storage Types and Storage Policies
+----------------------------------
+
+### Storage Types: ARCHIVE, DISK, SSD and RAM\_DISK
+
+The first phase of [Heterogeneous Storage (HDFS-2832)](https://issues.apache.org/jira/browse/HDFS-2832) changed the datanode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, each corresponding to one physical storage medium. It also added the notion of storage types, DISK and SSD, where DISK is the default storage type.
+
+A new storage type *ARCHIVE*, which has high storage density (petabytes of storage) but little compute power, is added for supporting archival storage.
+
+Another new storage type *RAM\_DISK* is added for supporting writing single-replica files in memory.
+
+### Storage Policies: Hot, Warm, Cold, All\_SSD, One\_SSD and Lazy\_Persist
+
+A new concept of storage policies is introduced in order to allow files to be stored in different storage types according to the storage policy.
+
+We have the following storage policies:
+
+* **Hot** - for both storage and compute. The data that is popular and still being used for processing will stay in this policy. When a block is hot, all replicas are stored in DISK.
+* **Cold** - only for storage with limited compute. The data that is no longer being used, or data that needs to be archived is moved from hot storage to cold storage. When a block is cold, all replicas are stored in ARCHIVE.
+* **Warm** - partially hot and partially cold. When a block is warm, some of its replicas are stored in DISK and the remaining replicas are stored in ARCHIVE.
+* **All\_SSD** - for storing all replicas in SSD.
+* **One\_SSD** - for storing one of the replicas in SSD. The remaining replicas are stored in DISK.
+* **Lazy\_Persist** - for writing blocks with a single replica in memory. The replica is first written to RAM\_DISK and then lazily persisted to DISK.
+
+More formally, a storage policy consists of the following fields:
+
+1.  Policy ID
+2.  Policy name
+3.  A list of storage types for block placement
+4.  A list of fallback storage types for file creation
+5.  A list of fallback storage types for replication
+
+When there is enough space, block replicas are stored according to the storage type list specified in \#3. When some of the storage types in list \#3 are running out of space, the fallback storage type lists specified in \#4 and \#5 are used to replace the out-of-space storage types for file creation and replication, respectively.
+
+The following is a typical storage policy table.
+
+| **Policy ID** | **Policy Name** | **Block Placement (*n* replicas)** | **Fallback storages for creation** | **Fallback storages for replication** |
+|:---- |:---- |:---- |:---- |:---- |
+| 15 | Lazy\_Persist | RAM\_DISK: 1, DISK: *n*-1 | DISK | DISK |
+| 12 | All\_SSD | SSD: *n* | DISK | DISK |
+| 10 | One\_SSD | SSD: 1, DISK: *n*-1 | SSD, DISK | SSD, DISK |
+| 7 | Hot (default) | DISK: *n* | \<none\> | ARCHIVE |
+| 5 | Warm | DISK: 1, ARCHIVE: *n*-1 | ARCHIVE, DISK | ARCHIVE, DISK |
+| 2 | Cold | ARCHIVE: *n* | \<none\> | \<none\> |
+
+Note that the Lazy\_Persist policy is useful only for single-replica blocks. For blocks with more than one replica, all the replicas will be written to DISK, since writing only one of the replicas to RAM\_DISK does not improve the overall performance.
+
+### Storage Policy Resolution
+
+When a file or directory is created, its storage policy is *unspecified*. The storage policy can be specified using the "[`storagepolicies -setStoragePolicy`](#Set_Storage_Policy)" command. The effective storage policy of a file or directory is resolved by the following rules.
+
+1.  If a storage policy is specified for the file or directory, return it.
+
+2.  For an unspecified file or directory, if it is the root directory, return the *default storage policy*. Otherwise, return its parent's effective storage policy.
+
+The effective storage policy can be retrieved by the "[`storagepolicies -getStoragePolicy`](#Get_Storage_Policy)" command.
+
+### Configuration
+
+* **dfs.storage.policy.enabled** - for enabling/disabling the storage policy feature. The default value is `true`.
+
+Mover - A New Data Migration Tool
+---------------------------------
+
+A new data migration tool is added for archiving data. The tool is similar to Balancer. It periodically scans the files in HDFS to check if the block placement satisfies the storage policy. For the blocks violating the storage policy, it moves the replicas to a different storage type in order to fulfill the storage policy requirement.
+
+* Command:
+
+        hdfs mover [-p <files/dirs> | -f <local file name>]
+
+* Arguments:
+
+| | |
+|:---- |:---- |
+| `-p <files/dirs>` | Specify a space separated list of HDFS files/dirs to migrate. |
+| `-f <local file>` | Specify a local file containing a list of HDFS files/dirs to migrate. |
+
+Note that, when both -p and -f options are omitted, the default path is the root directory.
+
+Storage Policy Commands
+-----------------------
+
+### List Storage Policies
+
+List out all the storage policies.
+
+* Command:
+
+        hdfs storagepolicies -listPolicies
+
+* Arguments: none.
+
+### Set Storage Policy
+
+Set a storage policy to a file or a directory.
+
+* Command:
+
+        hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>
+
+* Arguments:
+
+| | |
+|:---- |:---- |
+| `-path <path>` | The path referring to either a directory or a file. |
+| `-policy <policy>` | The name of the storage policy. |
+
+### Get Storage Policy
+
+Get the storage policy of a file or a directory.
+
+* Command:
+
+        hdfs storagepolicies -getStoragePolicy -path <path>
+
+* Arguments:
+
+| | |
+|:---- |:---- |
+| `-path <path>` | The path referring to either a directory or a file. |
+
+
+
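A note on the storage policy table in ArchivalStorage.md above: the "Block Placement" column can be read as a function from a policy name and a replica count *n* to a list of preferred storage types. The following is a minimal Python sketch of that reading, not HDFS code. The names `POLICIES` and `placement` are hypothetical, and the sketch ignores the fallback columns and the note about multi-replica Lazy\_Persist blocks.

```python
# Illustrative model of the "Block Placement" column; not part of HDFS.
# Each entry maps a policy name to a function of the replica count n.
POLICIES = {
    "Lazy_Persist": lambda n: ["RAM_DISK"] + ["DISK"] * (n - 1),
    "All_SSD": lambda n: ["SSD"] * n,
    "One_SSD": lambda n: ["SSD"] + ["DISK"] * (n - 1),
    "Hot": lambda n: ["DISK"] * n,
    "Warm": lambda n: ["DISK"] + ["ARCHIVE"] * (n - 1),
    "Cold": lambda n: ["ARCHIVE"] * n,
}


def placement(policy, n):
    """Return the preferred storage type of each of the n replicas."""
    if n < 1:
        raise ValueError("replica count must be at least 1")
    return POLICIES[policy](n)


print(placement("Warm", 3))  # ['DISK', 'ARCHIVE', 'ARCHIVE']
```

When a preferred storage type is out of space, the fallback lists from the table would take over; that step is deliberately left out here.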

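The two resolution rules in the "Storage Policy Resolution" section of ArchivalStorage.md above amount to a walk from a path up to the root. Below is a small Python sketch under stated assumptions: `explicit` is a hypothetical dictionary of paths whose policy was set explicitly, and `DEFAULT_POLICY` is taken to be Hot because the policy table marks Hot as the default. It models the rules only and does not talk to a NameNode.

```python
# Illustrative model of effective storage policy resolution; not HDFS code.
DEFAULT_POLICY = "Hot"  # assumption: the table above marks Hot as "(default)"


def effective_policy(path, explicit):
    """Resolve the effective policy of an absolute path.

    `explicit` maps absolute paths to explicitly set policy names; any path
    missing from it is treated as "unspecified".
    """
    parts = [p for p in path.split("/") if p]
    # Rule 1, applied to the path itself and then to each ancestor in turn
    # (rule 2: an unspecified path inherits its parent's effective policy).
    for depth in range(len(parts), 0, -1):
        candidate = "/" + "/".join(parts[:depth])
        if candidate in explicit:
            return explicit[candidate]
    # Rule 2 for the root directory: fall back to the default policy.
    return explicit.get("/", DEFAULT_POLICY)


explicit = {"/archive": "Cold", "/archive/2015/live.log": "Hot"}
print(effective_policy("/archive/2014/old.log", explicit))  # Cold
print(effective_policy("/tmp/scratch", explicit))           # Hot
```
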
http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md
new file mode 100644
index 0000000..b4f08c8
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md
@@ -0,0 +1,268 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Centralized Cache Management in HDFS
+====================================
+
+* [Overview](#Overview)
+* [Use Cases](#Use_Cases)
+* [Architecture](#Architecture)
+* [Concepts](#Concepts)
+    * [Cache directive](#Cache_directive)
+    * [Cache pool](#Cache_pool)
+* [cacheadmin command-line interface](#cacheadmin_command-line_interface)
+    * [Cache directive commands](#Cache_directive_commands)
+        * [addDirective](#addDirective)
+        * [removeDirective](#removeDirective)
+        * [removeDirectives](#removeDirectives)
+        * [listDirectives](#listDirectives)
+    * [Cache pool commands](#Cache_pool_commands)
+        * [addPool](#addPool)
+        * [modifyPool](#modifyPool)
+        * [removePool](#removePool)
+        * [listPools](#listPools)
+        * [help](#help)
+* [Configuration](#Configuration)
+    * [Native Libraries](#Native_Libraries)
+    * [Configuration Properties](#Configuration_Properties)
+        * [Required](#Required)
+        * [Optional](#Optional)
+    * [OS Limits](#OS_Limits)
+
+Overview
+--------
+
+*Centralized cache management* in HDFS is an explicit caching mechanism that allows users to specify *paths* to be cached by HDFS. The NameNode will communicate with DataNodes that have the desired blocks on disk, and instruct them to cache the blocks in off-heap caches.
+
+Centralized cache management in HDFS has many significant advantages.
+
+1.  Explicit pinning prevents frequently used data from being evicted from memory. This is particularly important when the size of the working set exceeds the size of main memory, which is common for many HDFS workloads.
+
+2.  Because DataNode caches are managed by the NameNode, applications can query the set of cached block locations when making task placement decisions. Co-locating a task with a cached block replica improves read performance.
+
+3.  When a block has been cached by a DataNode, clients can use a new, more efficient, zero-copy read API. Since checksum verification of cached data is done once by the DataNode, clients can incur essentially zero overhead when using this new API.
+
+4.  Centralized caching can improve overall cluster memory utilization. When relying on the OS buffer cache at each DataNode, repeated reads of a block will result in all *n* replicas of the block being pulled into buffer cache. With centralized cache management, a user can explicitly pin only *m* of the *n* replicas, saving the memory that the other *n-m* replicas would occupy.
+
+Use Cases
+---------
+
+Centralized cache management is useful for files that are accessed repeatedly. For example, a small *fact table* in Hive which is often used for joins is a good candidate for caching. On the other hand, caching the input of a *one year reporting query* is probably less useful, since the historical data might only be read once.
+
+Centralized cache management is also useful for mixed workloads with performance SLAs. Caching the working set of a high-priority workload ensures that it does not contend for disk I/O with a low-priority workload.
+
+Architecture
+------------
+
+![Caching Architecture](images/caching.png)
+
+In this architecture, the NameNode is responsible for coordinating all the DataNode off-heap caches in the cluster. The NameNode periodically receives a *cache report* from each DataNode which describes all the blocks cached on a given DN. The NameNode manages DataNode caches by piggybacking cache and uncache commands on the DataNode heartbeat.
+
+The NameNode queries its set of *cache directives* to determine which paths should be cached. Cache directives are persistently stored in the fsimage and edit log, and can be added, removed, and modified via Java and command-line APIs. The NameNode also stores a set of *cache pools*, which are administrative entities used to group cache directives together for resource management and enforcing permissions.
+
+The NameNode periodically rescans the namespace and active cache directives to determine which blocks need to be cached or uncached, and assigns caching work to DataNodes. Rescans can also be triggered by user actions like adding or removing a cache directive or removing a cache pool.
+
+We do not currently cache blocks which are under construction, corrupt, or otherwise incomplete. If a cache directive covers a symlink, the symlink target is not cached.
+
+Caching is currently done at the file or directory level. Block and sub-block caching is an item of future work.
+
+Concepts
+--------
+
+### Cache directive
+
+A *cache directive* defines a path that should be cached. Paths can be either directories or files. Directories are cached non-recursively, meaning only files in the first-level listing of the directory are cached.
+
+Directives also specify additional parameters, such as the cache replication factor and expiration time. The replication factor specifies the number of block replicas to cache. If multiple cache directives refer to the same file, the maximum cache replication factor is applied.
+
+The expiration time is specified on the command line as a *time-to-live (TTL)*, a relative expiration time in the future. After a cache directive expires, it is no longer considered by the NameNode when making caching decisions.
+
+### Cache pool
+
+A *cache pool* is an administrative entity used to manage groups of cache directives. Cache pools have UNIX-like *permissions*, which restrict which users and groups have access to the pool. Write permissions allow users to add and remove cache directives to the pool. Read permissions allow users to list the cache directives in a pool, as well as additional metadata. Execute permissions are unused.
+
+Cache pools are also used for resource management. Pools can enforce a maximum *limit*, which restricts the number of bytes that can be cached in aggregate by directives in the pool. Normally, the sum of the pool limits will approximately equal the amount of aggregate memory reserved for HDFS caching on the cluster. Cache pools also track a number of statistics to help cluster users determine what is and should be cached.
+
+Pools also can enforce a maximum time-to-live. This restricts the maximum expiration time of directives being added to the pool.
+
+`cacheadmin` command-line interface
+-----------------------------------
+
+On the command-line, administrators and users can interact with cache pools and directives via the `hdfs cacheadmin` subcommand.
+
+Cache directives are identified by a unique, non-repeating 64-bit integer ID. IDs will not be reused even if a cache directive is later removed.
+
+Cache pools are identified by a unique string name.
+
+### Cache directive commands
+
+#### addDirective
+
+Usage: `hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]`
+
+Add a new cache directive.
+
+| | |
+|:---- |:---- |
+| \<path\> | A path to cache. The path can be a directory or a file. |
+| \<pool-name\> | The pool to which the directive will be added. You must have write permission on the cache pool in order to add new directives. |
+| -force | Skips checking of cache pool resource limits. |
+| \<replication\> | The cache replication factor to use. Defaults to 1. |
+| \<time-to-live\> | How long the directive is valid. Can be specified in seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. Valid units are [smhd]. "never" indicates a directive that never expires. If unspecified, the directive never expires. |
+
+#### removeDirective
+
+Usage: `hdfs cacheadmin -removeDirective <id>`
+
+Remove a cache directive.
+
+| | |
+|:---- |:---- |
+| \<id\> | The ID of the cache directive to remove. You must have write permission on the pool of the directive in order to remove it. To see a list of cache directive IDs, use the -listDirectives command. |
+
+#### removeDirectives
+
+Usage: `hdfs cacheadmin -removeDirectives <path>`
+
+Remove every cache directive with the specified path.
+
+| | |
+|:---- |:---- |
+| \<path\> | The path of the cache directives to remove. You must have write permission on the pool of the directive in order to remove it. To see a list of cache directives, use the -listDirectives command. |
+
+#### listDirectives
+
+Usage: `hdfs cacheadmin -listDirectives [-stats] [-path <path>] [-pool <pool>]`
+
+List cache directives.
+
+| | |
+|:---- |:---- |
+| \<path\> | List only cache directives with this path. Note that if there is a cache directive for *path* in a cache pool that we don't have read access for, it will not be listed. |
+| \<pool\> | List only path cache directives in that pool. |
+| -stats | List path-based cache directive statistics. |
+
+### Cache pool commands
+
+#### addPool
+
+Usage: `hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]`
+
+Add a new cache pool.
+
+| | |
+|:---- |:---- |
+| \<name\> | Name of the new pool. |
+| \<owner\> | Username of the owner of the pool. Defaults to the current user. |
+| \<group\> | Group of the pool. Defaults to the primary group name of the current user. |
+| \<mode\> | UNIX-style permissions for the pool. Permissions are specified in octal, e.g. 0755. By default, this is set to 0755. |
+| \<limit\> | The maximum number of bytes that can be cached by directives in this pool, in aggregate. By default, no limit is set. |
+| \<maxTtl\> | The maximum allowed time-to-live for directives being added to the pool. This can be specified in seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. Valid units are [smhd]. By default, no maximum is set. A value of "never" specifies that there is no limit. |
+
+#### modifyPool
+
+Usage: `hdfs cacheadmin -modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]`
+
+Modifies the metadata of an existing cache pool.
+
+| | |
+|:---- |:---- |
+| \<name\> | Name of the pool to modify. |
+| \<owner\> | Username of the owner of the pool. |
+| \<group\> | Groupname of the group of the pool. |
+| \<mode\> | Unix-style permissions of the pool in octal. |
+| \<limit\> | Maximum number of bytes that can be cached by this pool. |
+| \<maxTtl\> | The maximum allowed time-to-live for directives being added to the pool. |
+
+#### removePool
+
+Usage: `hdfs cacheadmin -removePool <name>`
+
+Remove a cache pool. This also uncaches paths associated with the pool.
+
+| | |
+|:---- |:---- |
+| \<name\> | Name of the cache pool to remove. |
+
+#### listPools
+
+Usage: `hdfs cacheadmin -listPools [-stats] [<name>]`
+
+Display information about one or more cache pools, e.g. name, owner, group, permissions, etc.
+
+| | |
+|:---- |:---- |
+| -stats | Display additional cache pool statistics. |
+| \<name\> | If specified, list only the named cache pool. |
+
+#### help
+
+Usage: `hdfs cacheadmin -help <command-name>`
+
+Get detailed help about a command.
+
+| | |
+|:---- |:---- |
+| \<command-name\> | The command for which to get detailed help. If no command is specified, print detailed help for all commands. |
+
+Configuration
+-------------
+
+### Native Libraries
+
+In order to lock block files into memory, the DataNode relies on native JNI code found in `libhadoop.so` or `hadoop.dll` on Windows. Be sure to [enable JNI](../hadoop-common/NativeLibraries.html) if you are using HDFS centralized cache management.
+
+### Configuration Properties
+
+#### Required
+
+Be sure to configure the following:
+
+*   dfs.datanode.max.locked.memory
+
+    This determines the maximum amount of memory a DataNode will use for caching. On Unix-like systems, the "locked-in-memory size" ulimit (`ulimit -l`) of the DataNode user also needs to be increased to match this parameter (see below section on [OS Limits](#OS_Limits)). When setting this value, please remember that you will need space in memory for other things as well, such as the DataNode and application JVM heaps and the operating system page cache.
+
+#### Optional
+
+The following properties are not required, but may be specified for tuning:
+
+*   dfs.namenode.path.based.cache.refresh.interval.ms
+
+    The NameNode will use this as the number of milliseconds between subsequent path cache rescans. Each rescan calculates which blocks to cache and, for each block, which DataNodes holding a replica should cache it.
+
+    By default, this parameter is set to 300000, which is five minutes.
+
+*   dfs.datanode.fsdatasetcache.max.threads.per.volume
+
+    The DataNode will use this as the maximum number of threads per volume to use for caching new data.
+
+    By default, this parameter is set to 4.
+
+*   dfs.cachereport.intervalMsec
+
+    The DataNode will use this as the number of milliseconds between sending full reports of its cache state to the NameNode.
+
+    By default, this parameter is set to 10000, which is 10 seconds.
+
+*   dfs.namenode.path.based.cache.block.map.allocation.percent
+
+    The percentage of the Java heap which we will allocate to the cached blocks map. The cached blocks map is a hash map which uses chained hashing. Smaller maps may be accessed more slowly if the number of cached blocks is large; larger maps will consume more memory. The default is 0.25 percent.
+
+### OS Limits
+
+If you get the error "Cannot start datanode because the configured max locked memory size... is more than the datanode's available RLIMIT\_MEMLOCK ulimit," that means that the operating system is imposing a lower limit on the amount of memory that you can lock than what you have configured. To fix this, you must adjust the `ulimit -l` value that the DataNode runs with. Usually, this value is configured in `/etc/security/limits.conf`. However, it will vary depending on what operating system and distribution you are using.
+
+You will know that you have correctly configured this value when you can run `ulimit -l` from the shell and get back either a higher value than what you have configured with `dfs.datanode.max.locked.memory`, or the string "unlimited," indicating that there is no limit. Note that it's typical for `ulimit -l` to output the memory lock limit in KB, but `dfs.datanode.max.locked.memory` must be specified in bytes.
+
+This information does not apply to deployments on Windows. Windows has no direct equivalent of `ulimit -l`.
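
The unit mismatch described in the OS Limits section above (KB from `ulimit -l`, bytes for `dfs.datanode.max.locked.memory`) is easy to get wrong, so here is a small Python sketch of the comparison. The function name `memlock_limit_ok` is hypothetical. The sketch assumes that `ulimit -l` reports kilobytes of 1024 bytes, and it treats a limit exactly equal to the configured value as sufficient, which is an inference from the wording of the error message rather than something the text states.

```python
def memlock_limit_ok(ulimit_l_output, max_locked_memory_bytes):
    """Check a `ulimit -l` reading against dfs.datanode.max.locked.memory.

    ulimit_l_output: the text printed by `ulimit -l` ("unlimited" or a number).
    max_locked_memory_bytes: the configured value, which is given in bytes.
    """
    text = ulimit_l_output.strip()
    if text == "unlimited":
        return True
    # Assumption: the shell reports the limit in KB (1024-byte units).
    limit_bytes = int(text) * 1024
    return limit_bytes >= max_locked_memory_bytes


# 64 MiB configured for caching, with a few possible ulimit readings.
configured = 64 * 1024 * 1024
print(memlock_limit_ok("64", configured))         # False
print(memlock_limit_ok("65536", configured))      # True
print(memlock_limit_ok("unlimited", configured))  # True
```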

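The `-ttl` and `-maxTtl` arguments in CentralizedCacheManagement.md above take values such as `120s`, `30m`, `4h`, `2d`, or `never`. The following Python sketch shows one way to turn such a string into seconds. It is not the parser used by `hdfs cacheadmin`; the name `ttl_to_seconds` is hypothetical, and the sketch assumes a single non-negative integer followed by exactly one unit from [smhd].

```python
import re

# Seconds per unit for the units listed in the text: [smhd].
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
TTL_PATTERN = re.compile(r"(\d+)([smhd])")


def ttl_to_seconds(ttl):
    """Convert a TTL string to seconds, or return None for "never"."""
    if ttl == "never":
        return None
    match = TTL_PATTERN.fullmatch(ttl)
    if match is None:
        raise ValueError("expected <number><s|m|h|d> or 'never': %r" % ttl)
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


print(ttl_to_seconds("30m"))    # 1800
print(ttl_to_seconds("2d"))     # 172800
print(ttl_to_seconds("never"))  # None
```
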
http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ExtendedAttributes.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ExtendedAttributes.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ExtendedAttributes.md
new file mode 100644
index 0000000..5a20986
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ExtendedAttributes.md
@@ -0,0 +1,98 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Extended Attributes in HDFS
+===========================
+
+* [Overview](#Overview)
+    * [HDFS extended attributes](#HDFS_extended_attributes)
+    * [Namespaces and Permissions](#Namespaces_and_Permissions)
+* [Interacting with extended attributes](#Interacting_with_extended_attributes)
+    * [getfattr](#getfattr)
+    * [setfattr](#setfattr)
+* [Configuration options](#Configuration_options)
+
+Overview
+--------
+
+*Extended attributes* (abbreviated as *xattrs*) are a filesystem feature that allows user applications to associate additional metadata with a file or directory. Unlike system-level inode metadata such as file permissions or modification time, extended attributes are not interpreted by the system and are instead used by applications to store additional information about an inode. Extended attributes could be used, for instance, to specify the character encoding of a plain-text document.
+
+### HDFS extended attributes
+
+Extended attributes in HDFS are modeled after extended attributes in Linux (see the Linux manpage for [attr(5)](http://www.bestbits.at/acl/man/man5/attr.txt) and [related documentation](http://www.bestbits.at/acl/)). An extended attribute is a *name-value pair*, with a string name and binary value. Xattr names must also be prefixed with a *namespace*. For example, an xattr named *myXattr* in the *user* namespace would be specified as **user.myXattr**. Multiple xattrs can be associated with a single inode.
+
+### Namespaces and Permissions
+
+In HDFS, there are five valid namespaces: `user`, `trusted`, `system`, `security`, and `raw`. Each of these namespaces has different access restrictions.
+
+The `user` namespace is the namespace that will commonly be used by client applications. Access to extended attributes in the user namespace is controlled by the corresponding file permissions.
+
+The `trusted` namespace is available only to HDFS superusers.
+
+The `system` namespace is reserved for internal HDFS use. This namespace is not accessible through userspace methods, and is reserved for implementing internal HDFS features.
+
+The `security` namespace is reserved for internal HDFS use. This namespace is generally not accessible through userspace methods. One particular use of `security` is the `security.hdfs.unreadable.by.superuser` extended attribute. This xattr can only be set on files, and it will prevent the superuser from reading the file's contents. The superuser can still read and modify file metadata, such as the owner, permissions, etc. This xattr can be set and accessed by any user, assuming normal filesystem permissions. This xattr is also write-once, and cannot be removed once set. This xattr does not allow a value to be set.
+
+The `raw` namespace is reserved for internal system attributes that sometimes need to be exposed. Like `system` namespace attributes, they are not visible to the user except when `getXAttr`/`getXAttrs` is called on a file or directory in the `/.reserved/raw` HDFS directory hierarchy. These attributes can only be accessed by the superuser. An example of where `raw` namespace extended attributes are used is the `distcp` utility. Encryption zone metadata is stored in `raw.*` extended attributes, so as long as the administrator uses `/.reserved/raw` pathnames in source and target, the encrypted files in the encryption zones are transparently copied.
+
+Interacting with extended attributes
+------------------------------------
+
+The Hadoop shell has support for interacting with extended attributes via `hadoop fs -getfattr` and `hadoop fs -setfattr`. These commands are styled after the Linux [getfattr(1)](http://www.bestbits.at/acl/man/man1/getfattr.txt) and [setfattr(1)](http://www.bestbits.at/acl/man/man1/setfattr.txt) commands.
+
+### getfattr
+
+`hadoop fs -getfattr [-R] -n name | -d [-e en] <path>`
+
+Displays the extended attribute names and values (if any) for a file or directory.
+
+| | |
+|:---- |:---- |
+| -R | Recursively list the attributes for all files and directories. |
+| -n name | Dump the named extended attribute value. |
+| -d | Dump all extended attribute values associated with pathname. |
+| -e \<encoding\> | Encode values after retrieving them. Valid encodings are "text", "hex", and "base64". Values encoded as text strings are enclosed in double quotes ("), and values encoded as hexadecimal and base64 are prefixed with 0x and 0s, respectively. |
+| \<path\> | The file or directory. |
+
+### setfattr
+
+`hadoop fs -setfattr -n name [-v value] | -x name <path>`
+
+Sets an extended attribute name and value for a file or directory.
+
+| | |
+|:---- |:---- |
+| -n name | The extended attribute name. |
+| -v value | The extended attribute value. There are three different encoding methods for the value. If the argument is enclosed in double quotes, then the value is the string inside the quotes. If the argument is prefixed with 0x or 0X, then it is taken as a hexadecimal number. If the argument begins with 0s or 0S, then it is taken as a base64 encoding. |
+| -x name | Remove the extended attribute. |
+| \<path\> | The file or directory. |
+
+Configuration options
+---------------------
+
+HDFS supports extended attributes out of the box, without additional configuration. Administrators could potentially be interested in the options limiting the number of xattrs per inode and the size of xattrs, since xattrs increase the on-disk and in-memory space consumption of an inode.
+
+*   `dfs.namenode.xattrs.enabled`
+
+    Whether support for extended attributes is enabled on the NameNode. By default, extended attributes are enabled.
+
+*   `dfs.namenode.fs-limits.max-xattrs-per-inode`
+
+    The maximum number of extended attributes per inode. By default, this limit is 32.
+
+*   `dfs.namenode.fs-limits.max-xattr-size`
+
+    The maximum combined size of the name and value of an extended attribute in bytes. By default, this limit is 16384 bytes.
+
+
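The `-v` and `-e` options in ExtendedAttributes.md above describe three textual forms for a binary xattr value: a double-quoted string, hexadecimal prefixed with `0x`, and base64 prefixed with `0s`. The Python sketch below illustrates those conventions. It is not Hadoop's codec; the names `decode_value` and `encode_value` are hypothetical, and treating an unquoted, unprefixed argument as plain UTF-8 text is an assumption that the text does not spell out.

```python
import base64


def decode_value(arg):
    """Turn a `-v` style argument into the raw bytes it denotes."""
    if len(arg) >= 2 and arg.startswith('"') and arg.endswith('"'):
        return arg[1:-1].encode("utf-8")    # quoted string
    if arg[:2] in ("0x", "0X"):
        return bytes.fromhex(arg[2:])       # hexadecimal
    if arg[:2] in ("0s", "0S"):
        return base64.b64decode(arg[2:])    # base64
    return arg.encode("utf-8")              # assumption: plain text


def encode_value(data, encoding):
    """Render raw bytes the way `-e text|hex|base64` is described."""
    if encoding == "text":
        return '"' + data.decode("utf-8") + '"'
    if encoding == "hex":
        return "0x" + data.hex()
    if encoding == "base64":
        return "0s" + base64.b64encode(data).decode("ascii")
    raise ValueError("unknown encoding: %r" % encoding)


print(decode_value("0x68656c6c6f"))       # b'hello'
print(encode_value(b"hello", "base64"))   # 0saGVsbG8=
```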

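ExtendedAttributes.md above requires every xattr name to carry one of five namespace prefixes, as in **user.myXattr**. A minimal Python sketch of splitting and validating such a name follows. The function `split_xattr_name` is hypothetical, and the sketch only accepts the lowercase namespace spellings that appear in the text; whether HDFS accepts other casings is not covered here.

```python
# The five namespaces listed in the text.
NAMESPACES = ("user", "trusted", "system", "security", "raw")


def split_xattr_name(full_name):
    """Split "<namespace>.<name>" and check the namespace is a known one."""
    namespace, dot, name = full_name.partition(".")
    if not dot or not name:
        raise ValueError("expected <namespace>.<name>: %r" % full_name)
    if namespace not in NAMESPACES:
        raise ValueError("unknown namespace: %r" % namespace)
    return namespace, name


print(split_xattr_name("user.myXattr"))
# ('user', 'myXattr')
print(split_xattr_name("security.hdfs.unreadable.by.superuser"))
# ('security', 'hdfs.unreadable.by.superuser')
```

Only the first dot separates the namespace from the name, so dotted names such as the `security` attribute above stay intact.
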
http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/FaultInjectFramework.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/FaultInjectFramework.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/FaultInjectFramework.md
new file mode 100644
index 0000000..98bda50
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/FaultInjectFramework.md
@@ -0,0 +1,254 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Fault Injection Framework and Development Guide
+===============================================
+
+* [Fault Injection Framework and Development Guide](#Fault_Injection_Framework_and_Development_Guide)
+    * [Introduction](#Introduction)
+    * [Assumptions](#Assumptions)
+    * [Architecture of the Fault Injection Framework](#Architecture_of_the_Fault_Injection_Framework)
+        * [Configuration Management](#Configuration_Management)
+        * [Probability Model](#Probability_Model)
+        * [Fault Injection Mechanism: AOP and AspectJ](#Fault_Injection_Mechanism:_AOP_and_AspectJ)
+        * [Existing Join Points](#Existing_Join_Points)
+    * [Aspect Example](#Aspect_Example)
+    * [Fault Naming Convention and Namespaces](#Fault_Naming_Convention_and_Namespaces)
+    * [Development Tools](#Development_Tools)
+    * [Putting It All Together](#Putting_It_All_Together)
+        * [How to Use the Fault Injection Framework](#How_to_Use_the_Fault_Injection_Framework)
+    * [Additional Information and Contacts](#Additional_Information_and_Contacts)
+
+Introduction
+------------
+
+This guide provides an overview of the Hadoop Fault Injection (FI) framework for those who will be developing their own faults (aspects).
+
+The idea of fault injection is fairly simple: it is the deliberate introduction of errors and exceptions into an application's logic in order to achieve higher test coverage and better fault tolerance of the system. Different implementations of this idea are available today. Hadoop's FI framework is built on top of Aspect-Oriented Programming (AOP) as implemented by the AspectJ toolkit.
+
+Assumptions
+-----------
+
+The current implementation of the FI framework assumes that the faults it emulates are non-deterministic in nature. That is, the moment at which a fault happens isn't known in advance and is decided by a coin flip.
+
+Architecture of the Fault Injection Framework
+---------------------------------------------
+
+Components layout
+
+### Configuration Management
+
+This piece of the FI framework allows you to set expectations for faults to happen. The settings can be applied either statically (in advance) or at runtime. The desired level of faults in the framework can be configured in two ways:
+
+* editing the src/aop/fi-site.xml configuration file. This file is
+  similar to other Hadoop configuration files
+* setting JVM system properties through VM startup parameters or
+  in the build.properties file
+
+### Probability Model
+
+This is fundamentally a coin flipper. The methods of this class draw a random number between 0.0 and 1.0 and check whether it falls between 0.0 and the level configured for the fault in question. If it does, the fault occurs.
+
+Thus, to guarantee that a fault happens, set its level to 1.0. To completely prevent a fault from happening, set its probability level to 0.0.
+
+Note: The default probability level is 0 (zero) unless it is changed explicitly through the configuration file or at runtime. The name of the default level's configuration parameter is fi.\*
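
The coin flip described above can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the framework's `ProbabilityModel` class: the names `CoinFlipSketch`, `level` and `shouldInject` are hypothetical, levels are read only from JVM system properties (the fi-site.xml file is ignored), and clamping out-of-range values to [0.0, 1.0] is an assumption.

```java
import java.util.Random;

/** Hypothetical coin flipper following the probability model described above. */
final class CoinFlipSketch {
  static final String PREFIX = "fi.";        // prefix used by fault level properties
  static final String DEFAULT_KEY = "fi.*";  // default level parameter named in the text
  private static final Random RANDOM = new Random();

  /** Returns the level configured for a fault, falling back to fi.* and then to 0. */
  static float level(String faultName) {
    String fallback = System.getProperty(DEFAULT_KEY, "0");
    float value = Float.parseFloat(System.getProperty(PREFIX + faultName, fallback));
    // Assumption: out-of-range values are clamped rather than rejected.
    return Math.max(0.0f, Math.min(1.0f, value));
  }

  /** Flips the coin: true means the fault should be injected this time. */
  static boolean shouldInject(String faultName) {
    // nextFloat() is in [0.0, 1.0), so level 1.0 always fires and 0.0 never does.
    return RANDOM.nextFloat() < level(faultName);
  }

  public static void main(String[] args) {
    System.setProperty("fi.hdfs.datanode.BlockReceiver", "1.0");
    System.out.println(shouldInject("hdfs.datanode.BlockReceiver")); // prints "true"
  }
}
```

Because the level is read on every call, changing the system property between calls takes effect immediately, which matches the "at runtime" configuration path described earlier.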
+
+### Fault Injection Mechanism: AOP and AspectJ
+
+The foundation of Hadoop's FI framework includes a cross-cutting concept implemented by AspectJ. The following basic terms are important to remember:
+
+* A cross-cutting concept (aspect) is behavior, and often data, that
+  is used across the scope of a piece of software
+* In AOP, the aspects provide a mechanism by which a cross-cutting concern
+  can be specified in a modular way
+* Advice is the code that is executed when an aspect is invoked
+* Join point (or pointcut) is a specific point within the application
+  that may or may not invoke some advice
+
+### Existing Join Points
+
+The following readily available join points are provided by AspectJ:
+
+* Join when a method is called
+* Join during a method's execution
+* Join when a constructor is invoked
+* Join during a constructor's execution
+* Join during aspect advice execution
+* Join before an object is initialized
+* Join during object initialization
+* Join during static initializer execution
+* Join when a class's field is referenced
+* Join when a class's field is assigned
+* Join when a handler is executed
+
+Aspect Example
+--------------
+
+```java
+package org.apache.hadoop.hdfs.server.datanode;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fi.ProbabilityModel;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.util.DiskChecker.*;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.io.DataOutputStream;
+
+/**
+ * This aspect takes care of faults injected into the datanode.BlockReceiver
+ * class
+ */
+public aspect BlockReceiverAspects {
+  public static final Log LOG = LogFactory.getLog(BlockReceiverAspects.class);
+
+  public static final String BLOCK_RECEIVER_FAULT = "hdfs.datanode.BlockReceiver";
+
+  pointcut callReceivePacket() : call (* OutputStream.write(..))
+    && withincode (* BlockReceiver.receivePacket(..))
+    // to further limit the application of this aspect a very narrow 'target' can be used as follows
+    // && target(DataOutputStream)
+    && !within(BlockReceiverAspects +);
+
+  before () throws IOException : callReceivePacket () {
+    if (ProbabilityModel.injectCriteria(BLOCK_RECEIVER_FAULT)) {
+      LOG.info("Before the injection point");
+      Thread.dumpStack();
+      throw new DiskOutOfSpaceException ("FI: injected fault point at " +
+      thisJoinPoint.getStaticPart( ).getSourceLocation());
+    }
+  }
+}
+```
+
+The aspect has two main parts:
+
+* The join point pointcut callReceivePacket(), which serves as an
+  identification mark of a specific point (in control and/or data
+  flow) in the life of an application.
+* A call to the advice - before () throws IOException :
+  callReceivePacket() - will be injected (see Putting It All
+  Together) before that specific spot of the application's code.
+
+The pointcut identifies an invocation of the write() method of class java.io.OutputStream with any number of parameters and any return type. This invocation should take place within the body of the method receivePacket() of class BlockReceiver. The method can have any parameters and any return type. Invocations of write() that happen anywhere within the aspect BlockReceiverAspects or its heirs are ignored.
+
+Note 1: This short example doesn't illustrate the fact that you can have more than a single injection point per class. In such a case the names of the faults have to be different if a developer wants to trigger them separately.
+
+Note 2: After the injection step (see Putting It All Together) you can verify that the faults were properly injected by searching for ajc keywords in a disassembled class file.
+
+Fault Naming Convention and Namespaces
+--------------------------------------
+
+For the sake of a unified naming convention, the following two types of names are recommended when developing new aspects:
+
+* Activity specific notation (when we don't care about a particular
+  location of a fault's happening). In this case the name of the
+  fault is rather abstract: fi.hdfs.DiskError
+* Location specific notation. Here, the fault's name is mnemonic as
+  in: fi.hdfs.datanode.BlockReceiver[optional location details]
+
+Development Tools
+-----------------
+
+* The Eclipse AspectJ Development Toolkit may help you when developing aspects
+* IntelliJ IDEA provides AspectJ weaver and Spring-AOP plugins
+
+Putting It All Together
+-----------------------
+
+Faults (aspects) have to be injected (or woven) together before they can be used. Follow these instructions:
+
+* To weave aspects in place use:
+
+        % ant injectfaults
+
+* If you misidentified the join point of your aspect you will see a warning (similar to the one shown here) when 'injectfaults' target is completed:
+
+            [iajc] warning at
+            src/test/aop/org/apache/hadoop/hdfs/server/datanode/ \
+                      BlockReceiverAspects.aj:44::0
+            advice defined in org.apache.hadoop.hdfs.server.datanode.BlockReceiverAspects
+            has not been applied [Xlint:adviceDidNotMatch]
+
+    This is not an error, so the build still reports success.
+
+* To prepare the dev.jar file with all your faults woven in place (HDFS-475 pending) use:
+
+            % ant jar-fault-inject
+
+* To create test jars use:
+
+            % ant jar-test-fault-inject
+
+* To run HDFS tests with faults injected use:
+
+            % ant run-test-hdfs-fault-inject
+
+### How to Use the Fault Injection Framework
+
+Faults can be triggered as follows:
+
+* During runtime:
+
+            % ant run-test-hdfs -Dfi.hdfs.datanode.BlockReceiver=0.12
+
+    To set a certain level, for example 25%, of all injected faults use:
+
+            % ant run-test-hdfs-fault-inject -Dfi.*=0.25
+
+* From a program:
+
+```java
+package org.apache.hadoop.fs;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+public class DemoFiTest {
+  public static final String BLOCK_RECEIVER_FAULT = "hdfs.datanode.BlockReceiver";
+
+  @Before
+  public void setUp() {
+    // Set up the test's environment as required
+  }
+
+  @Test
+  public void testFI() {
+    // It triggers the fault, assuming that there's one called 'hdfs.datanode.BlockReceiver'
+    System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.12");
+    //
+    // The main logic of your tests goes here
+    //
+    // Now set the level back to 0 (zero) to prevent this fault from happening again
+    System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.0");
+    // or delete its trigger completely
+    System.getProperties().remove("fi." + BLOCK_RECEIVER_FAULT);
+  }
+
+  @After
+  public void tearDown() {
+    // Clean up the test environment
+  }
+}
+```
+
+As shown above, both methods do the same thing: they set the probability level of `hdfs.datanode.BlockReceiver` to 12%. The difference is that the programmatic approach is more flexible and lets you turn a fault off when a test no longer needs it.
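
One way to make sure a fault is always switched off again is to scope the system property with try-with-resources. The sketch below is not part of the FI framework: the class `ScopedFaultLevel` is hypothetical, and it only assumes the `fi.` property prefix used in the test above.

```java
/** Hypothetical helper that sets a fault level and restores the old value on close. */
final class ScopedFaultLevel implements AutoCloseable {
  private final String key;
  private final String previous;

  ScopedFaultLevel(String faultName, double level) {
    this.key = "fi." + faultName;
    this.previous = System.getProperty(key);  // remember the old value, possibly null
    System.setProperty(key, Double.toString(level));
  }

  @Override
  public void close() {
    // Restore the earlier state so later tests are not affected by this fault.
    if (previous == null) {
      System.clearProperty(key);
    } else {
      System.setProperty(key, previous);
    }
  }

  public static void main(String[] args) {
    try (ScopedFaultLevel ignored =
             new ScopedFaultLevel("hdfs.datanode.BlockReceiver", 0.12)) {
      // The fault is active only inside this block.
      System.out.println(System.getProperty("fi.hdfs.datanode.BlockReceiver")); // prints "0.12"
    }
    System.out.println(System.getProperty("fi.hdfs.datanode.BlockReceiver"));
  }
}
```

Compared with resetting the level by hand at the end of the test method, this form also restores the property when the test body throws.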
+
+Additional Information and Contacts
+-----------------------------------
+
+These two sources of information are particularly interesting and worth reading:
+
+* <http://www.eclipse.org/aspectj/doc/next/devguide/>
+* AspectJ Cookbook (ISBN-13: 978-0-596-00654-9)
+
+If you have additional comments or questions for the author check [HDFS-435](https://issues.apache.org/jira/browse/HDFS-435).

http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/Federation.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/Federation.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/Federation.md
new file mode 100644
index 0000000..6996fac
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/Federation.md
@@ -0,0 +1,254 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+HDFS Federation
+===============
+
+* [HDFS Federation](#HDFS_Federation)
+    * [Background](#Background)
+    * [Multiple Namenodes/Namespaces](#Multiple_NamenodesNamespaces)
+        * [Key Benefits](#Key_Benefits)
+    * [Federation Configuration](#Federation_Configuration)
+        * [Configuration:](#Configuration:)
+        * [Formatting Namenodes](#Formatting_Namenodes)
+        * [Upgrading from an older release and configuring federation](#Upgrading_from_an_older_release_and_configuring_federation)
+        * [Adding a new Namenode to an existing HDFS cluster](#Adding_a_new_Namenode_to_an_existing_HDFS_cluster)
+    * [Managing the cluster](#Managing_the_cluster)
+        * [Starting and stopping cluster](#Starting_and_stopping_cluster)
+        * [Balancer](#Balancer)
+        * [Decommissioning](#Decommissioning)
+        * [Cluster Web Console](#Cluster_Web_Console)
+
+This guide provides an overview of the HDFS Federation feature and how to configure and manage the federated cluster.
+
+Background
+----------
+
+![HDFS Layers](./images/federation-background.gif)
+
+HDFS has two main layers:
+
+* **Namespace**
+    * Consists of directories, files and blocks.
+    * It supports all the namespace related file system operations such as
+      create, delete, modify and list files and directories.
+* **Block Storage Service**, which has two parts:
+    * Block Management (performed in the Namenode)
+        * Provides Datanode cluster membership by handling registrations and periodic heartbeats.
+        * Processes block reports and maintains location of blocks.
+        * Supports block related operations such as create, delete, modify and
+          get block location.
+        * Manages replica placement, block replication for under
+          replicated blocks, and deletes blocks that are over replicated.
+    * Storage - is provided by Datanodes by storing blocks on the local file
+      system and allowing read/write access.
+
+The prior HDFS architecture allows only a single namespace for the entire cluster. In that configuration, a single Namenode manages the namespace. HDFS Federation addresses this limitation by adding support for multiple Namenodes/namespaces to HDFS.
+
+Multiple Namenodes/Namespaces
+-----------------------------
+
+In order to scale the name service horizontally, federation uses multiple independent Namenodes/namespaces. The Namenodes are federated: they are independent and do not require coordination with each other. The Datanodes are used as common storage for blocks by all the Namenodes. Each Datanode registers with all the Namenodes in the cluster. Datanodes send periodic heartbeats and block reports. They also handle commands from the Namenodes.
+
+Users may use [ViewFs](./ViewFs.html) to create personalized namespace views. ViewFs is analogous to client side mount tables in some Unix/Linux systems.
+
+![HDFS Federation Architecture](./images/federation.gif)
+
+**Block Pool**
+
+A Block Pool is a set of blocks that belong to a single namespace. Datanodes store blocks for all the block pools in the cluster. Each Block Pool is managed independently. This allows a namespace to generate Block IDs for new blocks without the need for coordination with the other namespaces. A Namenode failure does not prevent the Datanode from serving other Namenodes in the cluster.
+
+A Namespace and its block pool together are called a Namespace Volume. It is a self-contained unit of management. When a Namenode/namespace is deleted, the corresponding block pool at the Datanodes is deleted. Each namespace volume is upgraded as a unit during cluster upgrade.
+
+**ClusterID**
+
+A **ClusterID** identifier is used to identify all the nodes in the cluster. When a Namenode is formatted, this identifier is either provided or auto generated. The same ID should be used when formatting the other Namenodes that join the cluster.
+
+### Key Benefits
+
+* Namespace Scalability - Federation adds namespace horizontal
+  scaling. Large deployments or deployments using a lot of small files
+  benefit from namespace scaling, which allows more Namenodes to be
+  added to the cluster.
+* Performance - File system throughput is not limited by a single
+  Namenode. Adding more Namenodes to the cluster scales the file
+  system read/write throughput.
+* Isolation - A single Namenode offers no isolation in a multi-user
+  environment. For example, an experimental application can overload
+  the Namenode and slow down production critical applications. By using
+  multiple Namenodes, different categories of applications and users
+  can be isolated to different namespaces.
+
+Federation Configuration
+------------------------
+
+Federation configuration is **backward compatible** and allows existing single Namenode configurations to work without any change. The new configuration is designed such that all the nodes in the cluster have the same configuration without the need for deploying different configurations based on the type of the node in the cluster.
+
+Federation adds a new `NameServiceID` abstraction. A Namenode and its corresponding secondary/backup/checkpointer nodes all belong to a `NameServiceID`. In order to support a single configuration file, the Namenode and secondary/backup/checkpointer configuration parameters are suffixed with the `NameServiceID`.
+
+### Configuration:
+
+**Step 1**: Add the `dfs.nameservices` parameter to your configuration and configure it with a list of comma separated NameServiceIDs. This will be used by the Datanodes to determine the Namenodes in the cluster.
+
+**Step 2**: For each Namenode and Secondary Namenode/BackupNode/Checkpointer add the following configuration parameters suffixed with the corresponding `NameServiceID` into the common configuration file:
+
+| Daemon | Configuration Parameter |
+|:---- |:---- |
+| Namenode | `dfs.namenode.rpc-address` <br/> `dfs.namenode.servicerpc-address` <br/> `dfs.namenode.http-address` <br/> `dfs.namenode.https-address` <br/> `dfs.namenode.keytab.file` <br/> `dfs.namenode.name.dir` <br/> `dfs.namenode.edits.dir` <br/> `dfs.namenode.checkpoint.dir` <br/> `dfs.namenode.checkpoint.edits.dir` |
+| Secondary Namenode | `dfs.namenode.secondary.http-address` <br/> `dfs.secondary.namenode.keytab.file` |
+| BackupNode | `dfs.namenode.backup.address` <br/> `dfs.secondary.namenode.keytab.file` |
+
+Here is an example configuration with two Namenodes:
+
+```xml
+<configuration>
+  <property>
+    <name>dfs.nameservices</name>
+    <value>ns1,ns2</value>
+  </property>
+  <property>
+    <name>dfs.namenode.rpc-address.ns1</name>
+    <value>nn-host1:rpc-port</value>
+  </property>
+  <property>
+    <name>dfs.namenode.http-address.ns1</name>
+    <value>nn-host1:http-port</value>
+  </property>
+  <property>
+    <name>dfs.namenode.secondary.http-address.ns1</name>
+    <value>snn-host1:http-port</value>
+  </property>
+  <property>
+    <name>dfs.namenode.rpc-address.ns2</name>
+    <value>nn-host2:rpc-port</value>
+  </property>
+  <property>
+    <name>dfs.namenode.http-address.ns2</name>
+    <value>nn-host2:http-port</value>
+  </property>
+  <property>
+    <name>dfs.namenode.secondary.http-address.ns2</name>
+    <value>snn-host2:http-port</value>
+  </property>
+
+  .... Other common configuration ...
+</configuration>
+```
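
The suffixing rule from Step 2 can be expressed as a small helper that expands the `dfs.nameservices` list into per-Namenode keys. This is a minimal sketch for illustration: the class `NameServiceKeysSketch` and its method are hypothetical, and only two of the parameters from the table are included to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that derives NameServiceID-suffixed configuration keys. */
final class NameServiceKeysSketch {
  // A subset of the Namenode parameters listed in the table above.
  static final String[] NAMENODE_KEYS = {
    "dfs.namenode.rpc-address",
    "dfs.namenode.http-address"
  };

  /** Expands a comma separated dfs.nameservices value into suffixed keys. */
  static List<String> suffixedKeys(String nameservices) {
    List<String> keys = new ArrayList<>();
    for (String id : nameservices.split(",")) {
      String nameServiceId = id.trim();
      if (nameServiceId.isEmpty()) {
        continue;  // tolerate stray commas
      }
      for (String base : NAMENODE_KEYS) {
        keys.add(base + "." + nameServiceId);
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    for (String key : suffixedKeys("ns1,ns2")) {
      System.out.println(key);
    }
  }
}
```

For the example configuration above, the helper yields the `rpc-address` and `http-address` keys for `ns1` followed by those for `ns2`.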
+
+### Formatting Namenodes
+
+**Step 1**: Format a Namenode using the following command:
+
+    [hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format [-clusterId <cluster_id>]
+
+Choose a unique cluster\_id which will not conflict with other clusters in your environment. If a cluster\_id is not provided, then a unique one is auto generated.
+
+**Step 2**: Format additional Namenodes using the following command:
+
+    [hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId <cluster_id>
+
+Note that the cluster\_id in step 2 must be the same as the cluster\_id in step 1. If they are different, the additional Namenodes will not be part of the federated cluster.
+
+### Upgrading from an older release and configuring federation
+
+Older releases only support a single Namenode. Upgrade the cluster to a newer release in order to enable federation. During the upgrade you can provide a ClusterID as follows:
+
+    [hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script $HADOOP_PREFIX/bin/hdfs start namenode -upgrade -clusterId <cluster_ID>
+
+If cluster\_id is not provided, it is auto generated.
+
+### Adding a new Namenode to an existing HDFS cluster
+
+Perform the following steps:
+
+* Add `dfs.nameservices` to the configuration.
+
+* Update the configuration with the NameServiceID suffix. Configuration
+  key names changed post release 0.20. You must use the new configuration
+  parameter names in order to use federation.
+
+* Add the new Namenode related config to the configuration file.
+
+* Propagate the configuration file to all the nodes in the cluster.
+
+* Start the new Namenode and Secondary/Backup.
+
+* Refresh the Datanodes to pick up the newly added Namenode by running
+  the following command against all the Datanodes in the cluster:
+
+        [hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNamenodes <datanode_host_name>:<datanode_rpc_port>
+
+Managing the cluster
+--------------------
+
+### Starting and stopping cluster
+
+To start the cluster run the following command:
+
+    [hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
+
+To stop the cluster run the following command:
+
+    [hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
+
+These commands can be run from any node where the HDFS configuration is available. The start command uses the configuration to determine the Namenodes in the cluster and then starts the Namenode process on those nodes. The Datanodes are started on the nodes specified in the `slaves` file. The scripts can be used as a reference for building your own scripts to start and stop the cluster.
+
+### Balancer
+
+The Balancer has been changed to work with multiple Namenodes. The Balancer can be run using the command:
+
+    [hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script $HADOOP_PREFIX/bin/hdfs start balancer [-policy <policy>]
+
+The policy parameter can be any of the following:
+
+* `datanode` - this is the *default* policy. This balances the storage at
+  the Datanode level. This is similar to balancing policy from prior releases.
+
+* `blockpool` - this balances the storage at the block pool
+  level which also balances at the Datanode level.
+
+Note that Balancer only balances the data and does not balance the namespace.
+For the complete command usage, see [balancer](../hadoop-common/CommandsManual.html#balancer).
+
+### Decommissioning
+
+Decommissioning is similar to prior releases. The nodes that need to be decommissioned are added to the exclude file at all of the Namenodes. Each Namenode decommissions its Block Pool. When all the Namenodes finish decommissioning a Datanode, the Datanode is considered decommissioned.
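
The rule that a Datanode counts as decommissioned only once every Namenode has finished can be modeled as a simple aggregation. This is a hypothetical sketch, not HDFS code: the class `DecommissionStatusSketch`, the `State` enum and the per-Namenode map are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of per-Namenode decommission progress for one Datanode. */
final class DecommissionStatusSketch {
  enum State { IN_SERVICE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

  /** Combines the state reported by each Namenode into one cluster-wide state. */
  static State overall(Map<String, State> perNamenode) {
    boolean anyStarted = false;
    boolean allDone = !perNamenode.isEmpty();
    for (State state : perNamenode.values()) {
      if (state != State.IN_SERVICE) {
        anyStarted = true;
      }
      if (state != State.DECOMMISSIONED) {
        allDone = false;
      }
    }
    if (allDone) {
      return State.DECOMMISSIONED;  // every Namenode has finished its block pool
    }
    return anyStarted ? State.DECOMMISSION_IN_PROGRESS : State.IN_SERVICE;
  }

  public static void main(String[] args) {
    Map<String, State> view = new LinkedHashMap<>();
    view.put("ns1", State.DECOMMISSIONED);
    view.put("ns2", State.DECOMMISSION_IN_PROGRESS);
    System.out.println(overall(view)); // prints "DECOMMISSION_IN_PROGRESS"
    view.put("ns2", State.DECOMMISSIONED);
    System.out.println(overall(view)); // prints "DECOMMISSIONED"
  }
}
```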
+
+**Step 1**: To distribute an exclude file to all the Namenodes, use the following command:
+
+    [hdfs]$ $HADOOP_PREFIX/sbin/distribute-exclude.sh <exclude_file>
+
+**Step 2**: Refresh all the Namenodes to pick up the new exclude file:
+
+    [hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
+
+The above command uses HDFS configuration to determine the configured Namenodes in the cluster and refreshes them to pick up the new exclude file.
+
+### Cluster Web Console
+
+Similar to the Namenode status web page, when using federation a Cluster Web Console is available to monitor the federated cluster at `http://<any_nn_host:port>/dfsclusterhealth.jsp`. Any Namenode in the cluster can be used to access this web page.
+
+The Cluster Web Console provides the following information:
+
+* A cluster summary that shows the number of files, number of blocks,
+  total configured storage capacity, and the available and used storage
+  for the entire cluster.
+
+* A list of Namenodes and a summary that includes the number of files,
+  blocks, missing blocks, and live and dead data nodes for each
+  Namenode. It also provides a link to access each Namenode's web UI.
+
+* The decommissioning status of Datanodes.
+
+

http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
new file mode 100644
index 0000000..5fdfb0c
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
@@ -0,0 +1,514 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+HDFS Commands Guide
+===================
+
+* [Overview](#Overview)
+* [User Commands](#User_Commands)
+    * [classpath](#classpath)
+    * [dfs](#dfs)
+    * [fetchdt](#fetchdt)
+    * [fsck](#fsck)
+    * [getconf](#getconf)
+    * [groups](#groups)
+    * [lsSnapshottableDir](#lsSnapshottableDir)
+    * [jmxget](#jmxget)
+    * [oev](#oev)
+    * [oiv](#oiv)
+    * [oiv\_legacy](#oiv_legacy)
+    * [snapshotDiff](#snapshotDiff)
+    * [version](#version)
+* [Administration Commands](#Administration_Commands)
+    * [balancer](#balancer)
+    * [cacheadmin](#cacheadmin)
+    * [crypto](#crypto)
+    * [datanode](#datanode)
+    * [dfsadmin](#dfsadmin)
+    * [haadmin](#haadmin)
+    * [journalnode](#journalnode)
+    * [mover](#mover)
+    * [namenode](#namenode)
+    * [nfs3](#nfs3)
+    * [portmap](#portmap)
+    * [secondarynamenode](#secondarynamenode)
+    * [storagepolicies](#storagepolicies)
+    * [zkfc](#zkfc)
+* [Debug Commands](#Debug_Commands)
+    * [verify](#verify)
+    * [recoverLease](#recoverLease)
+
+Overview
+--------
+
+All HDFS commands are invoked by the `bin/hdfs` script. Running the hdfs script without any arguments prints the description for all commands.
+
+Usage: `hdfs [SHELL_OPTIONS] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]`
+
+Hadoop has an option parsing framework that handles generic options as well as running classes.
+
+| COMMAND\_OPTIONS | Description |
+|:---- |:---- |
+| `--config`<br/>`--loglevel` | The common set of shell options. These are documented on the [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Overview) page. |
+| GENERIC\_OPTIONS | The common set of options supported by multiple commands. See the Hadoop [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options) for more information. |
+| COMMAND COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
+
+User Commands
+-------------
+
+Commands useful for users of a Hadoop cluster.
+
+### `classpath`
+
+Usage: `hdfs classpath`
+
+Prints the class path needed to get the Hadoop jar and the required libraries.
+
+### `dfs`
+
+Usage: `hdfs dfs [COMMAND [COMMAND_OPTIONS]]`
+
+Run a filesystem command on the file system supported in Hadoop. The various COMMAND\_OPTIONS can be found at [File System Shell Guide](../hadoop-common/FileSystemShell.html).
+
+### `fetchdt`
+
+Usage: `hdfs fetchdt [--webservice <namenode_http_addr>] <path>`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `--webservice` *namenode\_http\_addr* | Use the HTTP protocol instead of RPC. |
+| *path* | File name to store the token into. |
+
+Gets Delegation Token from a NameNode. See [fetchdt](./HdfsUserGuide.html#fetchdt) for more info.
+
+### `fsck`
+
+Usage:
+
+       hdfs fsck <path>
+              [-list-corruptfileblocks |
+              [-move | -delete | -openforwrite]
+              [-files [-blocks [-locations | -racks]]]]
+              [-includeSnapshots] [-showprogress]
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| *path* | Start checking from this path. |
+| `-delete` | Delete corrupted files. |
+| `-files` | Print out files being checked. |
+| `-files` `-blocks` | Print out the block report. |
+| `-files` `-blocks` `-locations` | Print out locations for every block. |
+| `-files` `-blocks` `-racks` | Print out network topology for data-node locations. |
+| `-includeSnapshots` | Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it. |
+| `-list-corruptfileblocks` | Print out list of missing blocks and files they belong to. |
+| `-move` | Move corrupted files to /lost+found. |
+| `-openforwrite` | Print out files opened for write. |
+| `-showprogress` | Print out dots for progress in output. Default is OFF (no progress). |
+
+Runs the HDFS filesystem checking utility. See [fsck](./HdfsUserGuide.html#fsck) for more info.
+
+### `getconf`
+
+Usage:
+
+       hdfs getconf -namenodes
+       hdfs getconf -secondaryNameNodes
+       hdfs getconf -backupNodes
+       hdfs getconf -includeFile
+       hdfs getconf -excludeFile
+       hdfs getconf -nnRpcAddresses
+       hdfs getconf -confKey [key]
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-namenodes` | gets list of namenodes in the cluster. |
+| `-secondaryNameNodes` | gets list of secondary namenodes in the cluster. |
+| `-backupNodes` | gets list of backup nodes in the cluster. |
+| `-includeFile` | gets the include file path that defines the datanodes that can join the cluster. |
+| `-excludeFile` | gets the exclude file path that defines the datanodes that need to be decommissioned. |
+| `-nnRpcAddresses` | gets the namenode rpc addresses. |
+| `-confKey` [key] | gets a specific key from the configuration. |
+
+Gets configuration information from the configuration directory, post-processing.
+
+### `groups`
+
+Usage: `hdfs groups [username ...]`
+
+Returns the group information given one or more usernames.
+
+### `lsSnapshottableDir`
+
+Usage: `hdfs lsSnapshottableDir [-help]`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-help` | print help |
+
+Get the list of snapshottable directories. When this is run as a super user, it returns all snapshottable directories. Otherwise it returns those directories that are owned by the current user.
+
+### `jmxget`
+
+Usage: `hdfs jmxget [-localVM ConnectorURL | -port port | -server mbeanserver | -service service]`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-help` | print help |
+| `-localVM` ConnectorURL | connect to the VM on the same machine |
+| `-port` *mbean server port* | specify mbean server port, if missing it will try to connect to MBean Server in the same VM |
+| `-service` | specify jmx service, either DataNode or NameNode (the default) |
+
+Dump JMX information from a service.
+
+### `oev`
+
+Usage: `hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE`
+
+#### Required command line arguments:
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-i`,`--inputFile` *arg* | edits file to process, xml (case insensitive) extension means XML format, any other filename means binary format |
+| `-o`,`--outputFile` *arg* | Name of output file. If the specified file exists, it will be overwritten, format of the file is determined by -p option |
+
+#### Optional command line arguments:
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-f`,`--fix-txids` | Renumber the transaction IDs in the input, so that there are no gaps or invalid transaction IDs. |
+| `-h`,`--help` | Display usage information and exit |
+| `-r`,`--recover` | When reading binary edit logs, use recovery mode. This will give you the chance to skip corrupt parts of the edit log. |
+| `-p`,`--processor` *arg* | Select which type of processor to apply against image file, currently supported processors are: binary (native binary format that Hadoop uses), xml (default, XML format), stats (prints statistics about edits file) |
+| `-v`,`--verbose` | More verbose output, prints the input and output filenames, for processors that write to a file, also output to screen. On large image files this will dramatically increase processing time (default is false). |
+
+Hadoop offline edits viewer.
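
Reader's annotation, not part of the committed file: a minimal sketch of a round trip through the `-p xml` and `-p binary` processors described above. The file names and the `HDFS_CMD` variable are hypothetical. By default the commands are only printed, so the snippet runs without a Hadoop installation.

```shell
# HDFS_CMD is a hypothetical knob: by default the commands are only printed.
# Set HDFS_CMD=hdfs to run them against a real installation.
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

EDITS_BIN="edits.bin"   # hypothetical binary edits file (no .xml extension)
EDITS_XML="edits.xml"   # the .xml extension marks XML format on input

# Binary -> XML; xml is already the default processor, named here for clarity.
edits_to_xml() { $HDFS_CMD oev -p xml -i "$1" -o "$2"; }

# XML -> binary, for example after inspecting the XML dump.
xml_to_edits() { $HDFS_CMD oev -p binary -i "$1" -o "$2"; }

edits_to_xml "$EDITS_BIN" "$EDITS_XML"
xml_to_edits "$EDITS_XML" "edits.restored"
```

Because the output file is overwritten if it exists, the sketch writes the restored binary copy to a separate name rather than back over the input.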
+
+### `oiv`
+
+Usage: `hdfs oiv [OPTIONS] -i INPUT_FILE`
+
+#### Required command line arguments:
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-i`,`--inputFile` *arg* | edits file to process, xml (case insensitive) extension means XML format, any other filename means binary format |
+
+#### Optional command line arguments:
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-h`,`--help` | Display usage information and exit |
+| `-o`,`--outputFile` *arg* | Name of output file. If the specified file exists, it will be overwritten, format of the file is determined by -p option |
+| `-p`,`--processor` *arg* | Select which type of processor to apply against image file, currently supported processors are: binary (native binary format that Hadoop uses), xml (default, XML format), stats (prints statistics about edits file) |
+
+Hadoop Offline Image Viewer for newer image files.
+
+### `oiv_legacy`
+
+Usage: `hdfs oiv_legacy [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-h`,`--help` | Display usage information and exit |
+| `-i`,`--inputFile` *arg* | edits file to process, xml (case insensitive) extension means XML format, any other filename means binary format |
+| `-o`,`--outputFile` *arg* | Name of output file. If the specified file exists, it will be overwritten, format of the file is determined by -p option |
+
+Hadoop offline image viewer for older versions of Hadoop.
+
+### `snapshotDiff`
+
+Usage: `hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>`
+
+Determine the difference between HDFS snapshots. See the [HDFS Snapshot Documentation](./HdfsSnapshots.html#Get_Snapshots_Difference_Report) for more information.
+
+### `version`
+
+Usage: `hdfs version`
+
+Prints the version.
+
+Administration Commands
+-----------------------
+
+Commands useful for administrators of a hadoop cluster.
+
+### `balancer`
+
+Usage:
+
+        hdfs balancer
+              [-threshold <threshold>]
+              [-policy <policy>]
+              [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
+              [-include [-f <hosts-file> | <comma-separated list of hosts>]]
+              [-idleiterations <idleiterations>]
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-policy` \<policy\> | `datanode` (default): Cluster is balanced if each datanode is balanced.<br/> `blockpool`: Cluster is balanced if each block pool in each datanode is balanced. |
+| `-threshold` \<threshold\> | Percentage of disk capacity. This overwrites the default threshold. |
+| `-exclude -f` \<hosts-file\> \| \<comma-separated list of hosts\> | Excludes the specified datanodes from being balanced by the balancer. |
+| `-include -f` \<hosts-file\> \| \<comma-separated list of hosts\> | Includes only the specified datanodes to be balanced by the balancer. |
+| `-idleiterations` \<iterations\> | Maximum number of idle iterations before exit. This overwrites the default idleiterations(5). |
+
+Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the rebalancing process. See [Balancer](./HdfsUserGuide.html#Balancer) for more details.
+
+Note that the `blockpool` policy is more strict than the `datanode` policy.
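
Reader's annotation, not part of the committed file: a sketch of one balancer invocation that combines the options in the table above. The threshold value, the hosts-file path, and the `HDFS_CMD` variable are assumptions chosen for illustration. The command is only printed unless `HDFS_CMD=hdfs` is set.

```shell
# HDFS_CMD is a hypothetical knob: print the command by default,
# set HDFS_CMD=hdfs to execute it on a real cluster.
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

THRESHOLD=5                                # percentage of disk capacity (illustrative)
EXCLUDE_FILE="/tmp/balancer-exclude.txt"   # hypothetical hosts file

# Balance per block pool (the stricter policy) and skip the listed datanodes.
balance_blockpools() {
  $HDFS_CMD balancer -threshold "$THRESHOLD" -policy blockpool -exclude -f "$EXCLUDE_FILE"
}

balance_blockpools
```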
+
+### `cacheadmin`
+
+Usage: `hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]`
+
+See the [HDFS Cache Administration Documentation](./CentralizedCacheManagement.html#cacheadmin_command-line_interface) for more information.
+
+### `crypto`
+
+Usage:
+
+      hdfs crypto -createZone -keyName <keyName> -path <path>
+      hdfs crypto -help <command-name>
+      hdfs crypto -listZones
+
+See the [HDFS Transparent Encryption Documentation](./TransparentEncryption.html#crypto_command-line_interface) for more information.
+
+### `datanode`
+
+Usage: `hdfs datanode [-regular | -rollback | -rollingupgrade rollback]`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-regular` | Normal datanode startup (default). |
+| `-rollback` | Rollback the datanode to the previous version. This should be used after stopping the datanode and distributing the old hadoop version. |
+| `-rollingupgrade` rollback | Rollback a rolling upgrade operation. |
+
+Runs an HDFS datanode.
+
+### `dfsadmin`
+
+Usage:
+
+        hdfs dfsadmin [GENERIC_OPTIONS]
+              [-report [-live] [-dead] [-decommissioning]]
+              [-safemode enter | leave | get | wait]
+              [-saveNamespace]
+              [-rollEdits]
+              [-restoreFailedStorage true |false |check]
+              [-refreshNodes]
+              [-setQuota <quota> <dirname>...<dirname>]
+              [-clrQuota <dirname>...<dirname>]
+              [-setSpaceQuota <quota> <dirname>...<dirname>]
+              [-clrSpaceQuota <dirname>...<dirname>]
+              [-setStoragePolicy <path> <policyName>]
+              [-getStoragePolicy <path>]
+              [-finalizeUpgrade]
+              [-rollingUpgrade [<query> |<prepare> |<finalize>]]
+              [-metasave filename]
+              [-refreshServiceAcl]
+              [-refreshUserToGroupsMappings]
+              [-refreshSuperUserGroupsConfiguration]
+              [-refreshCallQueue]
+              [-refresh <host:ipc_port> <key> [arg1..argn]]
+              [-reconfig <datanode |...> <host:ipc_port> <start |status>]
+              [-printTopology]
+              [-refreshNamenodes datanodehost:port]
+              [-deleteBlockPool datanode-host:port blockpoolId [force]]
+              [-setBalancerBandwidth <bandwidth in bytes per second>]
+              [-allowSnapshot <snapshotDir>]
+              [-disallowSnapshot <snapshotDir>]
+              [-fetchImage <local directory>]
+              [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
+              [-getDatanodeInfo <datanode_host:ipc_port>]
+              [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
+              [-help [cmd]]
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-report` `[-live]` `[-dead]` `[-decommissioning]` | Reports basic filesystem information and statistics. Optional flags may be used to filter the list of displayed DataNodes. |
+| `-safemode` enter\|leave\|get\|wait | Safe mode maintenance command. Safe mode is a Namenode state in which it <br/>1. does not accept changes to the name space (read-only) <br/>2. does not replicate or delete blocks. <br/>Safe mode is entered automatically at Namenode startup, and the Namenode leaves safe mode automatically when the configured minimum percentage of blocks satisfies the minimum replication condition. Safe mode can also be entered manually, but then it can only be turned off manually as well. |
+| `-saveNamespace` | Save current namespace into storage directories and reset edits log. Requires safe mode. |
+| `-rollEdits` | Rolls the edit log on the active NameNode. |
+| `-restoreFailedStorage` true\|false\|check | This option will turn on/off automatic attempt to restore failed storage replicas. If a failed storage becomes available again the system will attempt to restore edits and/or fsimage during checkpoint. 'check' option will return current setting. |
+| `-refreshNodes` | Re-read the hosts and exclude files to update the set of Datanodes that are allowed to connect to the Namenode and those that should be decommissioned or recommissioned. |
+| `-setQuota` \<quota\> \<dirname\>...\<dirname\> | See [HDFS Quotas Guide](../hadoop-hdfs/HdfsQuotaAdminGuide.html#Administrative_Commands) for the detail. |
+| `-clrQuota` \<dirname\>...\<dirname\> | See [HDFS Quotas Guide](../hadoop-hdfs/HdfsQuotaAdminGuide.html#Administrative_Commands) for the detail. |
+| `-setSpaceQuota` \<quota\> \<dirname\>...\<dirname\> | See [HDFS Quotas Guide](../hadoop-hdfs/HdfsQuotaAdminGuide.html#Administrative_Commands) for the detail. |
+| `-clrSpaceQuota` \<dirname\>...\<dirname\> | See [HDFS Quotas Guide](../hadoop-hdfs/HdfsQuotaAdminGuide.html#Administrative_Commands) for the detail. |
+| `-setStoragePolicy` \<path\> \<policyName\> | Set a storage policy to a file or a directory. |
+| `-getStoragePolicy` \<path\> | Get the storage policy of a file or a directory. |
+| `-finalizeUpgrade` | Finalize upgrade of HDFS. Datanodes delete their previous version working directories, followed by Namenode doing the same. This completes the upgrade process. |
+| `-rollingUpgrade` [\<query\>\|\<prepare\>\|\<finalize\>] | See [Rolling Upgrade document](../hadoop-hdfs/HdfsRollingUpgrade.html#dfsadmin_-rollingUpgrade) for the detail. |
+| `-metasave` filename | Save Namenode's primary data structures to *filename* in the directory specified by hadoop.log.dir property. *filename* is overwritten if it exists. *filename* will contain one line for each of the following<br/>1. Datanodes heart beating with Namenode<br/>2. Blocks waiting to be replicated<br/>3. Blocks currently being replicated<br/>4. Blocks waiting to be deleted |
+| `-refreshServiceAcl` | Reload the service-level authorization policy file. |
+| `-refreshUserToGroupsMappings` | Refresh user-to-groups mappings. |
+| `-refreshSuperUserGroupsConfiguration` | Refresh superuser proxy groups mappings |
+| `-refreshCallQueue` | Reload the call queue from config. |
+| `-refresh` \<host:ipc\_port\> \<key\> [arg1..argn] | Triggers a runtime-refresh of the resource specified by \<key\> on \<host:ipc\_port\>. All other args after are sent to the host. |
+| `-reconfig` \<datanode \|...\> \<host:ipc\_port\> \<start\|status\> | Start reconfiguration or get the status of an ongoing reconfiguration. The second parameter specifies the node type. Currently, only reloading DataNode's configuration is supported. |
+| `-printTopology` | Print a tree of the racks and their nodes as reported by the Namenode |
+| `-refreshNamenodes` datanodehost:port | For the given datanode, reloads the configuration files, stops serving the removed block-pools and starts serving new block-pools. |
+| `-deleteBlockPool` datanode-host:port blockpoolId [force] | If force is passed, block pool directory for the given blockpool id on the given datanode is deleted along with its contents, otherwise the directory is deleted only if it is empty. The command will fail if datanode is still serving the block pool. Refer to refreshNamenodes to shutdown a block pool service on a datanode. |
+| `-setBalancerBandwidth` \<bandwidth in bytes per second\> | Changes the network bandwidth used by each datanode during HDFS block balancing. \<bandwidth\> is the maximum number of bytes per second that will be used by each datanode. This value overrides the dfs.balance.bandwidthPerSec parameter. NOTE: The new value is not persistent on the DataNode. |
+| `-allowSnapshot` \<snapshotDir\> | Allows snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable. See the [HDFS Snapshot Documentation](./HdfsSnapshots.html) for more information. |
+| `-disallowSnapshot` \<snapshotDir\> | Disallows snapshots of a directory from being created. All snapshots of the directory must be deleted before disallowing snapshots. See the [HDFS Snapshot Documentation](./HdfsSnapshots.html) for more information. |
+| `-fetchImage` \<local directory\> | Downloads the most recent fsimage from the NameNode and saves it in the specified local directory. |
+| `-shutdownDatanode` \<datanode\_host:ipc\_port\> [upgrade] | Submit a shutdown request for the given datanode. See [Rolling Upgrade document](./HdfsRollingUpgrade.html#dfsadmin_-shutdownDatanode) for the detail. |
+| `-getDatanodeInfo` \<datanode\_host:ipc\_port\> | Get the information about the given datanode. See [Rolling Upgrade document](./HdfsRollingUpgrade.html#dfsadmin_-getDatanodeInfo) for the detail. |
+| `-triggerBlockReport` `[-incremental]` \<datanode\_host:ipc\_port\> | Trigger a block report for the given datanode. If 'incremental' is specified, it will be an incremental block report; otherwise, it will be a full block report. |
+| `-help` [cmd] | Displays help for the given command or all commands if none is specified. |
+
+Runs an HDFS dfsadmin client.
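
Reader's annotation, not part of the committed file: the table notes that `-saveNamespace` requires safe mode, and that safe mode entered manually must also be left manually. A minimal sketch of that sequence follows. The `HDFS_CMD` variable and the function name are hypothetical, and the commands are only printed by default.

```shell
# HDFS_CMD is a hypothetical knob: print the commands by default,
# set HDFS_CMD=hdfs to execute them on a real cluster.
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# Enter safe mode, save the namespace, then leave safe mode again.
# Each step runs only if the previous one succeeded.
save_namespace() {
  $HDFS_CMD dfsadmin -safemode enter &&
  $HDFS_CMD dfsadmin -saveNamespace &&
  $HDFS_CMD dfsadmin -safemode leave
}

save_namespace
```

Chaining with `&&` means a failed `-saveNamespace` leaves the Namenode in safe mode, so an operator can investigate before changes to the namespace are accepted again.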
+
+### `haadmin`
+
+Usage:
+
+        hdfs haadmin -checkHealth <serviceId>
+        hdfs haadmin -failover [--forcefence] [--forceactive] <serviceId> <serviceId>
+        hdfs haadmin -getServiceState <serviceId>
+        hdfs haadmin -help <command>
+        hdfs haadmin -transitionToActive <serviceId> [--forceactive]
+        hdfs haadmin -transitionToStandby <serviceId>
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-checkHealth` | check the health of the given NameNode |
+| `-failover` | initiate a failover between two NameNodes |
+| `-getServiceState` | determine whether the given NameNode is Active or Standby |
+| `-transitionToActive` | transition the state of the given NameNode to Active (Warning: No fencing is done) |
+| `-transitionToStandby` | transition the state of the given NameNode to Standby (Warning: No fencing is done) |
+
+See [HDFS HA with NFS](./HDFSHighAvailabilityWithNFS.html#Administrative_commands) or [HDFS HA with QJM](./HDFSHighAvailabilityWithQJM.html#Administrative_commands) for more information on this command.
+
+### `journalnode`
+
+Usage: `hdfs journalnode`
+
+This command starts a journalnode for use with [HDFS HA with QJM](./HDFSHighAvailabilityWithQJM.html#Administrative_commands).
+
+### `mover`
+
+Usage: `hdfs mover [-p <files/dirs> | -f <local file name>]`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-f` \<local file\> | Specify a local file containing a list of HDFS files/dirs to migrate. |
+| `-p` \<files/dirs\> | Specify a space separated list of HDFS files/dirs to migrate. |
+
+Runs the data migration utility. See [Mover](./ArchivalStorage.html#Mover_-_A_New_Data_Migration_Tool) for more details.
+
+Note that, when both -p and -f options are omitted, the default path is the root directory.
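
Reader's annotation, not part of the committed file: a sketch of the two ways to call the mover described above, with an explicit space-separated path list and with no options (which falls back to the root directory). The paths, the `HDFS_CMD` variable, and the `migrate` helper are hypothetical. Commands are only printed by default.

```shell
# HDFS_CMD is a hypothetical knob: print the commands by default,
# set HDFS_CMD=hdfs to execute them on a real cluster.
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# With arguments: pass them as a space-separated list after -p.
# Without arguments: omit -p and -f so the mover uses its default path.
migrate() {
  if [ "$#" -eq 0 ]; then
    $HDFS_CMD mover
  else
    $HDFS_CMD mover -p "$@"
  fi
}

migrate /data/archive /logs/2014   # hypothetical HDFS directories
migrate                            # default path (root directory)
```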
+
+### `namenode`
+
+Usage:
+
+      hdfs namenode [-backup] |
+              [-checkpoint] |
+              [-format [-clusterid cid ] [-force] [-nonInteractive] ] |
+              [-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |
+              [-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] ] |
+              [-rollback] |
+              [-rollingUpgrade <downgrade |rollback> ] |
+              [-finalize] |
+              [-importCheckpoint] |
+              [-initializeSharedEdits] |
+              [-bootstrapStandby] |
+              [-recover [-force] ] |
+              [-metadataVersion ]
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-backup` | Start backup node. |
+| `-checkpoint` | Start checkpoint node. |
+| `-format` `[-clusterid cid]` `[-force]` `[-nonInteractive]` | Formats the specified NameNode. It starts the NameNode, formats it and then shuts it down. -force option formats if the name directory exists. -nonInteractive option aborts if the name directory exists, unless -force option is specified. |
+| `-upgrade` `[-clusterid cid]` [`-renameReserved` \<k-v pairs\>] | Namenode should be started with upgrade option after the distribution of new Hadoop version. |
+| `-upgradeOnly` `[-clusterid cid]` [`-renameReserved` \<k-v pairs\>] | Upgrade the specified NameNode and then shut it down. |
+| `-rollback` | Rollback the NameNode to the previous version. This should be used after stopping the cluster and distributing the old Hadoop version. |
+| `-rollingUpgrade` \<downgrade\|rollback\|started\> | See [Rolling Upgrade document](./HdfsRollingUpgrade.html#NameNode_Startup_Options) for the detail. |
+| `-finalize` | Finalize will remove the previous state of the file system. Recent upgrade will become permanent. Rollback option will not be available anymore. After finalization it shuts the NameNode down. |
+| `-importCheckpoint` | Loads image from a checkpoint directory and saves it into the current one. Checkpoint dir is read from property fs.checkpoint.dir |
+| `-initializeSharedEdits` | Format a new shared edits dir and copy in enough edit log segments so that the standby NameNode can start up. |
+| `-bootstrapStandby` | Allows the standby NameNode's storage directories to be bootstrapped by copying the latest namespace snapshot from the active NameNode. This is used when first configuring an HA cluster. |
+| `-recover` `[-force]` | Recover lost metadata on a corrupt filesystem. See [HDFS User Guide](./HdfsUserGuide.html#Recovery_Mode) for the detail. |
+| `-metadataVersion` | Verify that configured directories exist, then print the metadata versions of the software and the image. |
+
+Runs the namenode. More info about the upgrade, rollback and finalize is at [Upgrade Rollback](./HdfsUserGuide.html#Upgrade_and_Rollback).
+
+### `nfs3`
+
+Usage: `hdfs nfs3`
+
+This command starts the NFS3 gateway for use with the [HDFS NFS3 Service](./HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service).
+
+### `portmap`
+
+Usage: `hdfs portmap`
+
+This command starts the RPC portmap for use with the [HDFS NFS3 Service](./HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service).
+
+### `secondarynamenode`
+
+Usage: `hdfs secondarynamenode [-checkpoint [force]] | [-format] | [-geteditsize]`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-checkpoint` [force] | Checkpoints the SecondaryNameNode if EditLog size \>= fs.checkpoint.size. If `force` is used, checkpoint irrespective of EditLog size. |
+| `-format` | Format the local storage during startup. |
+| `-geteditsize` | Prints the number of uncheckpointed transactions on the NameNode. |
+
+Runs the HDFS secondary namenode. See [Secondary Namenode](./HdfsUserGuide.html#Secondary_NameNode) for more info.
+
+### `storagepolicies`
+
+Usage: `hdfs storagepolicies`
+
+Lists out all storage policies. See the [HDFS Storage Policy Documentation](./ArchivalStorage.html) for more information.
+
+### `zkfc`
+
+Usage: `hdfs zkfc [-formatZK [-force] [-nonInteractive]]`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-formatZK` | Format the Zookeeper instance |
+| `-h` | Display help |
+
+This command starts a Zookeeper Failover Controller process for use with [HDFS HA with QJM](./HDFSHighAvailabilityWithQJM.html#Administrative_commands).
+
+Debug Commands
+--------------
+
+Useful commands to help administrators debug HDFS issues, like validating block files and calling recoverLease.
+
+### `verify`
+
+Usage: `hdfs debug verify [-meta <metadata-file>] [-block <block-file>]`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| `-block` *block-file* | Optional parameter to specify the absolute path for the block file on the local file system of the data node. |
+| `-meta` *metadata-file* | Absolute path for the metadata file on the local file system of the data node. |
+
+Verify HDFS metadata and block files. If a block file is specified, we will verify that the checksums in the metadata file match the block file.
+
+### `recoverLease`
+
+Usage: `hdfs debug recoverLease [-path <path>] [-retries <num-retries>]`
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+| [`-path` *path*] | HDFS path for which to recover the lease. |
+| [`-retries` *num-retries*] | Number of times the client will retry calling recoverLease. The default number of retries is 1. |
+
+Recover the lease on the specified path. The path must reside on an HDFS filesystem. The default number of retries is 1.

