hadoop-common-commits mailing list archives

From a.@apache.org
Subject [11/11] hadoop git commit: HDFS-7668. Convert site documentation from apt to markdown (Masatake Iwasaki via aw)
Date Fri, 13 Feb 2015 02:20:28 GMT
HDFS-7668. Convert site documentation from apt to markdown (Masatake Iwasaki via aw)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/2f1e5dc6
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/2f1e5dc6
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/2f1e5dc6

Branch: refs/heads/trunk
Commit: 2f1e5dc6288972004b5bed335c4a8d038aaedcf4
Parents: 93b941c
Author: Allen Wittenauer <aw@apache.org>
Authored: Thu Feb 12 18:19:45 2015 -0800
Committer: Allen Wittenauer <aw@apache.org>
Committed: Thu Feb 12 18:19:45 2015 -0800

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt     |    3 +
 .../src/site/apt/ArchivalStorage.apt.vm         |  233 --
 .../site/apt/CentralizedCacheManagement.apt.vm  |  344 ---
 .../src/site/apt/ExtendedAttributes.apt.vm      |   97 -
 .../src/site/apt/FaultInjectFramework.apt.vm    |  312 ---
 .../hadoop-hdfs/src/site/apt/Federation.apt.vm  |  339 ---
 .../src/site/apt/HDFSCommands.apt.vm            |  797 ------
 .../site/apt/HDFSHighAvailabilityWithNFS.apt.vm |  859 ------
 .../site/apt/HDFSHighAvailabilityWithQJM.apt.vm |  816 ------
 .../hadoop-hdfs/src/site/apt/HdfsDesign.apt.vm  |  510 ----
 .../src/site/apt/HdfsEditsViewer.apt.vm         |  104 -
 .../src/site/apt/HdfsImageViewer.apt.vm         |  247 --
 .../src/site/apt/HdfsMultihoming.apt.vm         |  145 -
 .../src/site/apt/HdfsNfsGateway.apt.vm          |  364 ---
 .../src/site/apt/HdfsPermissionsGuide.apt.vm    |  438 ---
 .../src/site/apt/HdfsQuotaAdminGuide.apt.vm     |  116 -
 .../src/site/apt/HdfsUserGuide.apt.vm           |  556 ----
 .../hadoop-hdfs/src/site/apt/LibHdfs.apt.vm     |  101 -
 .../src/site/apt/SLGUserGuide.apt.vm            |  195 --
 .../src/site/apt/ShortCircuitLocalReads.apt.vm  |  112 -
 .../src/site/apt/TransparentEncryption.apt.vm   |  290 --
 .../hadoop-hdfs/src/site/apt/ViewFs.apt.vm      |  304 --
 .../hadoop-hdfs/src/site/apt/WebHDFS.apt.vm     | 2628 ------------------
 .../src/site/markdown/ArchivalStorage.md        |  160 ++
 .../site/markdown/CentralizedCacheManagement.md |  268 ++
 .../src/site/markdown/ExtendedAttributes.md     |   98 +
 .../src/site/markdown/FaultInjectFramework.md   |  254 ++
 .../hadoop-hdfs/src/site/markdown/Federation.md |  254 ++
 .../src/site/markdown/HDFSCommands.md           |  505 ++++
 .../markdown/HDFSHighAvailabilityWithNFS.md     |  678 +++++
 .../markdown/HDFSHighAvailabilityWithQJM.md     |  642 +++++
 .../hadoop-hdfs/src/site/markdown/HdfsDesign.md |  240 ++
 .../src/site/markdown/HdfsEditsViewer.md        |   69 +
 .../src/site/markdown/HdfsImageViewer.md        |  172 ++
 .../src/site/markdown/HdfsMultihoming.md        |  127 +
 .../src/site/markdown/HdfsNfsGateway.md         |  254 ++
 .../src/site/markdown/HdfsPermissionsGuide.md   |  284 ++
 .../src/site/markdown/HdfsQuotaAdminGuide.md    |   92 +
 .../src/site/markdown/HdfsUserGuide.md          |  375 +++
 .../hadoop-hdfs/src/site/markdown/LibHdfs.md    |   92 +
 .../src/site/markdown/SLGUserGuide.md           |  157 ++
 .../src/site/markdown/ShortCircuitLocalReads.md |   87 +
 .../src/site/markdown/TransparentEncryption.md  |  268 ++
 .../hadoop-hdfs/src/site/markdown/ViewFs.md     |  242 ++
 .../hadoop-hdfs/src/site/markdown/WebHDFS.md    | 1939 +++++++++++++
 45 files changed, 7260 insertions(+), 9907 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/2f1e5dc6/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index 9117fc8..bf4c9de 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -141,6 +141,9 @@ Trunk (Unreleased)
 
     HDFS-7322. deprecate sbin/hadoop-daemon.sh (aw)
 
+    HDFS-7668. Convert site documentation from apt to markdown (Masatake
+    Iwasaki via aw)
+
   OPTIMIZATIONS
 
   BUG FIXES

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2f1e5dc6/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm
deleted file mode 100644
index 5336ea3..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm
+++ /dev/null
@@ -1,233 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Archival Storage, SSD & Memory
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Archival Storage, SSD & Memory
-
-%{toc|section=1|fromDepth=0}
-
-* {Introduction}
-
-  <Archival Storage> is a solution to decouple growing storage capacity from compute capacity.
-  Nodes with higher density and less expensive storage with low compute power are becoming available
-  and can be used as cold storage in the clusters.
-  Based on policy, data can be moved from hot storage to cold storage.
-  Adding more nodes to the cold storage can grow the storage independent of the compute capacity
-  in the cluster.
-
-  The frameworks provided by Heterogeneous Storage and Archival Storage generalize the HDFS architecture
-  to include other kinds of storage media including <SSD> and <memory>.
-  Users may choose to store their data in SSD or memory for a better performance.
-
-* {Storage Types and Storage Policies}
-
-** {Storage Types: ARCHIVE, DISK, SSD and RAM_DISK}
-
-  The first phase of
-  {{{https://issues.apache.org/jira/browse/HDFS-2832}Heterogeneous Storage (HDFS-2832)}}
-  changed the datanode storage model from a single storage,
-  which may correspond to multiple physical storage media,
-  to a collection of storages with each storage corresponding to a physical storage medium.
-  It also added the notion of storage types, DISK and SSD,
-  where DISK is the default storage type.
-
-  A new storage type <ARCHIVE>,
-  which has high storage density (petabyte of storage) but little compute power,
-  is added for supporting archival storage.
-
-  Another new storage type <RAM_DISK> is added for supporting writing single replica files in memory.
-
-** {Storage Policies: Hot, Warm, Cold, All_SSD, One_SSD and Lazy_Persist}
-
-  A new concept of storage policies is introduced in order to allow files to be stored
-  in different storage types according to the storage policy.
-
-  We have the following storage policies:
-
-  * <<Hot>> - for both storage and compute.
-              The data that is popular and still being used for processing will stay in this policy.
-              When a block is hot, all replicas are stored in DISK.
-
-  * <<Cold>> - only for storage with limited compute.
-               The data that is no longer being used, or data that needs to be archived is moved
-               from hot storage to cold storage.
-               When a block is cold, all replicas are stored in ARCHIVE.
-
-  * <<Warm>> - partially hot and partially cold.
-               When a block is warm, some of its replicas are stored in DISK
-               and the remaining replicas are stored in ARCHIVE.
-
-  * <<All_SSD>> - for storing all replicas in SSD.
-
-  * <<One_SSD>> - for storing one of the replicas in SSD.
-                  The remaining replicas are stored in DISK.
-
-  * <<Lazy_Persist>> - for writing blocks with a single replica in memory.
-                       The replica is first written in RAM_DISK and then it is lazily persisted in DISK.
-
-  []
-
-  More formally, a storage policy consists of the following fields:
-
-  [[1]] Policy ID
-
-  [[2]] Policy name
-
-  [[3]] A list of storage types for block placement
-
-  [[4]] A list of fallback storage types for file creation
-
-  [[5]] A list of fallback storage types for replication
-
-  []
-
-  When there is enough space,
-  block replicas are stored according to the storage type list specified in #3.
-  When some of the storage types in list #3 are running out of space,
-  the fallback storage type lists specified in #4 and #5 are used
-  to replace the out-of-space storage types for file creation and replication, respectively.
-
-  The following is a typical storage policy table.
-
-*--------+---------------+--------------------------+-----------------------+-----------------------+
-| <<Policy>> | <<Policy>>| <<Block Placement>>      | <<Fallback storages>> | <<Fallback storages>> |
-| <<ID>>     | <<Name>>  | <<(n\ replicas)>>        | <<for creation>>      | <<for replication>>   |
-*--------+---------------+--------------------------+-----------------------+-----------------------+
-| 15     | Lazy_Persist  | RAM_DISK: 1, DISK: <n>-1 | DISK                  | DISK                  |
-*--------+---------------+--------------------------+-----------------------+-----------------------+
-| 12     | All_SSD       | SSD: <n>                 | DISK                  | DISK                  |
-*--------+---------------+--------------------------+-----------------------+-----------------------+
-| 10     | One_SSD       | SSD: 1, DISK: <n>-1      | SSD, DISK             | SSD, DISK             |
-*--------+---------------+--------------------------+-----------------------+-----------------------+
-| 7      | Hot (default) | DISK: <n>                | \<none\>              | ARCHIVE               |
-*--------+---------------+--------------------------+-----------------------+-----------------------+
-| 5      | Warm          | DISK: 1, ARCHIVE: <n>-1  | ARCHIVE, DISK         | ARCHIVE, DISK         |
-*--------+---------------+--------------------------+-----------------------+-----------------------+
-| 2      | Cold          | ARCHIVE: <n>             | \<none\>              | \<none\>              |
-*--------+---------------+--------------------------+-----------------------+-----------------------+
-
-  Note that the Lazy_Persist policy is useful only for single replica blocks.
-  For blocks with more than one replica, all the replicas will be written to DISK
-  since writing only one of the replicas to RAM_DISK does not improve the overall performance.
-
-** {Storage Policy Resolution}
-
-  When a file or directory is created, its storage policy is <unspecified>.
-  The storage policy can be specified using
-  the "<<<{{{Set Storage Policy}dfsadmin -setStoragePolicy}}>>>" command.
-  The effective storage policy of a file or directory is resolved by the following rules.
-
-  [[1]] If the file or directory is specified with a storage policy, return it.
-
-  [[2]] For an unspecified file or directory,
-        if it is the root directory, return the <default storage policy>.
-        Otherwise, return its parent's effective storage policy.
-
-  []
-
-  The effective storage policy can be retrieved by
-  the "<<<{{{Get Storage Policy}dfsadmin -getStoragePolicy}}>>>" command.
-
-
-** {Configuration}
-
-  * <<dfs.storage.policy.enabled>>
-    - for enabling/disabling the storage policy feature.
-    The default value is <<<true>>>.
-
-  []
-
-
-* {Mover - A New Data Migration Tool}
-
-  A new data migration tool is added for archiving data.
-  The tool is similar to Balancer.
-  It periodically scans the files in HDFS to check if the block placement satisfies the storage policy.
-  For the blocks violating the storage policy,
-  it moves the replicas to a different storage type
-  in order to fulfill the storage policy requirement.
-
-  * Command:
-
-+------------------------------------------+
-hdfs mover [-p <files/dirs> | -f <local file name>]
-+------------------------------------------+
-
-  * Arguments:
-
-*-------------------------+--------------------------------------------------------+
-| <<<-p \<files/dirs\>>>> | Specify a space separated list of HDFS files/dirs to migrate.
-*-------------------------+--------------------------------------------------------+
-| <<<-f \<local file\>>>> | Specify a local file containing a list of HDFS files/dirs to migrate.
-*-------------------------+--------------------------------------------------------+
-
-  Note that, when both -p and -f options are omitted, the default path is the root directory.
-
-  []
-
-
-* {Storage Policy Commands}
-
-** {List Storage Policies}
-
-  List out all the storage policies.
-
-  * Command:
-
-+------------------------------------------+
-hdfs storagepolicies -listPolicies
-+------------------------------------------+
-
-  * Arguments: none.
-
-** {Set Storage Policy}
-
-  Set a storage policy to a file or a directory.
-
-  * Command:
-
-+------------------------------------------+
-hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>
-+------------------------------------------+
-
-  * Arguments:
-
-*--------------------------+-----------------------------------------------------+
-| <<<-path \<path\>>>>     | The path referring to either a directory or a file. |
-*--------------------------+-----------------------------------------------------+
-| <<<-policy \<policy\>>>> | The name of the storage policy.                     |
-*--------------------------+-----------------------------------------------------+
-
-  []
-
-** {Get Storage Policy}
-
-  Get the storage policy of a file or a directory.
-
-  * Command:
-
-+------------------------------------------+
-hdfs storagepolicies -getStoragePolicy -path <path>
-+------------------------------------------+
-
-  * Arguments:
-
-*----------------------------+-----------------------------------------------------+
-| <<<-path \<path\>>>>       | The path referring to either a directory or a file. |
-*----------------------------+-----------------------------------------------------+
-
-  []

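The two resolution rules in the "Storage Policy Resolution" section of the
deleted ArchivalStorage.apt.vm above can be sketched as a walk from a path up
to the root. This is only an illustration, not HDFS code: the function name
effective_policy and the dictionary of explicitly set policies are
hypothetical, and "Hot" is taken as the default policy because the policy
table marks it "(default)".

```python
from typing import Dict

# Assumption: the default storage policy is "Hot", as marked in the policy table.
DEFAULT_POLICY = "Hot"


def effective_policy(path: str, explicit: Dict[str, str]) -> str:
    """Resolve the effective storage policy of an absolute path.

    `explicit` maps paths to policies that were set on them directly
    (hypothetical stand-in for state kept by the NameNode).
    """
    current = path.rstrip("/") or "/"
    while True:
        # Rule 1: a policy specified on the file or directory itself wins.
        if current in explicit:
            return explicit[current]
        # Rule 2: an unspecified root directory gets the default policy.
        if current == "/":
            return DEFAULT_POLICY
        # Rule 2, continued: otherwise use the parent's effective policy.
        current = current.rsplit("/", 1)[0] or "/"


policies = {"/archive": "Cold"}
print(effective_policy("/archive/2014/logs", policies))
print(effective_policy("/data/events", policies))
```

With only /archive set to Cold, everything beneath /archive resolves to Cold,
while paths outside it fall through to the root and pick up the default.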
http://git-wip-us.apache.org/repos/asf/hadoop/blob/2f1e5dc6/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm
deleted file mode 100644
index 8f5647b..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm
+++ /dev/null
@@ -1,344 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop Distributed File System-${project.version} - Centralized Cache Management in HDFS
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Centralized Cache Management in HDFS
-
-%{toc|section=1|fromDepth=2|toDepth=4}
-
-* {Overview}
-
-  <Centralized cache management> in HDFS is an explicit caching mechanism that
-  allows users to specify <paths> to be cached by HDFS. The NameNode will
-  communicate with DataNodes that have the desired blocks on disk, and instruct
-  them to cache the blocks in off-heap caches. 
-
-  Centralized cache management in HDFS has many significant advantages.
-
-  [[1]] Explicit pinning prevents frequently used data from being evicted from
-  memory. This is particularly important when the size of the working set
-  exceeds the size of main memory, which is common for many HDFS workloads.
-
-  [[1]] Because DataNode caches are managed by the NameNode, applications can
-  query the set of cached block locations when making task placement decisions.
-  Co-locating a task with a cached block replica improves read performance.
-
-  [[1]] When a block has been cached by a DataNode, clients can use a new,
-  more-efficient, zero-copy read API. Since checksum verification of cached
-  data is done once by the DataNode, clients can incur essentially zero
-  overhead when using this new API.
-
-  [[1]] Centralized caching can improve overall cluster memory utilization.
-  When relying on the OS buffer cache at each DataNode, repeated reads of
-  a block will result in all <n> replicas of the block being pulled into
-  buffer cache. With centralized cache management, a user can explicitly pin
-  only <m> of the <n> replicas, saving <n-m> memory.
-
-* {Use Cases}
-
-  Centralized cache management is useful for files that are accessed repeatedly.
-  For example, a small <fact table> in Hive which is often used for joins is a
-  good candidate for caching. On the other hand, caching the input of a
-  <one year reporting query> is probably less useful, since the
-  historical data might only be read once.
-
-  Centralized cache management is also useful for mixed workloads with
-  performance SLAs. Caching the working set of a high-priority workload
-  ensures that it does not contend for disk I/O with a low-priority workload.
-
-* {Architecture}
-
-[images/caching.png] Caching Architecture
-
-  In this architecture, the NameNode is responsible for coordinating all the
-  DataNode off-heap caches in the cluster. The NameNode periodically receives
-  a <cache report> from each DataNode which describes all the blocks cached
-  on a given DN. The NameNode manages DataNode caches by piggybacking cache and
-  uncache commands on the DataNode heartbeat.
-
-  The NameNode queries its set of <cache directives> to determine
-  which paths should be cached. Cache directives are persistently stored in the
-  fsimage and edit log, and can be added, removed, and modified via Java and
-  command-line APIs. The NameNode also stores a set of <cache pools>,
-  which are administrative entities used to group cache directives together for
-  resource management and enforcing permissions.
-
-  The NameNode periodically rescans the namespace and active cache directives
-  to determine which blocks need to be cached or uncached and assign caching
-  work to DataNodes. Rescans can also be triggered by user actions like adding
-  or removing a cache directive or removing a cache pool.
-
-  We do not currently cache blocks which are under construction, corrupt, or
-  otherwise incomplete.  If a cache directive covers a symlink, the symlink
-  target is not cached.
-
-  Caching is currently done on the file or directory-level. Block and sub-block
-  caching is an item of future work.
-
-* {Concepts}
-
-** {Cache directive}
-
-  A <cache directive> defines a path that should be cached. Paths can be either
-  directories or files. Directories are cached non-recursively, meaning only
-  files in the first-level listing of the directory are cached.
-
-  Directives also specify additional parameters, such as the cache replication
-  factor and expiration time. The replication factor specifies the number of
-  block replicas to cache. If multiple cache directives refer to the same file,
-  the maximum cache replication factor is applied.
-
-  The expiration time is specified on the command line as a <time-to-live
-  (TTL)>, a relative expiration time in the future. After a cache directive
-  expires, it is no longer considered by the NameNode when making caching
-  decisions.
-
-** {Cache pool}
-
-  A <cache pool> is an administrative entity used to manage groups of cache
-  directives. Cache pools have UNIX-like <permissions>, which restrict which
-  users and groups have access to the pool. Write permissions allow users to
-  add and remove cache directives to the pool. Read permissions allow users to
-  list the cache directives in a pool, as well as additional metadata. Execute
-  permissions are unused.
-
-  Cache pools are also used for resource management. Pools can enforce a
-  maximum <limit>, which restricts the number of bytes that can be cached in
-  aggregate by directives in the pool. Normally, the sum of the pool limits
-  will approximately equal the amount of aggregate memory reserved for
-  HDFS caching on the cluster. Cache pools also track a number of statistics
-  to help cluster users determine what is and should be cached.
-
-  Pools also can enforce a maximum time-to-live. This restricts the maximum
-  expiration time of directives being added to the pool.
-
-* {<<<cacheadmin>>> command-line interface}
-
-  On the command-line, administrators and users can interact with cache pools
-  and directives via the <<<hdfs cacheadmin>>> subcommand.
-
-  Cache directives are identified by a unique, non-repeating 64-bit integer ID.
-  IDs will not be reused even if a cache directive is later removed.
-
-  Cache pools are identified by a unique string name.
-
-** {Cache directive commands}
-
-*** {addDirective}
-
-  Usage: <<<hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]>>>
-
-  Add a new cache directive.
-
-*--+--+
-\<path\> | A path to cache. The path can be a directory or a file.
-*--+--+
-\<pool-name\> | The pool to which the directive will be added. You must have write permission on the cache pool in order to add new directives.
-*--+--+
--force | Skips checking of cache pool resource limits.
-*--+--+
-\<replication\> | The cache replication factor to use. Defaults to 1.
-*--+--+
-\<time-to-live\> | How long the directive is valid. Can be specified in seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. Valid units are [smhd]. "never" indicates a directive that never expires. If unspecified, the directive never expires.
-*--+--+
-
-*** {removeDirective}
-
-  Usage: <<<hdfs cacheadmin -removeDirective <id> >>>
-
-  Remove a cache directive.
-
-*--+--+
-\<id\> | The id of the cache directive to remove.  You must have write permission on the pool of the directive in order to remove it.  To see a list of cache directive IDs, use the -listDirectives command.
-*--+--+
-
-*** {removeDirectives}
-
-  Usage: <<<hdfs cacheadmin -removeDirectives <path> >>>
-
-  Remove every cache directive with the specified path.
-
-*--+--+
-\<path\> | The path of the cache directives to remove.  You must have write permission on the pool of the directive in order to remove it.  To see a list of cache directives, use the -listDirectives command.
-*--+--+
-
-*** {listDirectives}
-
-  Usage: <<<hdfs cacheadmin -listDirectives [-stats] [-path <path>] [-pool <pool>]>>>
-
-  List cache directives.
-
-*--+--+
-\<path\> | List only cache directives with this path. Note that if there is a cache directive for <path> in a cache pool that we don't have read access for, it will not be listed.
-*--+--+
-\<pool\> | List only path cache directives in that pool.
-*--+--+
--stats | List path-based cache directive statistics.
-*--+--+
-
-** {Cache pool commands}
-
-*** {addPool}
-
-  Usage: <<<hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]>>>
-
-  Add a new cache pool.
-
-*--+--+
-\<name\> | Name of the new pool.
-*--+--+
-\<owner\> | Username of the owner of the pool. Defaults to the current user.
-*--+--+
-\<group\> | Group of the pool. Defaults to the primary group name of the current user.
-*--+--+
-\<mode\> | UNIX-style permissions for the pool. Permissions are specified in octal, e.g. 0755. By default, this is set to 0755.
-*--+--+
-\<limit\> | The maximum number of bytes that can be cached by directives in this pool, in aggregate. By default, no limit is set.
-*--+--+
-\<maxTtl\> | The maximum allowed time-to-live for directives being added to the pool. This can be specified in seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. Valid units are [smhd]. By default, no maximum is set. A value of \"never\" specifies that there is no limit.
-*--+--+
-
-*** {modifyPool}
-
-  Usage: <<<hdfs cacheadmin -modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]>>>
-
-  Modifies the metadata of an existing cache pool.
-
-*--+--+
-\<name\> | Name of the pool to modify.
-*--+--+
-\<owner\> | Username of the owner of the pool.
-*--+--+
-\<group\> | Groupname of the group of the pool.
-*--+--+
-\<mode\> | Unix-style permissions of the pool in octal.
-*--+--+
-\<limit\> | Maximum number of bytes that can be cached by this pool.
-*--+--+
-\<maxTtl\> | The maximum allowed time-to-live for directives being added to the pool.
-*--+--+
-
-*** {removePool}
-
-  Usage: <<<hdfs cacheadmin -removePool <name> >>>
-
-  Remove a cache pool. This also uncaches paths associated with the pool.
-
-*--+--+
-\<name\> | Name of the cache pool to remove.
-*--+--+
-
-*** {listPools}
-
-  Usage: <<<hdfs cacheadmin -listPools [-stats] [<name>]>>>
-
-  Display information about one or more cache pools, e.g. name, owner, group,
-  permissions, etc.
-
-*--+--+
--stats | Display additional cache pool statistics.
-*--+--+
-\<name\> | If specified, list only the named cache pool.
-*--+--+
-
-*** {help}
-
-  Usage: <<<hdfs cacheadmin -help <command-name> >>>
-
-  Get detailed help about a command.
-
-*--+--+
-\<command-name\> | The command for which to get detailed help. If no command is specified, print detailed help for all commands.
-*--+--+
-
-* {Configuration}
-
-** {Native Libraries}
-
-  In order to lock block files into memory, the DataNode relies on native JNI
-  code found in <<<libhadoop.so>>> or <<<hadoop.dll>>> on Windows. Be sure to
-  {{{../hadoop-common/NativeLibraries.html}enable JNI}} if you are using HDFS
-  centralized cache management.
-
-** {Configuration Properties}
-
-*** Required
-
-  Be sure to configure the following:
-
-  * dfs.datanode.max.locked.memory
-
-    This determines the maximum amount of memory a DataNode will use for caching.
-    On Unix-like systems, the "locked-in-memory size" ulimit (<<<ulimit -l>>>) of
-    the DataNode user also needs to be increased to match this parameter (see
-    below section on {{OS Limits}}). When setting this value, please remember
-    that you will need space in memory for other things as well, such as the
-    DataNode and application JVM heaps and the operating system page cache.
-
-*** Optional
-
-  The following properties are not required, but may be specified for tuning:
-
-  * dfs.namenode.path.based.cache.refresh.interval.ms
-
-    The NameNode will use this as the number of milliseconds between subsequent
-    path cache rescans.  This calculates the blocks to cache and each DataNode
-    containing a replica of the block that should cache it.
-
-    By default, this parameter is set to 300000, which is five minutes.
-
-  * dfs.datanode.fsdatasetcache.max.threads.per.volume
-
-    The DataNode will use this as the maximum number of threads per volume to
-    use for caching new data.
-
-    By default, this parameter is set to 4.
-
-  * dfs.cachereport.intervalMsec
-
-    The DataNode will use this as the number of milliseconds between sending a
-    full report of its cache state to the NameNode.
-
-    By default, this parameter is set to 10000, which is 10 seconds.
-
-  * dfs.namenode.path.based.cache.block.map.allocation.percent
-
-    The percentage of the Java heap which we will allocate to the cached blocks
-    map.  The cached blocks map is a hash map which uses chained hashing.
-    Smaller maps may be accessed more slowly if the number of cached blocks is
-    large; larger maps will consume more memory.  The default is 0.25 percent.
-
-** {OS Limits}
-
-  If you get the error "Cannot start datanode because the configured max
-  locked memory size... is more than the datanode's available RLIMIT_MEMLOCK
-  ulimit," that means that the operating system is imposing a lower limit
-  on the amount of memory that you can lock than what you have configured. To
-  fix this, you must adjust the ulimit -l value that the DataNode runs with.
-  Usually, this value is configured in <<</etc/security/limits.conf>>>.
-  However, it will vary depending on what operating system and distribution
-  you are using.
-
-  You will know that you have correctly configured this value when you can run
-  <<<ulimit -l>>> from the shell and get back either a higher value than what
-  you have configured with <<<dfs.datanode.max.locked.memory>>>, or the string
-  "unlimited," indicating that there is no limit.  Note that it's typical for
-  <<<ulimit -l>>> to output the memory lock limit in KB, but
-  dfs.datanode.max.locked.memory must be specified in bytes.
-
-  This information does not apply to deployments on Windows.  Windows has no
-  direct equivalent of <<<ulimit -l>>>.

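Two details from the deleted CentralizedCacheManagement.apt.vm above are easy
to get wrong in practice: the time-to-live syntax (a number followed by one of
the units [smhd], or "never") and the unit mismatch between "ulimit -l"
(typically KB) and dfs.datanode.max.locked.memory (bytes). The sketch below
illustrates both. The helper names parse_ttl and memlock_limit_ok are
hypothetical and are not part of Hadoop; treating an equal limit as sufficient
is also an assumption, since the text only mentions a higher value or
"unlimited".

```python
import re
from typing import Optional

# Seconds per unit for the [smhd] suffixes named in the documentation.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_ttl(text: str) -> Optional[int]:
    """Convert a TTL such as '30m' into seconds; 'never' yields None."""
    if text == "never":
        return None
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported TTL: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def memlock_limit_ok(ulimit_l_output: str, max_locked_memory_bytes: int) -> bool:
    """Compare `ulimit -l` output (assumed to be in KB) with a byte count."""
    value = ulimit_l_output.strip()
    if value == "unlimited":
        return True
    # Assumption: a limit equal to the configured value is enough.
    return int(value) * 1024 >= max_locked_memory_bytes


print(parse_ttl("4h"))
print(memlock_limit_ok("64", 128 * 1024))
```

In the second call, a 64 KB lock limit is checked against a 128 KB setting, so
the DataNode would be expected to hit the RLIMIT_MEMLOCK error described above.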
http://git-wip-us.apache.org/repos/asf/hadoop/blob/2f1e5dc6/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ExtendedAttributes.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ExtendedAttributes.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ExtendedAttributes.apt.vm
deleted file mode 100644
index 109e988..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ExtendedAttributes.apt.vm
+++ /dev/null
@@ -1,97 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop Distributed File System-${project.version} - Extended Attributes
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Extended Attributes in HDFS
-
-%{toc|section=1|fromDepth=2|toDepth=4}
-
-* {Overview}
-
-  <Extended attributes> (abbreviated as <xattrs>) are a filesystem feature that allows user applications to associate additional metadata with a file or directory. Unlike system-level inode metadata such as file permissions or modification time, extended attributes are not interpreted by the system and are instead used by applications to store additional information about an inode. Extended attributes could be used, for instance, to specify the character encoding of a plain-text document.
-
-** {HDFS extended attributes}
-
-  Extended attributes in HDFS are modeled after extended attributes in Linux (see the Linux manpage for {{{http://www.bestbits.at/acl/man/man5/attr.txt}attr(5)}} and {{{http://www.bestbits.at/acl/}related documentation}}). An extended attribute is a <name-value pair>, with a string name and binary value. Xattr names must also be prefixed with a <namespace>. For example, an xattr named <myXattr> in the <user> namespace would be specified as <<user.myXattr>>. Multiple xattrs can be associated with a single inode.
-
-** {Namespaces and Permissions}
-
-  In HDFS, there are five valid namespaces: <<<user>>>, <<<trusted>>>, <<<system>>>, <<<security>>>, and <<<raw>>>. Each of these namespaces has different access restrictions.
-
-  The <<<user>>> namespace is the namespace that will commonly be used by client applications. Access to extended attributes in the user namespace is controlled by the corresponding file permissions.
-
-  The <<<trusted>>> namespace is available only to HDFS superusers.
-
-  The <<<system>>> namespace is reserved for internal HDFS use. This namespace is not accessible through userspace methods, and is reserved for implementing internal HDFS features.
-
-  The <<<security>>> namespace is reserved for internal HDFS use. This namespace is generally not accessible through userspace methods. One particular use of <<<security>>> is the <<<security.hdfs.unreadable.by.superuser>>> extended attribute. This xattr can only be set on files, and it will prevent the superuser from reading the file's contents. The superuser can still read and modify file metadata, such as the owner, permissions, etc. This xattr can be set and accessed by any user, assuming normal filesystem permissions. This xattr is also write-once, and cannot be removed once set. This xattr does not allow a value to be set.
-
-  The <<<raw>>> namespace is reserved for internal system attributes that sometimes need to be exposed. Like <<<system>>> namespace attributes, they are not visible to the user except when <<<getXAttr>>>/<<<getXAttrs>>> is called on a file or directory in the <<</.reserved/raw>>> HDFS directory hierarchy. These attributes can only be accessed by the superuser. An example of where <<<raw>>> namespace extended attributes are used is the <<<distcp>>> utility. Encryption zone metadata is stored in <<<raw.*>>> extended attributes, so as long as the administrator uses <<</.reserved/raw>>> pathnames in source and target, the encrypted files in the encryption zones are transparently copied.
-
-* {Interacting with extended attributes}
-
-  The Hadoop shell has support for interacting with extended attributes via <<<hadoop fs -getfattr>>> and <<<hadoop fs -setfattr>>>. These commands are styled after the Linux {{{http://www.bestbits.at/acl/man/man1/getfattr.txt}getfattr(1)}} and {{{http://www.bestbits.at/acl/man/man1/setfattr.txt}setfattr(1)}} commands.
-
-** {getfattr}
-
-  <<<hadoop fs -getfattr [-R] {-n name | -d} [-e en] <path>>>>
-
-  Displays the extended attribute names and values (if any) for a file or directory.
-
-*--+--+
--R | Recursively list the attributes for all files and directories.
-*--+--+
--n name | Dump the named extended attribute value.
-*--+--+
--d | Dump all extended attribute values associated with pathname.
-*--+--+
--e \<encoding\> | Encode values after retrieving them. Valid encodings are "text", "hex", and "base64". Values encoded as text strings are enclosed in double quotes ("), and values encoded as hexadecimal and base64 are prefixed with 0x and 0s, respectively.
-*--+--+
-\<path\> | The file or directory.
-*--+--+
-
-** {setfattr}
-
-  <<<hadoop fs -setfattr {-n name [-v value] | -x name} <path>>>>
-
-  Sets an extended attribute name and value for a file or directory.
-
-*--+--+
--n name | The extended attribute name.
-*--+--+
--v value | The extended attribute value. There are three different encoding methods for the value. If the argument is enclosed in double quotes, then the value is the string inside the quotes. If the argument is prefixed with 0x or 0X, then it is taken as a hexadecimal number. If the argument begins with 0s or 0S, then it is taken as a base64 encoding.
-*--+--+
--x name | Remove the extended attribute.
-*--+--+
-\<path\> | The file or directory.
-*--+--+
-
-* {Configuration options}
-
-  HDFS supports extended attributes out of the box, without additional configuration. Administrators may be interested in the options that limit the number of xattrs per inode and the size of xattrs, since xattrs increase the on-disk and in-memory space consumption of an inode.
-
-  * <<<dfs.namenode.xattrs.enabled>>>
-
-  Whether support for extended attributes is enabled on the NameNode. By default, extended attributes are enabled.
-
-  * <<<dfs.namenode.fs-limits.max-xattrs-per-inode>>>
-
-  The maximum number of extended attributes per inode. By default, this limit is 32.
-
-  * <<<dfs.namenode.fs-limits.max-xattr-size>>>
-
-  The maximum combined size of the name and value of an extended attribute in bytes. By default, this limit is 16384 bytes.
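
The three value encodings that the setfattr section of ExtendedAttributes.apt.vm describes for -v (quoted text, 0x hexadecimal, 0s base64) can be illustrated with a small decoder. This is a minimal sketch, not the codec HDFS itself uses: the class XAttrValueSketch and its methods are hypothetical, and treating an unquoted, unprefixed argument as plain UTF-8 text is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Hadoop: decodes a setfattr -v argument.
class XAttrValueSketch {

  static byte[] decode(String arg) {
    // Text form: the value is the string inside the double quotes.
    if (arg.length() >= 2 && arg.startsWith("\"") && arg.endsWith("\"")) {
      return arg.substring(1, arg.length() - 1).getBytes(StandardCharsets.UTF_8);
    }
    // 0x or 0X prefix: the rest is hexadecimal, two digits per byte.
    if (arg.regionMatches(true, 0, "0x", 0, 2)) {
      return decodeHex(arg.substring(2));
    }
    // 0s or 0S prefix: the rest is base64.
    if (arg.regionMatches(true, 0, "0s", 0, 2)) {
      return Base64.getDecoder().decode(arg.substring(2));
    }
    // Assumption of this sketch: anything else is taken as plain text.
    return arg.getBytes(StandardCharsets.UTF_8);
  }

  private static byte[] decodeHex(String hex) {
    if (hex.length() % 2 != 0) {
      throw new IllegalArgumentException("odd number of hex digits: " + hex);
    }
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      int hi = Character.digit(hex.charAt(2 * i), 16);
      int lo = Character.digit(hex.charAt(2 * i + 1), 16);
      if (hi < 0 || lo < 0) {
        throw new IllegalArgumentException("not a hex string: " + hex);
      }
      out[i] = (byte) (hi * 16 + lo);
    }
    return out;
  }

  public static void main(String[] args) {
    // All three arguments decode to the same two bytes, 'h' and 'i'.
    for (String arg : new String[] {"\"hi\"", "0x6869", "0saGk="}) {
      System.out.println(arg + " -> " + new String(decode(arg), StandardCharsets.UTF_8));
    }
  }
}
```

The prefix checks are case-insensitive because the guide accepts both 0x/0X and 0s/0S.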

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2f1e5dc6/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/FaultInjectFramework.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/FaultInjectFramework.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/FaultInjectFramework.apt.vm
deleted file mode 100644
index 5cf3e57..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/FaultInjectFramework.apt.vm
+++ /dev/null
@@ -1,312 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Fault Injection Framework and Development Guide
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Fault Injection Framework and Development Guide
-
-%{toc|section=1|fromDepth=0}
-
-* Introduction
-
-   This guide provides an overview of the Hadoop Fault Injection (FI)
-   framework for those who will be developing their own faults (aspects).
-
-   The idea of fault injection is fairly simple: it is an infusion of
-   errors and exceptions into an application's logic to achieve higher
-   coverage and fault tolerance of the system. Different implementations
-   of this idea are available today. Hadoop's FI framework is built on top
-   of Aspect Oriented Programming (AOP) as implemented by the AspectJ toolkit.
-
-* Assumptions
-
-   The current implementation of the FI framework assumes that the faults
-   it emulates are non-deterministic in nature. That is, the moment at
-   which a fault occurs isn't known in advance and is decided by a coin
-   flip.
-
-* Architecture of the Fault Injection Framework
-
-   Components layout
-
-** Configuration Management
-
-   This piece of the FI framework allows you to set expectations for
-   faults to happen. The settings can be applied either statically (in
-   advance) or at runtime. The desired level of faults in the framework
-   can be configured in two ways:
-
-     * editing the src/aop/fi-site.xml configuration file. This file is
-       similar to Hadoop's other configuration files
-
-     * setting JVM system properties through VM startup parameters or
-       in the build.properties file
-
-** Probability Model
-
-   This is fundamentally a coin flipper. The methods of this class draw
-   a random number between 0.0 and 1.0 and then check whether that
-   number falls in the range between 0.0 and the level configured for the
-   fault in question. If that condition is true then the fault will occur.
-
-   Thus, to guarantee that a fault happens, set the appropriate level
-   to 1.0. To completely prevent a fault from happening, set its
-   probability level to 0.0.
-
-   Note: The default probability level is set to 0 (zero) unless the level
-   is changed explicitly through the configuration file or at runtime.
-   The name of the default level's configuration parameter is fi.*
-
-** Fault Injection Mechanism: AOP and AspectJ
-
-   The foundation of Hadoop's FI framework includes a cross-cutting
-   concept implemented by AspectJ. The following basic terms are important
-   to remember:
-
-     * A cross-cutting concept (aspect) is behavior, and often data, that
-       is used across the scope of a piece of software
-
-     * In AOP, the aspects provide a mechanism by which a cross-cutting
-       concern can be specified in a modular way
-
-     * Advice is the code that is executed when an aspect is invoked
-
-     * Join point (or pointcut) is a specific point within the application
-       that may or may not invoke some advice
-
-** Existing Join Points
-
-   The following readily available join points are provided by AspectJ:
-
-     * Join when a method is called
-
-     * Join during a method's execution
-
-     * Join when a constructor is invoked
-
-     * Join during a constructor's execution
-
-     * Join during aspect advice execution
-
-     * Join before an object is initialized
-
-     * Join during object initialization
-
-     * Join during static initializer execution
-
-     * Join when a class's field is referenced
-
-     * Join when a class's field is assigned
-
-     * Join when a handler is executed
-
-* Aspect Example
-
-----
-    package org.apache.hadoop.hdfs.server.datanode;
-
-    import org.apache.commons.logging.Log;
-    import org.apache.commons.logging.LogFactory;
-    import org.apache.hadoop.fi.ProbabilityModel;
-    import org.apache.hadoop.hdfs.server.datanode.DataNode;
-    import org.apache.hadoop.util.DiskChecker.*;
-
-    import java.io.IOException;
-    import java.io.OutputStream;
-    import java.io.DataOutputStream;
-
-    /**
-     * This aspect takes care of faults injected into the
-     * datanode.BlockReceiver class
-     */
-    public aspect BlockReceiverAspects {
-      public static final Log LOG = LogFactory.getLog(BlockReceiverAspects.class);
-
-      public static final String BLOCK_RECEIVER_FAULT="hdfs.datanode.BlockReceiver";
-      pointcut callReceivePacket() : call (* OutputStream.write(..))
-        && withincode (* BlockReceiver.receivePacket(..))
-        // to further limit the application of this aspect a very narrow 'target' can be used as follows
-        // && target(DataOutputStream)
-        && !within(BlockReceiverAspects +);
-
-      before () throws IOException : callReceivePacket () {
-        if (ProbabilityModel.injectCriteria(BLOCK_RECEIVER_FAULT)) {
-          LOG.info("Before the injection point");
-          Thread.dumpStack();
-          throw new DiskOutOfSpaceException ("FI: injected fault point at " +
-          thisJoinPoint.getStaticPart( ).getSourceLocation());
-        }
-      }
-    }
-----
-
-   The aspect has two main parts:
-
-     * The join point pointcut callReceivePacket(), which serves as an
-       identification mark of a specific point (in control and/or data
-       flow) in the life of an application.
-
-     * A call to the advice - before () throws IOException :
-       callReceivePacket() - will be injected (see Putting It All
-       Together) before that specific spot of the application's code.
-
-   The pointcut identifies an invocation of the write() method of class
-   java.io.OutputStream with any number of parameters and any return type.
-   This invocation must take place within the body of the receivePacket()
-   method of class BlockReceiver. The method can have any parameters and any
-   return type. Invocations of write() happening anywhere within the aspect
-   BlockReceiverAspects or its subtypes will be ignored.
-
-   Note 1: This short example doesn't illustrate the fact that you can
-   have more than a single injection point per class. In such a case the
-   names of the faults have to be different if a developer wants to
-   trigger them separately.
-
-   Note 2: After the injection step (see Putting It All Together) you can
-   verify that the faults were properly injected by searching for ajc
-   keywords in a disassembled class file.
-
-* Fault Naming Convention and Namespaces
-
-   For the sake of a unified naming convention the following two types of
-   names are recommended when developing new aspects:
-
-     * Activity specific notation (when we don't care about a particular
-       location of a fault's happening). In this case the name of the
-       fault is rather abstract: fi.hdfs.DiskError
-
-     * Location specific notation. Here, the fault's name is mnemonic as
-       in: fi.hdfs.datanode.BlockReceiver[optional location details]
-
-* Development Tools
-
-     * The Eclipse AspectJ Development Toolkit may help you when
-       developing aspects
-
-     * IntelliJ IDEA provides AspectJ weaver and Spring-AOP plugins
-
-* Putting It All Together
-
-   Faults (aspects) have to be injected (or woven) together before they can
-   be used. Follow these instructions:
-     * To weave aspects in place use:
-
-----
-    % ant injectfaults
-----
-
-     * If you misidentified the join point of your aspect you will see a
-       warning (similar to the one shown here) when 'injectfaults' target
-       is completed:
-
-----
-    [iajc] warning at
-    src/test/aop/org/apache/hadoop/hdfs/server/datanode/ \
-              BlockReceiverAspects.aj:44::0
-    advice defined in org.apache.hadoop.hdfs.server.datanode.BlockReceiverAspects
-    has not been applied [Xlint:adviceDidNotMatch]
-----
-
-     * This is a warning, not an error, so the build still reports success.
-       To prepare the dev.jar file with all your faults woven in place
-       (HDFS-475 pending) use:
-
-----
-    % ant jar-fault-inject
-----
-
-     * To create test jars use:
-
-----
-    % ant jar-test-fault-inject
-----
-
-     * To run HDFS tests with faults injected use:
-
-----
-    % ant run-test-hdfs-fault-inject
-----
-
-** How to Use the Fault Injection Framework
-
-   Faults can be triggered as follows:
-
-     * During runtime:
-
-----
-    % ant run-test-hdfs -Dfi.hdfs.datanode.BlockReceiver=0.12
-----
-
-       To set a certain level, for example 25%, for all injected faults
-       use:
-
-----
-    % ant run-test-hdfs-fault-inject -Dfi.*=0.25
-----
-
-     * From a program:
-
-----
-    package org.apache.hadoop.fs;
-
-    import org.junit.After;
-    import org.junit.Before;
-    import org.junit.Test;
-
-    public class DemoFiTest {
-      public static final String BLOCK_RECEIVER_FAULT="hdfs.datanode.BlockReceiver";
-
-      @Before
-      public void setUp() {
-        //Setting up the test's environment as required
-      }
-
-      @Test
-      public void testFI() {
-        // It triggers the fault, assuming that there's one called 'hdfs.datanode.BlockReceiver'
-        System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.12");
-        //
-        // The main logic of your tests goes here
-        //
-        // Now set the level back to 0 (zero) to prevent this fault from happening again
-        System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.0");
-        // or delete its trigger completely
-        System.getProperties().remove("fi." + BLOCK_RECEIVER_FAULT);
-      }
-
-      @After
-      public void tearDown() {
-        //Cleaning up the test environment
-      }
-    }
-----
-
-   As you can see above, these two methods do the same thing: they
-   set the probability level of <<<hdfs.datanode.BlockReceiver>>> to 12%.
-   The difference, however, is that the program provides more flexibility
-   and allows you to turn a fault off when a test no longer needs it.
-
-* Additional Information and Contacts
-
-   These two sources of information are particularly interesting and worth
-   reading:
-
-     * {{http://www.eclipse.org/aspectj/doc/next/devguide/}}
-
-     * AspectJ Cookbook (ISBN-13: 978-0-596-00654-9)
-
-   If you have additional comments or questions for the author check
-   {{{https://issues.apache.org/jira/browse/HDFS-435}HDFS-435}}.
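
The coin-flip rule from the Probability Model section of FaultInjectFramework.apt.vm can be sketched as follows. This is an illustration only, not the code of org.apache.hadoop.fi.ProbabilityModel: the class CoinFlipSketch and its methods are hypothetical, and both the fallback to the fi.* property and the clamping of levels to the range 0.0 to 1.0 are assumptions drawn from the guide's description of the default level.

```java
import java.util.Random;

// Hypothetical illustration of the coin flip; not Hadoop's ProbabilityModel.
class CoinFlipSketch {
  private static final Random RANDOM = new Random();

  /** Reads the level for a fault from fi.<faultName>, then fi.*, then 0.0. */
  static double level(String faultName) {
    String fallback = System.getProperty("fi.*", "0.0");
    String raw = System.getProperty("fi." + faultName, fallback);
    double p = Double.parseDouble(raw);
    // Assumption: out-of-range levels are clamped to [0.0, 1.0].
    return Math.max(0.0, Math.min(1.0, p));
  }

  /** Draws a number in [0.0, 1.0) and reports whether it is below the level. */
  static boolean shouldInject(String faultName) {
    return RANDOM.nextDouble() < level(faultName);
  }

  public static void main(String[] args) {
    String fault = "hdfs.datanode.BlockReceiver";
    System.setProperty("fi." + fault, "0.12");
    int hits = 0;
    for (int i = 0; i < 10_000; i++) {
      if (shouldInject(fault)) {
        hits++;
      }
    }
    // Roughly 12% of the draws are expected to trigger the fault.
    System.out.println("injected " + hits + " of 10000");
    System.clearProperty("fi." + fault);
  }
}
```

Because the draw is strictly less than 1.0, a level of 1.0 always fires and a level of 0.0 never does, which matches the guarantee the guide states for those two settings.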

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2f1e5dc6/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
deleted file mode 100644
index 17aaf3c..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
+++ /dev/null
@@ -1,339 +0,0 @@
-
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop Distributed File System-${project.version} - Federation
-  ---
-  ---
-  ${maven.build.timestamp}
-
-HDFS Federation
-
-%{toc|section=1|fromDepth=0}
-
-  This guide provides an overview of the HDFS Federation feature and
-  how to configure and manage the federated cluster.
-
-* {Background}
-
-[./images/federation-background.gif] HDFS Layers
-
-  HDFS has two main layers:
-
-  * <<Namespace>>
-
-    * Consists of directories, files and blocks.
-
-    * It supports all the namespace related file system operations such as
-      create, delete, modify and list files and directories.
-
-  * <<Block Storage Service>>, which has two parts:
-
-    * Block Management (performed in the Namenode)
-
-      * Provides Datanode cluster membership by handling registrations and
-        periodic heartbeats.
-
-      * Processes block reports and maintains the location of blocks.
-
-      * Supports block related operations such as create, delete, modify and
-        get block location.
-
-      * Manages replica placement, block replication for under-replicated
-        blocks, and deletes blocks that are over-replicated.
-
-    * Storage - is provided by Datanodes by storing blocks on the local file
-      system and allowing read/write access.
-
-  The prior HDFS architecture allows only a single namespace for the
-  entire cluster. In that configuration, a single Namenode manages the
-  namespace. HDFS Federation addresses this limitation by adding
-  support for multiple Namenodes/namespaces to HDFS.
-
-* {Multiple Namenodes/Namespaces}
-
-  In order to scale the name service horizontally, federation uses multiple
-  independent Namenodes/namespaces. The Namenodes are federated; that is,
-  they are independent and do not require coordination with each other.
-  The Datanodes are used as common storage for blocks by all the Namenodes.
-  Each Datanode registers with all the Namenodes in the cluster. Datanodes
-  send periodic heartbeats and block reports. They also handle
-  commands from the Namenodes.
-
-  Users may use {{{./ViewFs.html}ViewFs}} to create personalized namespace views.
-  ViewFs is analogous to client side mount tables in some Unix/Linux systems.
-
-[./images/federation.gif] HDFS Federation Architecture
-
-
-  <<Block Pool>>
-
-  A Block Pool is a set of blocks that belong to a single namespace.
-  Datanodes store blocks for all the block pools in the cluster.  Each
-  Block Pool is managed independently. This allows a namespace to
-  generate Block IDs for new blocks without the need for coordination
-  with the other namespaces. A Namenode failure does not prevent the
-  Datanode from serving other Namenodes in the cluster.
-
-  A Namespace and its block pool together are called a Namespace Volume.
-  It is a self-contained unit of management. When a Namenode/namespace
-  is deleted, the corresponding block pool at the Datanodes is deleted.
-  Each namespace volume is upgraded as a unit during cluster upgrade.
-
-  <<ClusterID>>
-
-  A <<ClusterID>> identifier is used to identify all the nodes in the
-  cluster.  When a Namenode is formatted, this identifier is either
-  provided or auto generated. This ID should be used for formatting
-  the other Namenodes into the cluster.
-
-** Key Benefits
-
-  * Namespace Scalability - Federation adds namespace horizontal
-    scaling. Large deployments or deployments using a lot of small files
-    benefit from namespace scaling by allowing more Namenodes to be
-    added to the cluster.
-
-  * Performance - File system throughput is not limited by a single
-    Namenode. Adding more Namenodes to the cluster scales the file
-    system read/write throughput.
-
-  * Isolation - A single Namenode offers no isolation in a multi user
-    environment. For example, an experimental application can overload
-    the Namenode and slow down production critical applications. By using
-    multiple Namenodes, different categories of applications and users
-    can be isolated to different namespaces.
-
-* {Federation Configuration}
-
-  Federation configuration is <<backward compatible>> and allows
-  existing single Namenode configurations to work without any
-  change. The new configuration is designed such that all the nodes in
-  the cluster have the same configuration without the need for
-  deploying different configurations based on the type of the node in
-  the cluster.
-
-  Federation adds a new <<<NameServiceID>>> abstraction. A Namenode
-  and its corresponding secondary/backup/checkpointer nodes all belong
-  to a NameServiceID. In order to support a single configuration file,
-  the Namenode and secondary/backup/checkpointer configuration
-  parameters are suffixed with the <<<NameServiceID>>>.
-
-
-** Configuration:
-
-  <<Step 1>>: Add the <<<dfs.nameservices>>> parameter to your
-  configuration and configure it with a list of comma separated
-  NameServiceIDs. This will be used by the Datanodes to determine the
-  Namenodes in the cluster.
-
-  <<Step 2>>: For each Namenode and Secondary Namenode/BackupNode/Checkpointer
-  add the following configuration parameters suffixed with the corresponding
-  <<<NameServiceID>>> into the common configuration file:
-
-*---------------------+--------------------------------------------+
-|| Daemon             || Configuration Parameter                   |
-*---------------------+--------------------------------------------+
-| Namenode            | <<<dfs.namenode.rpc-address>>>             |
-|                     | <<<dfs.namenode.servicerpc-address>>>      |
-|                     | <<<dfs.namenode.http-address>>>            |
-|                     | <<<dfs.namenode.https-address>>>           |
-|                     | <<<dfs.namenode.keytab.file>>>             |
-|                     | <<<dfs.namenode.name.dir>>>                |
-|                     | <<<dfs.namenode.edits.dir>>>               |
-|                     | <<<dfs.namenode.checkpoint.dir>>>          |
-|                     | <<<dfs.namenode.checkpoint.edits.dir>>>    |
-*---------------------+--------------------------------------------+
-| Secondary Namenode  | <<<dfs.namenode.secondary.http-address>>>  |
-|                     | <<<dfs.secondary.namenode.keytab.file>>>   |
-*---------------------+--------------------------------------------+
-| BackupNode          | <<<dfs.namenode.backup.address>>>          |
-|                     | <<<dfs.secondary.namenode.keytab.file>>>   |
-*---------------------+--------------------------------------------+
-
-  Here is an example configuration with two Namenodes:
-
-----
-<configuration>
-  <property>
-    <name>dfs.nameservices</name>
-    <value>ns1,ns2</value>
-  </property>
-  <property>
-    <name>dfs.namenode.rpc-address.ns1</name>
-    <value>nn-host1:rpc-port</value>
-  </property>
-  <property>
-    <name>dfs.namenode.http-address.ns1</name>
-    <value>nn-host1:http-port</value>
-  </property>
-  <property>
-    <name>dfs.namenode.secondary.http-address.ns1</name>
-    <value>snn-host1:http-port</value>
-  </property>
-  <property>
-    <name>dfs.namenode.rpc-address.ns2</name>
-    <value>nn-host2:rpc-port</value>
-  </property>
-  <property>
-    <name>dfs.namenode.http-address.ns2</name>
-    <value>nn-host2:http-port</value>
-  </property>
-  <property>
-    <name>dfs.namenode.secondary.http-address.ns2</name>
-    <value>snn-host2:http-port</value>
-  </property>
-
-  .... Other common configuration ...
-</configuration>
-----
-
-** Formatting Namenodes
-
-  <<Step 1>>: Format a Namenode using the following command:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format [-clusterId <cluster_id>]
-----
-  Choose a unique cluster_id which will not conflict with other clusters in
-  your environment. If a cluster_id is not provided, then a unique one is
-  auto generated.
-
-  <<Step 2>>: Format additional Namenodes using the following command:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId <cluster_id>
-----
-  Note that the cluster_id in step 2 must be the same as the
-  cluster_id in step 1. If they are different, the additional Namenodes
-  will not be part of the federated cluster.
-
-** Upgrading from an older release and configuring federation
-
-  Older releases only support a single Namenode.
-  Upgrade the cluster to a newer release in order to enable federation.
-  During upgrade you can provide a ClusterID as follows:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start namenode -upgrade -clusterId <cluster_id>
-----
-  If cluster_id is not provided, it is auto generated.
-
-** Adding a new Namenode to an existing HDFS cluster
-
-  Perform the following steps:
-
-  * Add <<<dfs.nameservices>>> to the configuration.
-
-  * Update the configuration with the NameServiceID suffix. Configuration
-    key names changed post release 0.20. You must use the new configuration
-    parameter names in order to use federation.
-
-  * Add the new Namenode related config to the configuration file.
-
-  * Propagate the configuration file to all the nodes in the cluster.
-
-  * Start the new Namenode and Secondary/Backup.
-
-  * Refresh the Datanodes to pick up the newly added Namenode by running
-    the following command against all the Datanodes in the cluster:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNode <datanode_host_name>:<datanode_rpc_port>
-----
-
-* {Managing the cluster}
-
-**  Starting and stopping cluster
-
-  To start the cluster run the following command:
-
-----
-[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
-----
-
-  To stop the cluster run the following command:
-
-----
-[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
-----
-
-  These commands can be run from any node where the HDFS configuration is
-  available.  The command uses the configuration to determine the Namenodes
-  in the cluster and then starts the Namenode process on those nodes. The
-  Datanodes are started on the nodes specified in the <<<slaves>>> file. The
-  script can be used as a reference for building your own scripts to
-  start and stop the cluster.
-
-**  Balancer
-
-  The Balancer has been changed to work with multiple
-  Namenodes. The Balancer can be run using the command:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start balancer [-policy <policy>]
-----
-
-  The policy parameter can be any of the following:
-
-  * <<<datanode>>> - this is the <default> policy. This balances the storage at
-    the Datanode level. This is similar to the balancing policy from prior releases.
-
-  * <<<blockpool>>> - this balances the storage at the block pool
-    level which also balances at the Datanode level.
-
-  Note that Balancer only balances the data and does not balance the namespace.
-  For the complete command usage, see {{{../hadoop-common/CommandsManual.html#balancer}balancer}}.
-
-** Decommissioning
-
-  Decommissioning is similar to prior releases. The nodes that need to be
-  decommissioned are added to the exclude file at all of the Namenodes. Each
-  Namenode decommissions its Block Pool. When all the Namenodes finish
-  decommissioning a Datanode, the Datanode is considered decommissioned.
-
-  <<Step 1>>: To distribute an exclude file to all the Namenodes, use the
-  following command:
-
-----
-[hdfs]$ $HADOOP_PREFIX/sbin/distribute-exclude.sh <exclude_file>
-----
-
-  <<Step 2>>: Refresh all the Namenodes to pick up the new exclude file:
-
-----
-[hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
-----
-
-  The above command uses HDFS configuration to determine the
-  configured Namenodes in the cluster and refreshes them to pick up
-  the new exclude file.
-
-** Cluster Web Console
-
-  Similar to the Namenode status web page, when using federation a
-  Cluster Web Console is available to monitor the federated cluster at
-  <<<http://<any_nn_host:port>/dfsclusterhealth.jsp>>>.
-  Any Namenode in the cluster can be used to access this web page.
-
-  The Cluster Web Console provides the following information:
-
-  * A cluster summary that shows the number of files, number of blocks,
-    total configured storage capacity, and the available and used storage
-    for the entire cluster.
-
-  * A list of Namenodes and a summary that includes the number of files,
-    blocks, missing blocks, and live and dead data nodes for each
-    Namenode. It also provides a link to access each Namenode's web UI.
-
-  * The decommissioning status of Datanodes.

