From: wang@apache.org
To: hdfs-commits@hadoop.apache.org
Reply-To: hdfs-dev@hadoop.apache.org
Subject: svn commit: r1562650 - in /hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs: CHANGES.txt src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java src/site/apt/CentralizedCacheManagement.apt.vm
Date: Thu, 30 Jan 2014 00:18:06 -0000
Message-Id: <20140130001806.5915823888FE@eris.apache.org>

Author: wang
Date: Thu Jan 30 00:18:05 2014
New Revision: 1562650

URL:
http://svn.apache.org/r1562650
Log:
HDFS-5841. Update HDFS caching documentation with new changes. (wang)

Modified:
    hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
    hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java
    hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm

Modified: hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt?rev=1562650&r1=1562649&r2=1562650&view=diff
==============================================================================
--- hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt (original)
+++ hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Thu Jan 30 00:18:05 2014
@@ -56,7 +56,6 @@ Release 2.3.0 - UNRELEASED
     HDFS-4949. Centralized cache management in HDFS (wang and cmccabe)
-
   IMPROVEMENTS

     HDFS-5360. Improvement of usage message of renameSnapshot and
@@ -271,6 +270,8 @@ Release 2.3.0 - UNRELEASED
     HDFS-5788. listLocatedStatus response can be very large. (Nathan
     Roberts via kihwal)

+    HDFS-5841. Update HDFS caching documentation with new changes. (wang)
+
   OPTIMIZATIONS

     HDFS-5239.
    Allow FSNamesystem lock fairness to be configurable (daryn)

Modified: hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java?rev=1562650&r1=1562649&r2=1562650&view=diff
==============================================================================
--- hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java (original)
+++ hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java Thu Jan 30 00:18:05 2014
@@ -620,7 +620,7 @@ public class CacheAdmin extends Configur
           "directives being added to the pool. This can be specified in " +
           "seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. " +
           "Valid units are [smhd]. By default, no maximum is set. " +
-          "This can also be manually specified by \"never\".");
+          "A value of \"never\" specifies that there is no limit.");
       return getShortUsage() + "\n" +
           "Add a new cache pool.\n\n" +
           listing.toString();

Modified: hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm?rev=1562650&r1=1562649&r2=1562650&view=diff
==============================================================================
--- hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm (original)
+++ hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm Thu Jan 30 00:18:05 2014
@@ -22,110 +22,140 @@ Centralized Cache Management in HDFS

 %{toc|section=1|fromDepth=2|toDepth=4}

-* {Background}
+* {Overview}

-  Normally, HDFS relies on the operating system to cache data it reads from disk.
-  However, HDFS can also be configured to use centralized cache management. Under
-  centralized cache management, the HDFS NameNode itself decides which blocks
-  should be cached, and where they should be cached.
-
-  Centralized cache management has several advantages. First of all, it
-  prevents frequently used block files from being evicted from memory. This is
-  particularly important when the size of the working set exceeds the size of
-  main memory, which is true for many big data applications. Secondly, when
-  HDFS decides what should be cached, it can let clients know about this
-  information through the getFileBlockLocations API. Finally, when the DataNode
-  knows a block is locked into memory, it can provide access to that block via
-  mmap.
+  <Centralized cache management> in HDFS is an explicit caching mechanism that
+  allows users to specify <paths> to be cached by HDFS. The NameNode will
+  communicate with DataNodes that have the desired blocks on disk, and instruct
+  them to cache the blocks in off-heap caches.
+
+  Centralized cache management in HDFS has many significant advantages.
+
+  [[1]] Explicit pinning prevents frequently used data from being evicted from
+  memory. This is particularly important when the size of the working set
+  exceeds the size of main memory, which is common for many HDFS workloads.
+
+  [[1]] Because DataNode caches are managed by the NameNode, applications can
+  query the set of cached block locations when making task placement decisions.
+  Co-locating a task with a cached block replica improves read performance.
+
+  [[1]] When a block has been cached by a DataNode, clients can use a new,
+  more-efficient, zero-copy read API. Since checksum verification of cached
+  data is done once by the DataNode, clients can incur essentially zero
+  overhead when using this new API.
+
+  [[1]] Centralized caching can improve overall cluster memory utilization.
+  When relying on the OS buffer cache at each DataNode, repeated reads of
+  a block will result in all <n> replicas of the block being pulled into
+  buffer cache. With centralized cache management, a user can explicitly pin
+  only <m> of the <n> replicas, saving <n-m> memory.

 * {Use Cases}

-  Centralized cache management is most useful for files which are accessed very
-  often. For example, a "fact table" in Hive which is often used in joins is a
-  good candidate for caching. On the other hand, when running a classic
-  "word count" MapReduce job which counts the number of words in each
-  document, there may not be any good candidates for caching, since all the
-  files may be accessed exactly once.
+  Centralized cache management is useful for files that are accessed repeatedly.
+  For example, a small <fact table> in Hive which is often used for joins is a
+  good candidate for caching. On the other hand, caching the input of a
+  <one year reporting query> is probably less useful, since the
+  historical data might only be read once.
+
+  Centralized cache management is also useful for mixed workloads with
+  performance SLAs. Caching the working set of a high-priority workload
+  ensures that it does not contend for disk I/O with a low-priority workload.

 * {Architecture}

[images/caching.png] Caching Architecture

-  With centralized cache management, the NameNode coordinates all caching
-  across the cluster. It receives cache information from each DataNode via the
-  cache report, a periodic message that describes all the block IDs cached on
-  a given DataNode. The NameNode will reply to DataNode heartbeat messages
-  with commands telling it which blocks to cache and which to uncache.
-
-  The NameNode stores a set of path cache directives, which tell it which files
-  to cache. The NameNode also stores a set of cache pools, which are groups of
-  cache directives. These directives and pools are persisted to the edit log
-  and fsimage, and will be loaded if the cluster is restarted.
-
-  Periodically, the NameNode rescans the namespace, to see which blocks need to
-  be cached based on the current set of path cache directives. Rescans are also
-  triggered by relevant user actions, such as adding or removing a cache
-  directive or removing a cache pool.
-
-  Cache directives also may specify a numeric cache replication, which is the
-  number of replicas to cache. This number may be equal to or smaller than the
-  file's block replication. If multiple cache directives cover the same file
-  with different cache replication settings, then the highest cache replication
-  setting is applied.
+  In this architecture, the NameNode is responsible for coordinating all the
+  DataNode off-heap caches in the cluster. The NameNode periodically receives
+  a <cache report> from each DataNode which describes all the blocks cached
+  on a given DN. The NameNode manages DataNode caches by piggybacking cache and
+  uncache commands on the DataNode heartbeat.
+
+  The NameNode queries its set of <cache directives> to determine
+  which paths should be cached. Cache directives are persistently stored in the
+  fsimage and edit log, and can be added, removed, and modified via Java and
+  command-line APIs. The NameNode also stores a set of <cache pools>,
+  which are administrative entities used to group cache directives together for
+  resource management and enforcing permissions.
+
+  The NameNode periodically rescans the namespace and active cache directives
+  to determine which blocks need to be cached or uncached and assign caching
+  work to DataNodes. Rescans can also be triggered by user actions like adding
+  or removing a cache directive or removing a cache pool.

   We do not currently cache blocks which are under construction, corrupt, or
   otherwise incomplete. If a cache directive covers a symlink, the symlink
   target is not cached.

-  Caching is currently done on a per-file basis, although we would like to add
-  block-level granularity in the future.
+  Caching is currently done on the file or directory level. Block and sub-block
+  caching is an item of future work.

-* {Interface}
+* {Concepts}

-  The NameNode stores a list of "cache directives." These directives contain a
-  path as well as the number of times blocks in that path should be replicated.
+** {Cache directive}

-  Paths can be either directories or files. If the path specifies a file, that
-  file is cached. If the path specifies a directory, all the files in the
-  directory will be cached. However, this process is not recursive -- only the
-  direct children of the directory will be cached.
-
-** {hdfs cacheadmin Shell}
-
-  Path cache directives can be created by the <<<hdfs cacheadmin -addDirective>>>
-  command and removed via the <<<hdfs cacheadmin -removeDirective>>> command. To
-  list the current path cache directives, use
-  <<<hdfs cacheadmin -listDirectives>>>. Each path cache directive has a
-  unique 64-bit ID number which will not be reused if it is deleted. To remove
-  all path cache directives with a specified path, use
-  <<<hdfs cacheadmin -removeDirectives>>>.
-
-  Directives are grouped into "cache pools." Each cache pool gets a share of
-  the cluster's resources. Additionally, cache pools are used for
-  authentication. Cache pools have a mode, user, and group, similar to regular
-  files. The same authentication rules are applied as for normal files. So, for
-  example, if the mode is 0777, any user can add or remove directives from the
-  cache pool. If the mode is 0644, only the owner can write to the cache pool,
-  but anyone can read from it. And so forth.
-
-  Cache pools are identified by name. They can be created by the
-  <<<hdfs cacheadmin -addPool>>> command, modified by the
-  <<<hdfs cacheadmin -modifyPool>>> command, and removed via the
-  <<<hdfs cacheadmin -removePool>>> command. To list the current cache pools,
-  use <<<hdfs cacheadmin -listPools>>>
+  A <cache directive> defines a path that should be cached. Paths can be either
+  directories or files. Directories are cached non-recursively, meaning only
+  files in the first-level listing of the directory.
+
+  Directives also specify additional parameters, such as the cache replication
+  factor and expiration time. The replication factor specifies the number of
+  block replicas to cache. If multiple cache directives refer to the same file,
+  the maximum cache replication factor is applied.
+
+  The expiration time is specified on the command line as a <time-to-live
+  (TTL)>, a relative expiration time in the future. After a cache directive
+  expires, it is no longer considered by the NameNode when making caching
+  decisions.
+
+** {Cache pool}
+
+  A <cache pool> is an administrative entity used to manage groups of cache
+  directives. Cache pools have UNIX-like <permissions>, which restrict which
+  users and groups have access to the pool. Write permissions allow users to
+  add and remove cache directives to the pool. Read permissions allow users to
+  list the cache directives in a pool, as well as additional metadata. Execute
+  permissions are unused.
+
+  Cache pools are also used for resource management. Pools can enforce a
+  maximum <limit>, which restricts the number of bytes that can be cached in
+  aggregate by directives in the pool. Normally, the sum of the pool limits
+  will approximately equal the amount of aggregate memory reserved for
+  HDFS caching on the cluster. Cache pools also track a number of statistics
+  to help cluster users determine what is and should be cached.
+
+  Pools can also enforce a maximum time-to-live. This restricts the maximum
+  expiration time of directives being added to the pool.
+
+* {<<<cacheadmin>>> command-line interface}
+
+  On the command-line, administrators and users can interact with cache pools
+  and directives via the <<<hdfs cacheadmin>>> subcommand.
+
+  Cache directives are identified by a unique, non-repeating 64-bit integer ID.
+  IDs will not be reused even if a cache directive is later removed.
+
+  Cache pools are identified by a unique string name.
+
+** {Cache directive commands}

 *** {addDirective}

-  Usage: <<<hdfs cacheadmin -addDirective -path <path> -replication <replication> -pool <pool-name> >>>
+  Usage: <<<hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]>>>

   Add a new cache directive.

*--+--+
\<path> | A path to cache. The path can be a directory or a file.
*--+--+
+\<pool-name> | The pool to which the directive will be added. You must have write permission on the cache pool in order to add new directives.
+*--+--+
+-force | Skips checking of cache pool resource limits.
+*--+--+
\<replication> | The cache replication factor to use. Defaults to 1.
*--+--+
-\<pool-name> | The pool to which the directive will be added. You must have write permission on the cache pool in order to add new directives.
+\<time-to-live> | How long the directive is valid. Can be specified in minutes, hours, and days, e.g. 30m, 4h, 2d. Valid units are [smhd]. "never" indicates a directive that never expires. If unspecified, the directive never expires.
*--+--+

 *** {removeDirective}

@@ -150,7 +180,7 @@ Centralized Cache Management in HDFS

 *** {listDirectives}

-  Usage: <<<hdfs cacheadmin -listDirectives [-path <path>] [-pool <pool>] >>>
+  Usage: <<<hdfs cacheadmin -listDirectives [-stats] [-path <path>] [-pool <pool>]>>>

   List cache directives.
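As an aside for readers of this patch: the TTL strings accepted by <<<-ttl>>> and <<<-maxTtl>>> ("120s", "30m", "4h", "2d", or "never") could be modeled roughly as in the sketch below. This is an illustrative re-implementation of the semantics described in the help text above, not the actual CacheAdmin parsing code; the class and method names are hypothetical.

```java
// Illustrative sketch (not CacheAdmin's actual code): parse TTL strings such
// as "120s", "30m", "4h", "2d", or "never" into milliseconds. Valid units
// are [smhd], matching the cacheadmin help text.
public class TtlParser {
    /** Returns the TTL in milliseconds, or -1 to represent "never" (no expiry). */
    public static long parseTtl(String ttl) {
        if ("never".equals(ttl)) {
            return -1L;
        }
        if (ttl.length() < 2) {
            throw new IllegalArgumentException("Invalid TTL: " + ttl);
        }
        char unit = ttl.charAt(ttl.length() - 1);
        long value = Long.parseLong(ttl.substring(0, ttl.length() - 1));
        switch (unit) {
            case 's': return value * 1000L;
            case 'm': return value * 60L * 1000L;
            case 'h': return value * 60L * 60L * 1000L;
            case 'd': return value * 24L * 60L * 60L * 1000L;
            default:
                throw new IllegalArgumentException(
                    "Valid units are [smhd], got: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseTtl("120s")); // 120000
        System.out.println(parseTtl("30m"));  // 1800000
        System.out.println(parseTtl("2d"));   // 172800000
    }
}
```

Note that a sentinel like -1 for "never" mirrors the documented behavior that an unspecified TTL means the directive never expires.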
@@ -159,10 +189,14 @@ Centralized Cache Management in HDFS
*--+--+
\<pool> | List only path cache directives in that pool.
*--+--+
+-stats | List path-based cache directive statistics.
+*--+--+
+
+** {Cache pool commands}

 *** {addPool}

-  Usage: <<<hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-weight <weight>] >>>
+  Usage: <<<hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]>>>

   Add a new cache pool.

@@ -175,12 +209,14 @@ Centralized Cache Management in HDFS
*--+--+
\<mode> | UNIX-style permissions for the pool. Permissions are specified in octal, e.g. 0755. By default, this is set to 0755.
*--+--+
-\<weight> | Weight of the pool. This is a relative measure of the importance of the pool used during cache resource management. By default, it is set to 100.
+\<limit> | The maximum number of bytes that can be cached by directives in this pool, in aggregate. By default, no limit is set.
+*--+--+
+\<maxTtl> | The maximum allowed time-to-live for directives being added to the pool. This can be specified in seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. Valid units are [smhd]. By default, no maximum is set. A value of \"never\" specifies that there is no limit.
*--+--+

 *** {modifyPool}

-  Usage: <<<hdfs cacheadmin -modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-weight <weight>] >>>
+  Usage: <<<hdfs cacheadmin -modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]>>>

   Modifies the metadata of an existing cache pool.

@@ -193,7 +229,9 @@ Centralized Cache Management in HDFS
*--+--+
\<mode> | Unix-style permissions of the pool in octal.
*--+--+
-\<weight> | Weight of the pool.
+\<limit> | Maximum number of bytes that can be cached by this pool.
+*--+--+
+\<maxTtl> | The maximum allowed time-to-live for directives being added to the pool.
*--+--+

 *** {removePool}

@@ -208,12 +246,14 @@ Centralized Cache Management in HDFS

 *** {listPools}

-  Usage: <<<hdfs cacheadmin -listPools>>>
+  Usage: <<<hdfs cacheadmin -listPools [-stats] [<name>]>>>

   Display information about one or more cache pools, e.g. name, owner, group,
   permissions, etc.

*--+--+
+-stats | Display additional cache pool statistics.
+*--+--+
\<name> | If specified, list only the named cache pool.
*--+--+

@@ -244,10 +284,12 @@ Centralized Cache Management in HDFS

  * dfs.datanode.max.locked.memory

-    The DataNode will treat this as the maximum amount of memory it can use for
-    its cache. When setting this value, please remember that you will need space
-    in memory for other things, such as the Java virtual machine (JVM) itself
-    and the operating system's page cache.
+    This determines the maximum amount of memory a DataNode will use for caching.
+    The "locked-in-memory size" ulimit (<<<ulimit -l>>>) of the DataNode user
+    also needs to be increased to match this parameter (see below section on
+    {{OS Limits}}). When setting this value, please remember that you will need
+    space in memory for other things as well, such as the DataNode and
+    application JVM heaps and the operating system page cache.

 *** Optional
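For context, the property described above lives in hdfs-site.xml and its value is given in bytes. A sketch of a possible setting (2 GB here is an arbitrary example value; size it to your cluster and to the DataNode user's <<<ulimit -l>>>):

```xml
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <!-- Example: 2 GB, in bytes. Must not exceed the DataNode user's
       "locked-in-memory size" ulimit (ulimit -l). -->
  <value>2147483648</value>
</property>
```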