hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory
Date Tue, 05 Jul 2016 17:32:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362829#comment-15362829
] 

Chris Nauroth commented on HDFS-10594:
--------------------------------------

During initial implementation, we made an intentional choice that a cache directive on a directory
applies to its direct children only, not all descendants recursively.  This behavior is documented
here:

http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html#Cache_directive

I'm not in favor of changing this behavior, because it would be an unexpected change for users
after an upgrade.  It's possible that it would cause the DataNode to {{mlock}} a lot more
files than pre-upgrade.  If the new files exceed {{dfs.datanode.max.locked.memory}}, caching
would become unpredictable and might cover files that are not useful to cache.  Even worse, it
could blow out the memory budget and leave insufficient memory for services and YARN
containers running on the host.
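
As a rough illustration of the budget concern, here is a minimal sketch with made-up numbers.
It assumes a single DataNode holds every block and ignores replication, so it is a
simplification, not how HDFS accounts for cached bytes.  The class name, method names, file
sizes, and the 4 GiB limit are all hypothetical.

```java
import java.util.List;

public class LockedMemoryBudgetSketch {

    /** Sums file sizes in bytes. */
    static long totalBytes(List<Long> fileSizes) {
        long total = 0L;
        for (long size : fileSizes) {
            total += size;
        }
        return total;
    }

    /** Returns the number of bytes by which the files exceed the limit, or 0 if they fit. */
    static long overBudgetBytes(List<Long> fileSizes, long maxLockedMemoryBytes) {
        return Math.max(0L, totalBytes(fileSizes) - maxLockedMemoryBytes);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        // Hypothetical value standing in for dfs.datanode.max.locked.memory.
        long limit = 4 * gib;

        // Hypothetical sizes: files directly under the directory vs. all descendants.
        List<Long> directChildren = List.of(1 * gib, 2 * gib);
        List<Long> allDescendants = List.of(1 * gib, 2 * gib, 3 * gib, 4 * gib);

        System.out.println("direct children over budget by "
            + overBudgetBytes(directChildren, limit) / gib + " GiB");
        System.out.println("all descendants over budget by "
            + overBudgetBytes(allDescendants, limit) / gib + " GiB");
    }
}
```

With these numbers the direct children fit within the limit, while the full set of descendants
exceeds it by 6 GiB.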

If there is a desire for this behavior, then a more graceful way to support it would be to
introduce a notion of a recursive cache directive.  This would preserve the existing default
behavior of applying only to direct children.  Users who want the recursive behavior could
opt in by passing a new flag while creating the cache directive.
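
A minimal sketch of what such an opt-in could look like, using a toy in-memory tree rather than
the real HDFS classes.  {{Node}}, {{filesToCache}}, and the {{recursive}} parameter are
hypothetical names for illustration only; no such flag exists in the current cache directive
API, and this is not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class RecursiveDirectiveSketch {

    /** Hypothetical stand-in for an HDFS inode; not the real INode class. */
    static final class Node {
        final String name;
        final boolean directory;
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean directory) {
            this.name = name;
            this.directory = directory;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Collects the names of files that a directive on {@code dir} would cover.
     * With recursive == false, only direct file children are returned, which
     * matches the existing default behavior.
     */
    static List<String> filesToCache(Node dir, boolean recursive) {
        List<String> result = new ArrayList<>();
        collect(dir, recursive, result);
        return result;
    }

    private static void collect(Node dir, boolean recursive, List<String> out) {
        for (Node child : dir.children) {
            if (!child.directory) {
                out.add(child.name);
            } else if (recursive) {
                // Descend only when the caller opted in.
                collect(child, true, out);
            }
        }
    }

    /** Builds a small sample tree: data/{a, sub/{c, deep/{d}}, b}. */
    static Node sampleTree() {
        Node deep = new Node("deep", true).add(new Node("d", false));
        Node sub = new Node("sub", true).add(new Node("c", false)).add(deep);
        return new Node("data", true)
            .add(new Node("a", false))
            .add(sub)
            .add(new Node("b", false));
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(filesToCache(root, false)); // [a, b]
        System.out.println(filesToCache(root, true));  // [a, c, d, b]
    }
}
```

Because the flag defaults to the non-recursive branch, existing directives would keep covering
the same files after an upgrade.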

> CacheReplicationMonitor should recursively rescan the path when the inode of the path
> is directory
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10594
>                 URL: https://issues.apache.org/jira/browse/HDFS-10594
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 2.7.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10594.001.patch
>
>
> In {{CacheReplicationMonitor#rescanCacheDirectives}}, the path should be rescanned recursively
> when its inode is a directory. In this code:
> {code}
> } else if (node.isDirectory()) {
>   INodeDirectory dir = node.asDirectory();
>   ReadOnlyList<INode> children = dir
>       .getChildrenList(Snapshot.CURRENT_STATE_ID);
>   for (INode child : children) {
>     // Only direct file children are rescanned; child directories are skipped.
>     if (child.isFile()) {
>       rescanFile(directive, child.asFile());
>     }
>   }
> }
> {code}
> With this logic, files are ignored when a child inode is itself a directory that contains
> other files. As a result, files nested below the direct children of this path will not be
> cached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


