hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5203) Concurrent clients that add a cache directive on the same path may prematurely uncache from each other.
Date Fri, 13 Sep 2013 21:12:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766956#comment-13766956 ]

Chris Nauroth commented on HDFS-5203:

I propose that we return a unique ID for each client request to add a cache directive, even
if the path is a duplicate.  Then, the cache reports would instruct the datanodes to
{{munlock}} only after all clients have removed their cache directives for that path.  I would
not expect this to cause large growth in memory consumption by the {{CacheManager}} data
structures, because typical usage should be to add cache directives as a one-time setup step
rather than in a tight loop (i.e., in the job driver before submitting the job, rather than
from within individual map or reduce tasks).
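
As a rough illustration of the bookkeeping this proposal implies, here is a minimal Java sketch.
It is not actual {{CacheManager}} code: the class {{DirectiveTable}} and its methods are
hypothetical names chosen for this example, and it only tracks IDs in memory. Every add returns
a fresh ID, and a remove reports that the path may be uncached only when no other directive
still references it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of per-request directive IDs; not the actual CacheManager. */
class DirectiveTable {
    private long nextId = 1;
    // Directive ID -> path it refers to.
    private final Map<Long, String> pathById = new HashMap<>();
    // Path -> IDs of all directives currently outstanding for that path.
    private final Map<String, Set<Long>> idsByPath = new HashMap<>();

    /** Every add gets a fresh ID, even when the path already has a directive. */
    long addDirective(String path) {
        long id = nextId++;
        pathById.put(id, path);
        idsByPath.computeIfAbsent(path, p -> new HashSet<>()).add(id);
        return id;
    }

    /**
     * Removes one directive. Returns true only if it was the last directive
     * for its path, i.e. only then would an uncache (munlock) be appropriate.
     */
    boolean removeDirective(long id) {
        String path = pathById.remove(id);
        if (path == null) {
            throw new IllegalArgumentException("Unknown directive ID: " + id);
        }
        Set<Long> ids = idsByPath.get(path);
        ids.remove(id);
        if (ids.isEmpty()) {
            idsByPath.remove(path);
            return true;
        }
        return false;
    }

    /** True while at least one directive still references the path. */
    boolean isCached(String path) {
        return idsByPath.containsKey(path);
    }
}
```

With two clients adding the same path, the first {{removeDirective}} call returns false and the
path stays cached; only the second call returns true. The extra memory is one map entry per
outstanding directive.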

An additional complication would come into play if we added TTL, because different cache directives
for the same path could have different TTLs.  We probably ought to keep the path cached for
the duration of the longest TTL.
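
To make the TTL rule concrete, here is a minimal Java sketch under stated assumptions. TTL
support is only a possibility raised above, so everything here is hypothetical: the class
{{TtlDirectiveTable}}, its method names, and the choice to pass the current time in explicitly
(which keeps the example deterministic). A path stays cached as long as any directive for it
has not yet expired, which is equivalent to honoring the longest TTL.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-directive TTLs; not the actual CacheManager. */
class TtlDirectiveTable {
    /** One client's request: the path plus the absolute time at which it expires. */
    private static final class Directive {
        final String path;
        final long expiresAtMillis;

        Directive(String path, long expiresAtMillis) {
            this.path = path;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private long nextId = 1;
    private final Map<Long, Directive> directives = new HashMap<>();

    /** Adds a directive that expires ttlMillis after nowMillis; returns its unique ID. */
    long addDirective(String path, long nowMillis, long ttlMillis) {
        long id = nextId++;
        directives.put(id, new Directive(path, nowMillis + ttlMillis));
        return id;
    }

    /** Removes a directive explicitly, before its TTL runs out. */
    void removeDirective(long id) {
        directives.remove(id);
    }

    /**
     * The path stays cached while at least one directive for it is unexpired,
     * so the directive with the latest expiry decides when it is uncached.
     * A linear scan keeps the sketch short; a real index would be per path.
     */
    boolean shouldStayCached(String path, long nowMillis) {
        for (Directive d : directives.values()) {
            if (d.path.equals(path) && d.expiresAtMillis > nowMillis) {
                return true;
            }
        }
        return false;
    }
}
```

If one client adds a path with a 1-second TTL and another adds it with a 5-second TTL, the path
remains cached until the 5-second directive expires or is removed.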

> Concurrent clients that add a cache directive on the same path may prematurely uncache
> from each other.
> -------------------------------------------------------------------------------------------------------
>                 Key: HDFS-5203
>                 URL: https://issues.apache.org/jira/browse/HDFS-5203
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: HDFS-4949
>            Reporter: Chris Nauroth
>
> When a client adds a cache directive, we assign it a unique ID and return that ID to
> the client.  If multiple clients add a cache directive for the same path, then we return the
> same ID.  If one client then removes the cache entry for that ID, then it is removed for all
> clients.  Then, when this change becomes visible in subsequent cache reports, the datanodes
> may {{munlock}} the block before the other clients are done with it.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
