hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5096) Automatically cache new data added to a cached path
Date Wed, 16 Oct 2013 16:59:43 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796979#comment-13796979 ]

Chris Nauroth commented on HDFS-5096:

Agreed with Andrew that we're getting close. Almost all of my prior feedback has been addressed.
I found a few more small things after reviewing the test code. Here is the full list of remaining
feedback (some of it redundant, but this way you don't have to look at multiple old comments).

hdfs-default.xml: Let's document {{dfs.namenode.path.based.cache.refresh.interval.ms}}.

{{IntrusiveCollection#addFirst}}: This method appears to be called only from test code.  Do
you want to keep it, or is it better to delete it?

{{TestPathBasedCacheRequests#waitForCachedBlocks}}: This is another spot where I think we
should use {{GenericTestUtils#waitFor}}.  Even though the JUnit-level timeout would eventually
abort the test, that tends to leave the process hanging around.  {{GenericTestUtils#waitFor}}
would throw and exit more cleanly.
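
To illustrate the pattern being suggested, here is a minimal sketch of a polling wait that throws on timeout. It is a hypothetical stand-in written for this comment, not the actual {{GenericTestUtils#waitFor}} implementation; the class name {{PollingWait}} and the parameter names are assumptions.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a polling wait helper; not Hadoop's GenericTestUtils. */
final class PollingWait {
  private PollingWait() {}

  /**
   * Polls the condition every checkEveryMillis until it returns true.
   * Throws TimeoutException once waitForMillis has elapsed without success,
   * so the calling test fails from its own thread instead of waiting for
   * an external timeout to kill it.
   */
  static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      // Overflow-safe comparison against the monotonic clock.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Timed out after " + waitForMillis + " ms waiting for condition");
      }
      Thread.sleep(checkEveryMillis);
    }
  }
}
```

A test would pass in a condition such as "the expected number of blocks is cached" and let the thrown exception fail the test directly, rather than looping until the JUnit timeout fires.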

> Automatically cache new data added to a cached path
> ---------------------------------------------------
>                 Key: HDFS-5096
>                 URL: https://issues.apache.org/jira/browse/HDFS-5096
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Andrew Wang
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch
>
> For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh.
> One example is new data appended to a cached file. It would be nice to re-cache a block at the new appended length, and cache new blocks added to the file.
> Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached.
> In both cases, this automatic caching would happen after the file is closed, i.e. once the block replica is finalized.

This message was sent by Atlassian JIRA