hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5096) Automatically cache new data added to a cached path
Date Mon, 14 Oct 2013 21:23:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794501#comment-13794501 ]

Chris Nauroth commented on HDFS-5096:
-------------------------------------

A couple of quick comments on version 6 of the patch:

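{{CacheReplicationMonitor#rescanCachedBlockMap}}: Something seems off about the logic for manipulating pending-cached and pending-uncached. Is it just that the comments are wrong and I'm getting confused? The first branch runs when {{numCached}} is at least {{neededReplication}} (all the replicas we need, or too many), but its comment says "too few". The second branch runs when {{numCached}} is at most {{neededReplication}}, but its comment says "too many" and "pending cached", even though the loop drops pending-uncached.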

{code}
      if (neededReplication <= numCached) {
        // If we have all the replicas we need, or too few, drop all 
        // pending cached.
        for (DatanodeDescriptor datanode : pendingCached) {
          datanode.getPendingCached().removeElement(cblock);
        }
      }
      if (neededReplication >= numCached) {
        // If we have all the replicas we need, or too many, drop all
        // pending cached.
        for (DatanodeDescriptor datanode : pendingUncached) {
          datanode.getPendingUncached().removeElement(cblock);
        }
      }
{code}
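If the conditions are right and only the comments are off, the intended bookkeeping might look like the following. This is a hypothetical, simplified stand-in rather than the patch's code: datanodes are plain strings, clearing a list replaces the per-datanode {{removeElement(cblock)}} loops, and the names {{PendingCacheSketch}} and {{reconcile}} are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pending-cached / pending-uncached bookkeeping.
// Each list holds the datanodes that still have an outstanding cache or
// uncache request for one block.
final class PendingCacheSketch {

  static void reconcile(int neededReplication, int numCached,
                        List<String> pendingCached, List<String> pendingUncached) {
    if (neededReplication <= numCached) {
      // We have all the replicas we need, or too many, so outstanding
      // cache requests are unnecessary: drop all pending cached.
      pendingCached.clear();
    }
    if (neededReplication >= numCached) {
      // We have all the replicas we need, or too few, so outstanding
      // uncache requests are unnecessary: drop all pending uncached.
      pendingUncached.clear();
    }
  }

  public static void main(String[] args) {
    List<String> pendingCached = new ArrayList<>(List.of("dn1"));
    List<String> pendingUncached = new ArrayList<>(List.of("dn2"));
    // Too many cached replicas (3 cached, 2 needed): only the uncache
    // request should survive.
    reconcile(2, 3, pendingCached, pendingUncached);
    System.out.println(pendingCached + " " + pendingUncached); // prints "[] [dn2]"
  }
}
```

When exactly the needed number of replicas is cached, both branches fire and both pending lists are emptied, which is consistent with the "all the replicas we need" wording in each comment.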

{{CacheReplicationMonitor#rescanFile}}: Can you explain the logic around the mark in this method? I understand the mark logic in {{rescanCachedBlockMap}}, but I didn't follow it here.

{code}
        if (mark != ocblock.getMark()) {
          ocblock.setReplicationAndMark(pce.getReplication(), mark);
        } else {
          ocblock.setReplicationAndMark((short)Math.max(
              pce.getReplication(), ocblock.getReplication()), mark);
        }
{code}
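One possible reading, offered as a guess rather than the patch author's explanation: assuming {{mark}} is a boolean that flips on every rescan, a block whose stored mark differs from the current one has not been visited yet in this pass, so its stale replication is overwritten; a block already visited in this pass (for example, reached through a second cached path) keeps the maximum of the requested replications. The {{MarkSketch}}, {{CachedBlock}}, and {{applyDirective}} names are hypothetical stand-ins written only to illustrate that reading.

```java
// Hypothetical sketch of one reading of the mark logic in rescanFile.
// Assumption: each rescan uses a boolean mark that flips between scans, so a
// block whose stored mark differs from the current one is unvisited this scan.
final class MarkSketch {

  static final class CachedBlock {
    short replication;
    boolean mark;

    CachedBlock(short replication, boolean mark) {
      this.replication = replication;
      this.mark = mark;
    }

    void setReplicationAndMark(short replication, boolean mark) {
      this.replication = replication;
      this.mark = mark;
    }
  }

  // Apply one cache directive's replication to a block during the scan
  // identified by 'mark'.
  static void applyDirective(CachedBlock block, short directiveReplication, boolean mark) {
    if (mark != block.mark) {
      // First directive to touch this block in this scan: discard the
      // replication left over from the previous scan.
      block.setReplicationAndMark(directiveReplication, mark);
    } else {
      // Another directive already covered this block in this scan: keep the
      // larger of the two requested replications.
      block.setReplicationAndMark(
          (short) Math.max(directiveReplication, block.replication), mark);
    }
  }

  public static void main(String[] args) {
    boolean scanMark = true;
    // Left over from a previous scan (mark == false) with replication 5.
    CachedBlock block = new CachedBlock((short) 5, false);
    applyDirective(block, (short) 2, scanMark); // stale 5 discarded, now 2
    applyDirective(block, (short) 3, scanMark); // max(3, 2) = 3
    applyDirective(block, (short) 1, scanMark); // max(1, 3) = 3
    System.out.println(block.replication); // prints 3
  }
}
```

Under this reading, taking the maximum only within a single pass lets a block's replication drop again once the directive that requested the higher value goes away.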

{{NameNode}}: Is this HA change meant for this patch, or is it meant to be its own patch that
can go to trunk?

Several tests are commented out in this version of the patch, so they aren't running.


> Automatically cache new data added to a cached path
> ---------------------------------------------------
>
>                 Key: HDFS-5096
>                 URL: https://issues.apache.org/jira/browse/HDFS-5096
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Andrew Wang
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch
>
>
> For some applications, it's convenient to specify a path to cache and have HDFS automatically cache new data added to the path, without sending a new caching request or a manual refresh command.
> One example is new data appended to a cached file. It would be nice to re-cache a block at its new appended length, and to cache new blocks added to the file.
> Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached.
> In both cases, this automatic caching would happen after the file is closed, i.e. once the block replica is finalized.
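To make the requested behavior concrete, here is a hypothetical sketch of the core decision, not the patch's implementation: a file becomes a caching candidate once it is closed and its path equals, or sits under, a path that already has a cache directive. The class, method names, and example paths are made up for illustration; the NameNode tracks directives and block state in its own data structures.

```java
import java.util.List;

// Hypothetical sketch: decide whether a newly closed file should be cached
// because it is (or lives under) a path with a cache directive.
final class AutoCacheSketch {

  static boolean isUnderCachedPath(String filePath, List<String> cachedPaths) {
    for (String dir : cachedPaths) {
      // Exact match (a cached file) or a descendant of a cached directory.
      String prefix = dir.endsWith("/") ? dir : dir + "/";
      if (filePath.equals(dir) || filePath.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  // Only closed (finalized) files are candidates, matching the
  // "after the file is closed" condition in the description.
  static boolean shouldCache(String filePath, boolean closed, List<String> cachedPaths) {
    return closed && isUnderCachedPath(filePath, cachedPaths);
  }

  public static void main(String[] args) {
    List<String> cached = List.of("/warehouse/sales/dt=2013-10-14");
    System.out.println(shouldCache("/warehouse/sales/dt=2013-10-14/part-00000", true, cached));  // true
    System.out.println(shouldCache("/warehouse/sales/dt=2013-10-14/part-00001", false, cached)); // false
    System.out.println(shouldCache("/warehouse/sales/dt=2013-10-15/part-00000", true, cached));  // false
  }
}
```

Matching on a trailing "/" keeps a sibling such as {{dt=2013-10-140}} from being treated as a child of {{dt=2013-10-14}}.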



--
This message was sent by Atlassian JIRA
(v6.1#6144)
