hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6581) Write to single replica in memory
Date Thu, 21 Aug 2014 22:00:12 GMT

[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106025#comment-14106025 ]

Colin Patrick McCabe commented on HDFS-6581:
--------------------------------------------

Looks good overall, and it's encouraging to see progress on this.

Some comments about the design doc:

* Why not use ramfs instead of tmpfs?  ramfs can't swap.

** The problem with tmpfs is that the system can move the data to swap at any time.  Besides
the performance hit, this could cause correctness problems later when we read the data back
from swap (i.e. from the hard disk).  Since we don't want to verify checksums here, we should
use a storage method that we know never touches the disk.  Tachyon uses ramfs instead of
tmpfs for this reason.
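For illustration, the difference shows up at mount time.  These example fstab entries are
assumptions, not from the design doc; note in particular that ramfs ignores the {{size=}}
option, so the DataNode itself would have to enforce its memory cap:

```
# tmpfs: capped at 2g by the kernel, but pages may be moved to swap
# under memory pressure
tmpfs   /mnt/dn-tmpfs   tmpfs   size=2g    0 0

# ramfs: pages stay pinned in RAM and never swap; the size= option is
# ignored, so usage must be policed by the application
ramfs   /mnt/dn-ramfs   ramfs   defaults   0 0
```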

* An LRU replacement policy isn't a good choice.  It's very easy for a batch job to kick out
everything in memory before it can ever be used again (thrashing).  An LFU (least frequently
used) policy would be much better.  We'd have to keep usage statistics to implement this,
but that doesn't seem too bad.
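An LFU policy along these lines can be sketched with a simple per-block access counter.
This is only an illustration of the bookkeeping; the class and method names are hypothetical,
not from the patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of LFU eviction for in-memory replicas.
// Each block id maps to an access count; at eviction time we drop
// the least frequently used entry instead of the least recently used one.
class LfuReplicaTracker {
    private final Map<Long, Long> accessCounts = new HashMap<>();

    /** Bump the access count for a block each time it is read. */
    void recordAccess(long blockId) {
        accessCounts.merge(blockId, 1L, Long::sum);
    }

    /** Return the block id with the lowest access count, or -1 if empty. */
    long pickEvictionVictim() {
        long victim = -1L;
        long lowest = Long.MAX_VALUE;
        for (Map.Entry<Long, Long> e : accessCounts.entrySet()) {
            if (e.getValue() < lowest) {
                lowest = e.getValue();
                victim = e.getKey();
            }
        }
        return victim;
    }

    /** Forget a block once its replica has been evicted. */
    void evict(long blockId) {
        accessCounts.remove(blockId);
    }
}
```

A production version would also need to bound the counter map and age out stale counts,
but the core cost is just an increment per read plus a minimum scan (or a heap) at
eviction time.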

* How is the maximum tmpfs/ramfs size per datanode configured?  I think we should use the
existing {{dfs.datanode.max.locked.memory}} property to configure this, for consistency. 
System administrators should not need to configure separate pools of memory for HDFS-4949
and this feature.  It should be one memory size.

** I also think that cache directives from HDFS-4949 should take precedence over this opportunistic
write caching.  If we need to evict some HDFS-5851 cache items to finish our HDFS-4949 caching,
we should do that.

* Related to that, we might want to rename {{dfs.datanode.max.locked.memory}} to {{dfs.datanode.max.cache.memory}}
or something similar.
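A hypothetical {{hdfs-site.xml}} fragment for such a unified budget (the property name is
the existing HDFS-4949 one; the 2 GB value is purely illustrative):

```xml
<!-- One shared memory budget for both HDFS-4949 cache directives and
     opportunistic in-memory writes, rather than two separate pools. -->
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>2147483648</value>
</property>
```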

* You can effectively revoke access to a block file stored in ramfs or tmpfs by truncating
that file to 0 bytes.  The client can hang on to the file descriptor, but this doesn't keep
any data bytes in memory.  So we can move things out of the cache even if the clients are
unresponsive.  Also see HDFS-6750 and HDFS-6036 for examples of how we can ask the clients
to stop using a short-circuit replica before tearing it down.
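As a standalone illustration of the truncation trick (the class name and path handling are
hypothetical, using plain {{java.nio}} rather than DataNode code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of revoking access to a cached block file.
// A client that already holds an open descriptor keeps a valid handle,
// but after truncate(0) the file backs no data bytes, so the memory
// behind it can be reclaimed even if the client is unresponsive.
class TruncateRevocation {
    /** Truncate the block file to zero length and return its new size. */
    static long revoke(Path blockFile) throws IOException {
        try (FileChannel writer =
                 FileChannel.open(blockFile, StandardOpenOption.WRITE)) {
            writer.truncate(0);   // drop all data bytes
            return writer.size(); // now 0
        }
    }
}
```

A reader holding an already-open channel on the same file will simply observe a zero-length
file afterwards, rather than an error.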

> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFSWriteableReplicasInMemory.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to a single
> replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can revisit at a later
> time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
