Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 21 Aug 2014 22:00:12 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12722849.1403290594766.6536.1408658412898@arcas>
In-Reply-To: <JIRA.12722849.1403290594766@arcas>
References: <JIRA.12722849.1403290594766@arcas>
Subject: [jira] [Commented] (HDFS-6581) Write to single replica in memory
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106025#comment-14106025 ] 

Colin Patrick McCabe commented on HDFS-6581:
--------------------------------------------

Looks good overall.  It's good to see progress on this.

Some comments about the design doc:

* Why not use ramfs instead of tmpfs?  Data in ramfs is never swapped out.

** The problem with using tmpfs is that the system could move the data to swap at any time.  In addition to performance problems, this could cause correctness problems later when we read back the data from swap (i.e. from the hard disk).  Since we don't want to verify checksums here, we should use a storage method that we know never touches the disk.  Tachyon uses ramfs instead of tmpfs for this reason.

* An LRU replacement policy isn't a good choice.  It's very easy for a batch job to kick out everything in memory before it can ever be used again (thrashing).  An LFU (least frequently used) policy would be much better.  We'd have to keep usage statistics to implement this, but that doesn't seem too bad.

* How is the maximum tmpfs/ramfs size per datanode configured?  I think we should use the existing {{dfs.datanode.max.locked.memory}} property to configure this, for consistency.  System administrators should not need to configure separate pools of memory for HDFS-4949 and this feature.  It should be one memory size.

** I also think that cache directives from HDFS-4949 should take precedence over this opportunistic write caching.  If we need to evict some HDFS-5851 cache items to finish our HDFS-4949 caching, we should do that.

* Related to that, we might want to rename {{dfs.datanode.max.locked.memory}} to {{dfs.data.node.max.cache.memory}} or something.

* You can effectively revoke access to a block file stored in ramfs or tmpfs by truncating that file to 0 bytes.  The client can hang on to the file descriptor, but this doesn't keep any data bytes in memory.  So we can move things out of the cache even if the clients are unresponsive.  Also see HDFS-6750 and HDFS-6036 for examples of how we can ask the clients to stop using a short-circuit replica before tearing it down.
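To illustrate the truncation point above, here is a minimal sketch, assuming a Linux-like filesystem where a file can be truncated while another descriptor still has it open.  The class and method names ({{TruncateRevokeSketch}}, {{revoke}}, {{readableAfterRevoke}}) are hypothetical and are not part of HDFS; a temp file stands in for a block file on ramfs or tmpfs.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not HDFS code.
final class TruncateRevokeSketch {

    /** Truncates the block file to 0 bytes so it no longer holds any data. */
    static void revoke(Path blockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.WRITE)) {
            ch.truncate(0);
        }
    }

    /**
     * Opens the file as a "client" would, revokes it while the client still
     * holds its descriptor, and returns how many bytes the client can read.
     */
    static int readableAfterRevoke(Path blockFile) throws IOException {
        try (RandomAccessFile client = new RandomAccessFile(blockFile.toFile(), "r")) {
            revoke(blockFile);
            byte[] buf = new byte[16];
            int n = client.read(buf);   // -1 at end of file
            return Math.max(n, 0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("blk_", ".data");
        Files.write(f, "replica bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println("bytes readable after revoke: " + readableAfterRevoke(f));
        System.out.println("file size after revoke: " + Files.size(f));
        Files.delete(f);
    }
}
```

The client's descriptor stays valid, but reads through it hit end of file.  A client that has mmap'ed the file is a different case and would still need the coordination described in HDFS-6750 and HDFS-6036.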
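On the LRU vs. LFU point above, the following is a rough sketch of the usage statistics an LFU policy would need, not a proposal for the actual implementation.  {{LfuReplicaTracker}} and its methods are hypothetical names, and it assumes a fixed capacity counted in blocks rather than bytes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not HDFS code.
final class LfuReplicaTracker {
    private final int capacity;                       // max number of in-memory blocks
    private final Map<Long, Long> useCounts = new HashMap<>();

    LfuReplicaTracker(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one read of a block that is in memory; unknown blocks are ignored. */
    void recordUse(long blockId) {
        useCounts.computeIfPresent(blockId, (id, n) -> n + 1);
    }

    /**
     * Adds a block with a use count of zero. If the tracker is full, the block
     * with the lowest use count is evicted first and its id is returned.
     */
    Optional<Long> add(long blockId) {
        if (useCounts.containsKey(blockId)) {
            return Optional.empty();
        }
        Optional<Long> evicted = Optional.empty();
        if (useCounts.size() >= capacity) {
            long victim = 0;
            long min = Long.MAX_VALUE;
            for (Map.Entry<Long, Long> e : useCounts.entrySet()) {
                if (e.getValue() < min) {
                    min = e.getValue();
                    victim = e.getKey();
                }
            }
            useCounts.remove(victim);
            evicted = Optional.of(victim);
        }
        useCounts.put(blockId, 0L);
        return evicted;
    }

    boolean contains(long blockId) {
        return useCounts.containsKey(blockId);
    }

    public static void main(String[] args) {
        LfuReplicaTracker t = new LfuReplicaTracker(2);
        t.add(1L);
        for (int i = 0; i < 3; i++) {
            t.recordUse(1L);                          // block 1 is read repeatedly
        }
        for (long scanned = 2; scanned <= 5; scanned++) {
            t.add(scanned);                           // batch job touches each block once
        }
        System.out.println("hot block still resident: " + t.contains(1L));
    }
}
```

With LRU, the scan over blocks 2 through 5 would push block 1 out; here each scanned block only displaces the previous cold one.  The linear scan for a victim and the never-decaying counts are simplifications; a real policy would likely need some form of aging so that formerly hot blocks can eventually be evicted.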

> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFSWriteableReplicasInMemory.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can revisit at a later time.

