hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6581) Write to single replica in memory
Date Fri, 22 Aug 2014 00:58:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106267#comment-14106267 ]

Colin Patrick McCabe commented on HDFS-6581:

The key difference between tmpfs and ramfs is that unprivileged users can't be given write
access to ramfs, since anyone who can write to it can trivially fill up all of system memory.
tmpfs has a kernel-enforced size limit, and its pages can be swapped out.  Since the design
outlined here doesn't require giving unprivileged users write access to the temporary area,
it is compatible with *both* tmpfs and ramfs.
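
Incidentally, the two are easy to tell apart from Java, since ramfs typically doesn't report
block counts via statfs, while tmpfs reports its configured limit.  A quick sketch; the mount
point path here is hypothetical:

{code:java}
import java.io.File;

public class RamDiskProbe {
  public static void main(String[] args) {
    // Hypothetical mount point; an administrator would have created this.
    File ramDisk = new File("/mnt/dn-ramdisk");
    // tmpfs: getTotalSpace() returns the kernel-enforced size limit.
    // ramfs: typically returns 0, since ramfs reports no block counts.
    System.out.println("total=" + ramDisk.getTotalSpace()
        + " usable=" + ramDisk.getUsableSpace());
  }
}
{code}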

bq. I do prefer tmpfs as the OS limits tmpfs usage beyond the configured size so the failure
case is safer (DiskOutOfSpace instead of exhaust all RAM). swap is not as much of a concern
as it is usually disabled.

I can think of two cases where we might run out of memory:
1. The user configures the DN to use so much memory for cache that there is not enough memory
to run other programs.

ramfs: causes applications to be aborted with OOM errors.
tmpfs: degrades performance to very slow levels by swapping out our "cached" files.

An OOM error is easy to diagnose.  Sluggish performance is not.  The ramfs behavior is better
than the tmpfs behavior.

2. There is a bug in the DataNode causing it to try to cache more than it should.

ramfs: causes applications to be aborted with OOM errors.
tmpfs: degrades performance to very slow levels by swapping out our "cached" files.

The bug is easy to find when using ramfs, but hard to find with tmpfs.

So I would say, tmpfs is always worse for us.  Swapping is just not something we ever want,
and memory limits are something we enforce ourselves, so tmpfs's features don't help us.
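
To make the "limits we enforce ourselves" part concrete, something like the following
accounting on the DN side would do.  The names here are made up for illustration, not taken
from any patch:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

/** Sketch only: DN-side quota so we never rely on tmpfs to stop us. */
class RamDiskQuota {
  private final long capacityBytes;          // configured RAM disk budget
  private final AtomicLong usedBytes = new AtomicLong(0);

  RamDiskQuota(long capacityBytes) {
    this.capacityBytes = capacityBytes;
  }

  /** Reserve space for a new in-memory replica, or fail fast. */
  boolean reserve(long blockSize) {
    while (true) {
      long used = usedBytes.get();
      if (used + blockSize > capacityBytes) {
        return false;                        // caller falls back to disk
      }
      if (usedBytes.compareAndSet(used, used + blockSize)) {
        return true;
      }
    }
  }

  /** Release space when a replica is evicted or persisted. */
  void release(long blockSize) {
    usedBytes.addAndGet(-blockSize);
  }
}
{code}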

bq. Agreed that plain LRU would be a poor choice. Perhaps a hybrid of MRU+LRU would be a good
option. i.e. evict the most recently read replica, unless there are replicas older than some
threshold, in which case evict the LRU one. The assumption being that a client is unlikely
to reread from a recently read replica.

Yeah, we'll probably need some benchmarking on this.
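
Just to pin down what we'd be benchmarking, the hybrid Arpit describes could look roughly
like this.  The class and method names, and the age threshold, are all hypothetical:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the proposed MRU+LRU hybrid; all names are hypothetical. */
class HybridEvictionPolicy {
  private final long ageThresholdMs;
  // Access-ordered map: iteration runs from LRU to MRU.
  private final LinkedHashMap<Long, Long> lastReadTimeByBlockId =
      new LinkedHashMap<>(16, 0.75f, true);

  HybridEvictionPolicy(long ageThresholdMs) {
    this.ageThresholdMs = ageThresholdMs;
  }

  synchronized void recordRead(long blockId, long nowMs) {
    lastReadTimeByBlockId.put(blockId, nowMs);
  }

  /** Pick a victim: LRU if anything is older than the threshold, else MRU. */
  synchronized Long chooseVictim(long nowMs) {
    Long lru = null, mru = null;
    long lruTime = 0;
    for (Map.Entry<Long, Long> e : lastReadTimeByBlockId.entrySet()) {
      if (lru == null) {
        lru = e.getKey();                    // first entry is least recent
        lruTime = e.getValue();
      }
      mru = e.getKey();                      // last entry is most recent
    }
    if (lru == null) {
      return null;                           // nothing cached
    }
    if (nowMs - lruTime > ageThresholdMs) {
      return lru;                            // stale replica: evict the LRU one
    }
    return mru;                              // otherwise evict the MRU one
  }
}
{code}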

bq. Yes I reviewed the former, it looks interesting with eviction in mind. I'll create a subtask
to investigate eviction via truncate.

Yeah, thanks for the review on HDFS-6750.  As Todd pointed out, we probably want to give clients
some warning before the truncate in HDFS-6581, just like we do with HDFS-4949 and the munlock...
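
The eviction itself should be cheap, since truncating the block file releases the tmpfs/ramfs
pages immediately.  A bare-bones sketch; the client-warning step is just a placeholder for
whatever protocol message we end up adding:

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.WRITE;

class TruncateEvictor {
  /** Sketch: free an in-memory replica's pages by truncating its file. */
  void evict(Path replicaFile) throws IOException {
    // Placeholder: HDFS-6581 would first warn readers, analogous to the
    // munlock grace period in HDFS-4949.
    // notifyClients(replicaFile);
    try (FileChannel ch = FileChannel.open(replicaFile, WRITE)) {
      ch.truncate(0);   // tmpfs/ramfs pages are released immediately
    }
  }
}
{code}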

bq. The DataNode does not create the RAM disk since we cannot require root. An administrator
will have to configure the partition.

Yeah, that makes sense.  Similarly, for HDFS-4949, the administrator must set the ulimit for
the DataNode before caching can work.
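
We could also fail fast at startup if the partition the administrator configured doesn't
match what the DN expects, similar in spirit to the memlock check.  A sketch with made-up
names:

{code:java}
import java.io.File;
import java.io.IOException;

class RamDiskStartupCheck {
  /** Sketch: validate the admin-provisioned RAM disk at DN startup. */
  static void check(File ramDiskDir, long configuredBytes) throws IOException {
    if (!ramDiskDir.isDirectory() || !ramDiskDir.canWrite()) {
      throw new IOException("RAM disk " + ramDiskDir
          + " is not a writable directory");
    }
    long total = ramDiskDir.getTotalSpace();   // typically 0 on ramfs
    if (total > 0 && total < configuredBytes) {
      throw new IOException("RAM disk " + ramDiskDir + " is smaller ("
          + total + " bytes) than the configured budget ("
          + configuredBytes + " bytes)");
    }
  }
}
{code}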

> Write to single replica in memory
> ---------------------------------
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFSWriteableReplicasInMemory.pdf
> Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can revisit at a later time.
