hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6581) Write to single replica in memory
Date Mon, 22 Sep 2014 20:41:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143773#comment-14143773 ]

Arpit Agarwal commented on HDFS-6581:
-------------------------------------

bq. My fear here is that we will try to implement a better eviction strategy, but find that
the pluggable API introduced in HDFS-7100 is too inflexible to do so. I'm hoping that this
fear is not justified, but until there is an actual LFU or cold/warm/hot scheme implemented,
we won't know for sure. As you said, this isn't much code, so maybe I'll do it if it remains
to be done later.
Colin, LFU may work better for a general-purpose cache, but this feature targets a specific
use case: smaller intermediate data. Intermediate data is likely to be read once or only a
few times, so it is unlikely to fit the typical LFU use case; in fact NFU may be better.
IMO, without real-world evaluation there is no data to support one over the other. Let's help
HDFS clients evaluate it.
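
As a toy illustration of the point about read-once data (not the HDFS-7100 interface, and not
an evaluation), the sketch below compares two hypothetical trackers, `LruTracker` and
`LfuTracker`. When every block is written once and read once, all frequency counts are equal,
so the LFU choice falls back to a tie-break and carries no extra signal over recency.

```python
from collections import OrderedDict


class LruTracker:
    """Hypothetical tracker: evicts the least recently touched block."""

    def __init__(self):
        self._order = OrderedDict()

    def touch(self, block_id):
        # Move the block to the most-recently-used end.
        self._order.pop(block_id, None)
        self._order[block_id] = True

    def evict(self):
        # Remove and return the block at the least-recently-used end.
        block_id, _ = self._order.popitem(last=False)
        return block_id


class LfuTracker:
    """Hypothetical tracker: evicts the block with the lowest access count."""

    def __init__(self):
        self._counts = {}

    def touch(self, block_id):
        self._counts[block_id] = self._counts.get(block_id, 0) + 1

    def evict(self):
        # Ties are broken by insertion order, because min() returns the
        # first minimal key it encounters while iterating the dict.
        victim = min(self._counts, key=self._counts.get)
        del self._counts[victim]
        return victim


# Assumed workload: each block is written once, then read once.
lru, lfu = LruTracker(), LfuTracker()
for block in ["a", "b", "c"]:
    lru.touch(block)
    lfu.touch(block)
for block in ["a", "b", "c"]:
    lru.touch(block)
    lfu.touch(block)
print(lru.evict(), lfu.evict())
```

Which policy behaves better on real intermediate data is exactly the open question above; the
sketch only shows why access counts alone may not distinguish such blocks.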

bq. My fear here is that we will try to implement a better eviction strategy, but find that
the pluggable API introduced in HDFS-7100 is too inflexible to do so.
I don't see any reason to fear. The interface is tagged private and its interactions with
the DN are confined to limited portions of the FsDataset code. It will be easy to update if needed.

bq. to get a benchmark that makes you look better  Clearly the lazy-persist file will still
be in RAM after caches are dropped, whereas the non-lazy one will not. I always repeat experiments
3 times and average, I left that out for brevity
Thanks for the idea, it might be useful for future testing. For now I trigger the best-case
scenario for non-lazy persist (data already in buffer cache) just to demonstrate that performance
is on par, as we'd expect since we're doing SCR from RAM in either case. The numbers are
means over 1000 runs, discarding the initial sacrificial read that fetches block data into the
buffer cache.
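
A minimal sketch of that measurement procedure, assuming a hypothetical callable `read_once`
that performs one full read of the file under test. It illustrates the method (one untimed
warm-up read, then the mean over the timed runs) and is not the harness used for the numbers
reported on this issue.

```python
import time
from statistics import mean


def mean_read_latency(read_once, runs=1000):
    """Return the mean wall-clock seconds of `read_once` over `runs` timed calls.

    `read_once` is a hypothetical zero-argument callable that reads the
    file under test. One untimed call is made first so that block data is
    already in the buffer cache when measurement starts.
    """
    read_once()  # sacrificial read: warms the cache and is not counted
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        read_once()
        samples.append(time.perf_counter() - start)
    return mean(samples)
```

Running the same function against a lazy-persist file and a regular file would give the two
means being compared.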

> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch,
>                      HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch,
>                      HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch,
>                      HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to a single
> replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can revisit at a later
> time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
