hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6581) Write to single replica in memory
Date Tue, 23 Sep 2014 23:18:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145599#comment-14145599 ]

Arpit Agarwal commented on HDFS-6581:
-------------------------------------

Preliminary numbers for write throughput.

The test creates, writes and closes 3 x 2GB files in quick succession and computes the mean
end-to-end (E2E) time per file. Looking only at raw throughput makes memory writes look even better.

System RAM: 24GB
RAM Disk: 8GB

*Baseline, checksums ON*
||Block Size (MB)||Mean E2E Latency (ms)||
|128|7235|
|1024|7005|

*Lazy Persist, checksums ON*
||Block Size (MB)||Mean E2E Latency (ms)||Improvement over baseline||
|128|5015|30.6%|
|1024|4635|33.8%|

*Lazy Persist, checksums OFF*
||Block Size (MB)||Mean E2E Latency (ms)||Improvement over baseline||
|128|4504|37.7%|
|1024|4240|39.4%|
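
As a rough cross-check of the tables above, the sketch below recomputes the latency improvement and the raw throughput implied by the reported mean latencies. It assumes 1GB = 1024MB for the 2GB test file; the function and constant names are illustrative and not part of HDFS. Recomputed percentages can differ from the reported ones by 0.1 because of rounding.

```python
FILE_SIZE_MB = 2 * 1024  # assumption: each 2GB test file is 2048 MB

# Mean E2E latency (ms) from the tables above, keyed by block size (MB).
BASELINE = {128: 7235, 1024: 7005}
LAZY_PERSIST_CHECKSUMS_ON = {128: 5015, 1024: 4635}
LAZY_PERSIST_CHECKSUMS_OFF = {128: 4504, 1024: 4240}


def latency_improvement_pct(baseline_ms, new_ms):
    """Percentage reduction in latency relative to the baseline."""
    return 100.0 * (baseline_ms - new_ms) / baseline_ms


def throughput_mb_per_s(latency_ms, size_mb=FILE_SIZE_MB):
    """Raw throughput implied by writing size_mb in latency_ms."""
    return size_mb / (latency_ms / 1000.0)


def throughput_gain_pct(baseline_ms, new_ms):
    """Percentage increase in throughput relative to the baseline."""
    return 100.0 * (baseline_ms / new_ms - 1.0)


if __name__ == "__main__":
    runs = [
        ("Lazy Persist, checksums ON", LAZY_PERSIST_CHECKSUMS_ON),
        ("Lazy Persist, checksums OFF", LAZY_PERSIST_CHECKSUMS_OFF),
    ]
    for name, results in runs:
        for block_mb, new_ms in results.items():
            base_ms = BASELINE[block_mb]
            print(
                f"{name}, block {block_mb} MB: "
                f"latency -{latency_improvement_pct(base_ms, new_ms):.1f}%, "
                f"throughput {throughput_mb_per_s(base_ms):.0f} -> "
                f"{throughput_mb_per_s(new_ms):.0f} MB/s "
                f"(+{throughput_gain_pct(base_ms, new_ms):.1f}%)"
            )
```

For the 128MB block size with checksums ON, the latency drops by about 30.7% while the implied throughput rises from roughly 283MB/s to roughly 408MB/s, a gain of about 44%. The same data therefore looks more favorable when expressed as throughput.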

The baseline times were all over the map across runs, so I picked the best number. If the buffer
cache happens to be dirty, which will be common in practice, the disk write time degrades
to 20s for a 2GB file (100MB/s, which happens to be the disk write throughput). Correspondingly,
if the RAM disk is full of dirty data and the lazy writer cannot keep up, the memory numbers
will suffer. Another potential improvement afforded by writing to RAM disk is that the lazy writer
can use unbuffered disk writes, which avoid churning the buffer cache (HDFS-7090). We cannot make
a corresponding fix in our existing data write pipeline because the best-case write latency would
suffer significantly.
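
The 20s figure follows from simple arithmetic, and the same reasoning gives a feel for how long the RAM disk can absorb writes before the lazy writer falls behind. The following is a back-of-the-envelope sketch, not a model of the actual DataNode: it assumes 1GB = 1024MB, a constant ingest rate, a constant drain rate, and an initially empty RAM disk. The function names and the 400MB/s ingest rate in the usage note are hypothetical.

```python
import math


def disk_write_seconds(file_mb, disk_mb_per_s):
    """Time to write file_mb when limited by raw disk throughput."""
    return file_mb / disk_mb_per_s


def seconds_until_ram_disk_full(capacity_mb, ingest_mb_per_s, drain_mb_per_s):
    """Time until an initially empty RAM disk fills with unpersisted data.

    Assumes constant rates. Returns math.inf when the drain rate
    matches or exceeds the ingest rate, so the RAM disk never fills.
    """
    net_mb_per_s = ingest_mb_per_s - drain_mb_per_s
    if net_mb_per_s <= 0:
        return math.inf
    return capacity_mb / net_mb_per_s


if __name__ == "__main__":
    # 2GB file at the quoted 100MB/s disk write throughput.
    print(f"{disk_write_seconds(2 * 1024, 100):.1f} s")
    # 8GB RAM disk, hypothetical 400MB/s ingest, 100MB/s drain to disk.
    print(f"{seconds_until_ram_disk_full(8 * 1024, 400, 100):.1f} s")
```

A 2048MB file at 100MB/s takes about 20.5s, consistent with the 20s quoted above. Under these assumptions, an 8GB RAM disk receiving a hypothetical 400MB/s while draining at 100MB/s would fill in roughly 27s of sustained writing, after which memory writes would be bounded by the drain rate.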

> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
