hadoop-hdfs-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
Date Fri, 08 Jun 2012 09:10:24 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291641#comment-13291641 ]

Lars Hofhansl commented on HDFS-1783:

I did a simple local micro benchmark:

Started a mini cluster with 3 data nodes.
Wrote 1 byte 100,000 times, each write followed by an hflush (so 100,000 packets).

With parallel writes it took ~25s, without ~30s (this was repeatable).

I also tried 10-byte and 100-byte packets. For 10 bytes I got the same results.
For 100 bytes it took ~29s with parallel writes and ~37s without.
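As a quick sanity check, the totals above can be turned into average per-packet times and speedups. This is only arithmetic on the approximate ("~") figures reported here; the class and method names are made up for illustration.

```java
/** Hypothetical helper: derives per-packet time and speedup from the reported benchmark totals. */
public final class HflushBenchmarkMath {
    // One hflush per write, so each write produces one packet.
    static final int PACKETS = 100_000;

    /** Average milliseconds per packet, given total seconds for all packets. */
    static double msPerPacket(double totalSeconds) {
        return totalSeconds * 1000.0 / PACKETS;
    }

    /** How many times faster the parallel run was than the pipelined run. */
    static double speedup(double pipelinedSeconds, double parallelSeconds) {
        return pipelinedSeconds / parallelSeconds;
    }

    public static void main(String[] args) {
        String[] labels = {"1 byte", "100 bytes"};
        // {parallel seconds, pipelined seconds}, as reported above (approximate).
        double[][] runs = {{25, 30}, {29, 37}};
        for (int i = 0; i < runs.length; i++) {
            double parallel = runs[i][0];
            double pipelined = runs[i][1];
            System.out.printf("%-9s parallel %.2f ms/packet, pipelined %.2f ms/packet, speedup %.2fx%n",
                labels[i], msPerPacket(parallel), msPerPacket(pipelined), speedup(pipelined, parallel));
        }
    }
}
```

For the 1-byte case this works out to about 0.25 ms vs 0.30 ms per packet (roughly 1.2x); for 100 bytes, about 0.29 ms vs 0.37 ms (roughly 1.28x).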

Since this was all on a single machine I am not entirely sure how this would translate to
a real cluster with real network latency.

The latency I measured for my "lo" device is 0.05ms, so I would expect the impact of this change
to be more pronounced in a real cluster, where latency is on the order of a few ms. There
should also be a definite gain when hsync (after HDFS-744) is enabled, but I cannot test that
on a single machine with a single spindle.
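One rough way to see why the gap should grow with network latency is a toy model, which is an assumption for illustration and not how the HDFS client or DataNodes are actually implemented. In a pipeline of R replicas, a packet crosses R hops (client to dn1 to ... to dnR) and the ack crosses them back, so the wait is about R round trips. With parallel writes, the client waits only for the slowest single round trip. The model ignores processing time, disk I/O, and the extra client bandwidth needed to send every packet R times.

```java
import java.util.Arrays;

/** Toy latency model (hypothetical): pipelined vs. parallel replica writes, per packet. */
public final class ReplicaLatencyModel {
    /** Pipelined: R forward hops plus R ack hops, i.e. about R round trips in total. */
    static double pipelinedMs(int replicas, double rttMs) {
        return replicas * rttMs;
    }

    /** Parallel: the client sends to all replicas at once and waits for the slowest ack. */
    static double parallelMs(double[] perReplicaRttMs) {
        double slowest = 0.0;
        for (double rtt : perReplicaRttMs) {
            slowest = Math.max(slowest, rtt);
        }
        return slowest;
    }

    public static void main(String[] args) {
        int replicas = 3;
        // 0.05 ms is the loopback figure above; 2.0 ms stands in for a hypothetical "few ms" network.
        for (double rtt : new double[] {0.05, 2.0}) {
            double[] rtts = new double[replicas];
            Arrays.fill(rtts, rtt);
            System.out.printf("RTT %.2f ms: pipelined %.2f ms, parallel %.2f ms per packet%n",
                rtt, pipelinedMs(replicas, rtt), parallelMs(rtts));
        }
    }
}
```

Under this model the pipeline's extra cost per packet is about (R - 1) round trips, which is negligible on loopback but becomes several ms per hflush once round trips are in the ms range.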

> Ability for HDFS client to write replicas in parallel
> -----------------------------------------------------
>                 Key: HDFS-1783
>                 URL: https://issues.apache.org/jira/browse/HDFS-1783
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>            Reporter: dhruba borthakur
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, HDFS-1783-trunk-v4.patch,
> The current implementation of HDFS pipelines the writes to the three replicas. This introduces
> some latency for realtime latency sensitive applications. An alternate implementation that
> allows the client to write all replicas in parallel gives much better response times to these
