hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
Date Thu, 21 Jun 2012 18:02:43 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398655#comment-13398655 ]

Daryn Sharp commented on HDFS-1783:

I've only quickly looked at the discussion and the patch, so please excuse me if I'm misunderstanding
it.  The following is predicated on the belief that this is all client-side.  The client
is constructing pipelines directly to all of the datanodes -- no more daisy-chaining, right?

I think a benchmark on a generally quiescent network with small writes may be misleading.
The client will now consume a multiple (the replication factor) of the outgoing bandwidth it
previously consumed, instead of that bandwidth being amortized over the network.  This may
quickly exhaust the NIC and/or congest the switches en route to the datanodes.
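
To put rough numbers on that, here is a minimal sketch of the arithmetic.  The function name,
the 100 MB/s rate, and the replication factor of 3 are illustrative assumptions, not figures
from the patch or from any benchmark, and it assumes no replica lands on the client's own host.

```python
def client_egress_mb_s(write_rate_mb_s, replication, parallel):
    """Outgoing bandwidth the writing client must sustain, in MB/s.

    Hypothetical helper for illustration only.
    Pipelined (daisy-chained): the client sends one copy and the
    datanodes forward it to each other.
    Parallel: the client sends one copy per replica itself.
    """
    copies_sent_by_client = replication if parallel else 1
    return write_rate_mb_s * copies_sent_by_client


rate = 100          # assumed application write rate in MB/s
replication = 3     # assumed replication factor

pipelined = client_egress_mb_s(rate, replication, parallel=False)
parallel = client_egress_mb_s(rate, replication, parallel=True)

print(f"pipelined: client sends {pipelined} MB/s")
print(f"parallel:  client sends {parallel} MB/s")
```

Under these assumptions the total bytes on the network are the same either way; what changes
is that all of them leave through the client's NIC instead of being spread across the datanodes.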

It would be interesting to see the benchmark with traffic shaping.  For example, perhaps throttle
each host's bandwidth to ~2.5-3X the raw transfer speed of one client.  Run two clients simultaneously
on one host, with and without parallel writes, writing files of at least a few blocks with a
replication factor of 3 or more.
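
A back-of-the-envelope model of what that experiment might show, as a sketch only.  It assumes
the host's outgoing cap is shared evenly between the clients, considers only the writing host's
egress, and ignores datanode-side limits; the function name and parameters are hypothetical.

```python
def per_client_throughput(cap, clients, replication, parallel):
    """Expected throughput of each client, in units of the raw
    transfer speed of one unthrottled client (1.0 = full speed).

    cap: host egress limit, as a multiple of one client's raw speed.
    Hypothetical model: the cap is split evenly across clients.
    """
    copies_per_client = replication if parallel else 1
    # Egress the host would need for every client to run at full speed.
    demand = clients * copies_per_client
    # Clients run at full speed if the cap covers the demand,
    # otherwise they are scaled down proportionally.
    return min(1.0, cap / demand)


for cap in (2.5, 3.0):
    pipe = per_client_throughput(cap, clients=2, replication=3, parallel=False)
    par = per_client_throughput(cap, clients=2, replication=3, parallel=True)
    print(f"cap {cap}X: pipelined {pipe:.2f}X per client, parallel {par:.2f}X per client")
```

In this model two pipelined clients need 2X and fit under either cap, while two parallel
clients need 6X and are slowed to roughly half speed or less.  A real run could differ, which
is the reason to measure it.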

> Ability for HDFS client to write replicas in parallel
> -----------------------------------------------------
>                 Key: HDFS-1783
>                 URL: https://issues.apache.org/jira/browse/HDFS-1783
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>            Reporter: dhruba borthakur
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, HDFS-1783-trunk-v4.patch,
>                      HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch
>
> The current implementation of HDFS pipelines the writes to the three replicas. This introduces
> some latency for realtime latency sensitive applications. An alternate implementation that
> allows the client to write all replicas in parallel gives much better response times to these

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

