hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-445) Parallel data/socket writing for DFSOutputStream
Date Mon, 14 Aug 2006 19:31:14 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-445?page=comments#action_12427950 ] 
            
Doug Cutting commented on HADOOP-445:
-------------------------------------

> What was the performance difference you saw after the patch?

I think the performance was about the same. Mostly, things just became more reliable, since
the longer-lived connections caused many more timeouts.

> Parallel data/socket writing for DFSOutputStream
> ------------------------------------------------
>
>                 Key: HADOOP-445
>                 URL: http://issues.apache.org/jira/browse/HADOOP-445
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.5.0
>            Reporter: Benjamin Reed
>         Attachments: fastClientWrite.patch
>
>
> Currently, as DFS clients output blocks, they write the entire block to disk before starting
> to transmit it to the datanode. By writing to disk, the client is able to retry a block write
> if the datanode fails in the middle of a block transfer. Writing to disk and then to the datanode
> adds latency. Hopefully, the common case is that block transfers to datanodes succeed.
> This patch writes to the datanode and the disk in parallel. If the write to the datanode fails,
> it falls back to the current behavior.
> In my tests transmitting 237M and 946M datasets using -copyFromLocal, I'm seeing a
> 20-25% improvement in throughput.
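
The fall-back scheme described in the issue can be sketched in Java. This is a minimal sketch under stated assumptions, not the code in fastClientWrite.patch. `TeeBlockWriter`, `Connector`, `endBlock` and `maxRetries` are hypothetical names. An in-memory `ByteArrayOutputStream` stands in for the client's local temporary file. "In parallel" is modeled as writing each chunk to both destinations in turn on one thread, rather than with separate threads.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch of the idea in HADOOP-445 (not the patch itself): each chunk goes
 * to a local backup copy and to the datanode as it arrives, instead of spooling the whole
 * block to disk first. If the datanode write fails, the writer keeps filling the backup
 * and resends the block from it at the end (the pre-patch behavior).
 */
final class TeeBlockWriter {
  /** Opens a fresh stream to a datanode; a hypothetical stand-in for a socket. */
  interface Connector {
    OutputStream open() throws IOException;
  }

  // In-memory stand-in for the client's local temporary block file.
  private final ByteArrayOutputStream localCopy = new ByteArrayOutputStream();
  private final Connector connector;
  private OutputStream remote;        // first, optimistic datanode connection
  private boolean remoteFailed;       // set once streaming to the datanode breaks

  TeeBlockWriter(Connector connector) {
    this.connector = connector;
    try {
      remote = connector.open();
    } catch (IOException e) {
      remoteFailed = true;            // no connection: behave like the old client
    }
  }

  /** Appends a chunk to the local copy and, while it is healthy, to the datanode. */
  void write(byte[] buf, int off, int len) {
    localCopy.write(buf, off, len);
    if (remoteFailed) {
      return;
    }
    try {
      remote.write(buf, off, len);
    } catch (IOException e) {
      remoteFailed = true;            // the local copy still holds every byte
      closeQuietly(remote);
    }
  }

  /**
   * Finishes the block. If streaming succeeded, this just closes the connection;
   * otherwise it resends the whole block from the local copy over fresh connections.
   * Assumes the datanode discards a partially received block, so a full resend is safe.
   */
  void endBlock(int maxRetries) throws IOException {
    if (!remoteFailed) {
      try {
        remote.close();
        return;
      } catch (IOException e) {
        // A failed flush/close is handled like a failed write: fall through to resend.
      }
    }
    IOException last = null;
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      try (OutputStream retry = connector.open()) {
        localCopy.writeTo(retry);
        return;
      } catch (IOException e) {
        last = e;
      }
    }
    throw new IOException("block not sent after " + maxRetries + " attempts", last);
  }

  private static void closeQuietly(OutputStream s) {
    if (s == null) {
      return;
    }
    try {
      s.close();
    } catch (IOException ignored) {
      // Nothing more to do; the resend path opens a new connection.
    }
  }
}
```

A caller would invoke `write` for each chunk and `endBlock` once per block. In the common case, the block has already reached the datanode when `endBlock` runs, so there is no second pass over the data. Only after a failure does the writer pay the old write-to-disk-then-send cost.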
