hadoop-common-dev mailing list archives

From "Benjamin Reed (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-445) Parallel data/socket writing for DFSOutputStream
Date Fri, 11 Aug 2006 22:31:15 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-445?page=comments#action_12427639 ] 
            
Benjamin Reed commented on HADOOP-445:
--------------------------------------

You are right. It is the same. What was the performance difference you saw after the patch?

There is actually a disk head contention problem with my patch, and it occurs in the common case.
When a client writes to a datanode running on the same machine, the backup file and the
datanode may, depending on the configuration, contend for the disk head. If the client is
talking to a local datanode, which is something we try to make happen, we should not write
to the backup file at all. That should also give us another performance boost.
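
One way the "is the datanode local?" decision might be sketched is shown below. This is only an illustration: the class and method names (LocalDatanodeCheck, isLocal, shouldWriteBackup) are hypothetical and are not taken from the patch or from the Hadoop code base. It relies only on the standard java.net classes.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

/** Hypothetical helper; the names and the policy are illustrative, not from the patch. */
class LocalDatanodeCheck {
    /** Returns true if addr is a loopback/wildcard address or is bound to a local interface. */
    static boolean isLocal(InetAddress addr) {
        if (addr.isLoopbackAddress() || addr.isAnyLocalAddress()) {
            return true;
        }
        try {
            // Non-null means some interface on this machine owns the address.
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (SocketException e) {
            // Cannot tell; treat as remote so the backup file is still written.
            return false;
        }
    }

    /** Assumed policy from the comment above: skip the backup file for a local datanode. */
    static boolean shouldWriteBackup(InetAddress datanodeAddr) {
        return !isLocal(datanodeAddr);
    }
}
```

A client could call shouldWriteBackup once per block, after the target datanode is known, and open the backup file only when it returns true.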

> Parallel data/socket writing for DFSOutputStream
> ------------------------------------------------
>
>                 Key: HADOOP-445
>                 URL: http://issues.apache.org/jira/browse/HADOOP-445
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.5.0
>            Reporter: Benjamin Reed
>         Attachments: fastClientWrite.patch
>
>
> Currently, as DFS clients output blocks they write the entire block to disk before starting
> to transmit to the datanode. By writing to disk the client is able to retry a block write
> if the datanode fails in the middle of a block transfer. Writing to disk and then to the datanode
> adds latency. Hopefully, the common case is that block transfers to datanodes are successful.
> This patch writes to the datanode and the disk in parallel. If the write to the datanode fails,
> it falls back to current behavior.
> In my tests of transmits of 237M and 946M datasets using -copyFromLocal I'm seeing a
> 20-25% improvement in throughput.
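
The write-both-then-fall-back idea in the description can be sketched roughly as follows. This is not the code in fastClientWrite.patch: TeeBlockWriter and its methods are hypothetical names, an in-memory buffer stands in for the on-disk backup file, and a single thread interleaves the two writes rather than issuing them truly in parallel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch; not the DFSOutputStream code from the patch. */
class TeeBlockWriter {
    // Stands in for the local backup file that makes a retry possible.
    private final ByteArrayOutputStream backup = new ByteArrayOutputStream();
    // Stands in for the socket stream to the datanode.
    private final OutputStream remote;
    private boolean remoteFailed = false;

    TeeBlockWriter(OutputStream remote) {
        this.remote = remote;
    }

    /** Always records the data in the backup; also sends it while the remote is healthy. */
    void write(byte[] buf, int off, int len) {
        backup.write(buf, off, len);
        if (!remoteFailed) {
            try {
                remote.write(buf, off, len);
            } catch (IOException e) {
                // Stop streaming; the block will be resent from the backup later.
                remoteFailed = true;
            }
        }
    }

    /** True if the streamed transfer failed and the block must be resent. */
    boolean needsRetry() {
        return remoteFailed;
    }

    /** Fallback path: send the whole block from the backup, as the pre-patch client does. */
    void replayTo(OutputStream retryTarget) throws IOException {
        backup.writeTo(retryTarget);
    }
}
```

When the datanode write succeeds, the backup is never read, so the cost of the disk write overlaps with the network transfer instead of preceding it.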

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
