hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2758) Reduce memory copies when data is read from DFS
Date Wed, 13 Feb 2008 22:37:08 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Raghu Angadi updated HADOOP-2758:

    Attachment: HADOOP-2758.patch

The attached patch removes extra buffer copies when data is read from the datanode
(by a client or during replication).

 - before : disk --> large BufferedInputStream --> small datanode buffer --> large
BufferedOutputStream --> socket
 - after : disk --> large datanode buffer --> socket
 - Each arrow represents a memory copy. The cost of the arrows at the two ends is shared between
user and kernel, I think (using a direct buffer might reduce that further; I will try).
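The after-state above boils down to a single reusable buffer sitting between the disk read and the socket write. A minimal sketch of that loop, with illustrative names (this is not the patch code itself):

```java
import java.io.*;

// Single-copy transfer loop: one large buffer between the disk-side stream
// and the socket-side stream, instead of stacking BufferedInputStream /
// BufferedOutputStream copies in between. Class and method names are
// illustrative only.
public class SingleBufferTransfer {

    // Copies everything from 'disk' to 'socket' through one reused buffer.
    // Returns the total number of bytes transferred.
    static long transfer(InputStream disk, OutputStream socket, int bufSize)
            throws IOException {
        byte[] buf = new byte[bufSize];   // one large buffer, reused per chunk
        long total = 0;
        int n;
        while ((n = disk.read(buf)) > 0) {
            socket.write(buf, 0, n);      // straight to the socket, no extra copy
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 16];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = transfer(new ByteArrayInputStream(data), out, 8192);
        System.out.println(copied + " " + java.util.Arrays.equals(data, out.toByteArray()));
    }
}
```

Each iteration performs exactly the two unavoidable copies (disk-to-buffer and buffer-to-socket); a direct ByteBuffer could in principle shave the user/kernel half of those, as noted above.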

I will post more microbenchmarks similar to those in my last comment.

We can reduce one more copy in the DFSClient, but the current {{readChunk()}} interface in
{{FSInputChecker}} does not allow it. We could add an optional {{readChunks()}} method so that
an implementation can get access to the user's complete buffer; there would be a default
implementation. Should I file a jira?
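A rough sketch of what such an optional hook could look like. The {{readChunks()}} name comes from the proposal above, but the signatures and chunk size here are purely illustrative and do not reflect the actual {{FSInputChecker}} API:

```java
import java.io.IOException;

// Hypothetical sketch of the proposed optional readChunks() hook: the default
// falls back to repeated readChunk() calls, so existing subclasses keep
// working, while an implementation that can fill the caller's whole buffer in
// one pass may override it and skip a copy. All names are illustrative.
abstract class ChunkedReader {
    static final int CHUNK_SIZE = 512;  // bytes per checksummed chunk (illustrative)

    // Existing-style single-chunk read: returns bytes read, or -1 at EOF.
    abstract int readChunk(byte[] buf, int offset, int len) throws IOException;

    // Proposed multi-chunk read; the default simply delegates to readChunk().
    int readChunks(byte[] buf, int offset, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = readChunk(buf, offset + total, Math.min(CHUNK_SIZE, len - total));
            if (n <= 0) break;
            total += n;
        }
        return total == 0 ? -1 : total;
    }

    // Toy subclass reading from an in-memory array, for demonstration only.
    static class ArrayReader extends ChunkedReader {
        private final byte[] src;
        private int pos = 0;
        ArrayReader(byte[] src) { this.src = src; }
        @Override
        int readChunk(byte[] buf, int off, int len) {
            int n = Math.min(len, src.length - pos);
            if (n <= 0) return -1;
            System.arraycopy(src, pos, buf, off, n);
            pos += n;
            return n;
        }
    }
}
```

The point of the default implementation is backward compatibility: only implementations that can actually serve the whole user buffer at once need to override it.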

This patch changes the DATA_TRANSFER_PROTOCOL a bit. 

This patch does not yet improve buffering while writing data to DFS. I will do that
in a follow-up jira.

All the unit tests pass. I will run them on Windows as well. No new tests are added, since
this patch does not change any functionality and is purely a performance improvement.

> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>                 Key: HADOOP-2758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2758
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.17.0
>         Attachments: HADOOP-2758.patch
> Currently the datanode and the client side of DFS perform multiple copies of data on the 'read
path' (i.e. the path from storage on the datanode to the user buffer on the client). This jira
reduces these copies by enhancing the data read protocol and the implementation of read on both
the datanode and the client. I will describe the changes in the next comment.
> The requirement is that this fix should reduce CPU usage and should not cause a regression in
any benchmark. It might not improve the benchmarks, since most benchmarks are not CPU bound.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
