hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2758) Reduce memory copies when data is read from DFS
Date Fri, 15 Feb 2008 19:28:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569380#action_12569380 ]

Raghu Angadi commented on HADOOP-2758:
--------------------------------------

Regarding a couple of concerns in Konstantin's review:

- > 2. Do we still need the notion of a chunk? [...]
-- I think so. A CRC chunk is still central to many things that DataNode and DFSClients do.
It is very useful in discussions, descriptions, and even in code to have a single word that
consistently describes this essential unit of DFS data. If we see a member called 'sendChunk()',
it is clear what it sends. For example, this patch renamed {{sendChunk()}} to {{sendChunks(int)}}
because it now sends multiple CRC chunks.

- > 5. DATA_TRANSFER_VERSION : I generally do not understand what is the meaning of this
constant, [...]
-- Data transfers do not use RPCs. As noted in the comment, the constant unfortunately does
depend on Datanode serializations; it probably should not. This is analogous to an RPC version
and a Protocol version, which sit at two different levels of the stack.
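
To make the term "CRC chunk" in point 2 concrete, here is a minimal sketch, not code from the patch. It assumes a chunk is a fixed-size run of data bytes covered by one CRC32 checksum. The class name {{ChunkSketch}}, the method {{checksumChunks}}, and the 512-byte chunk size are all hypothetical and chosen only for illustration.

```java
import java.util.zip.CRC32;

public class ChunkSketch {
    // Hypothetical value: number of data bytes covered by one checksum.
    static final int BYTES_PER_CHECKSUM = 512;

    /**
     * Returns one CRC32 value per chunk of data[0..len).
     * The final chunk may be shorter than BYTES_PER_CHECKSUM.
     */
    static long[] checksumChunks(byte[] data, int len) {
        int numChunks = (len + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[numChunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < numChunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int n = Math.min(BYTES_PER_CHECKSUM, len - off);
            crc.reset();               // each chunk is checksummed independently
            crc.update(data, off, n);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];  // 512 + 512 + 276 bytes
        long[] sums = checksumChunks(data, data.length);
        System.out.println(sums.length + " chunks");
    }
}
```

Under these assumptions, a method that sends several chunks at once would carry several (data, checksum) pairs in one call, which is the distinction the plural name is meant to convey.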

> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
>                 Key: HADOOP-2758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2758
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2758.patch
>
>
> Currently, the datanode and client parts of DFS perform multiple copies of data on the 'read
path' (i.e., the path from storage on the datanode to the user buffer on the client). This jira reduces
these copies by enhancing the data read protocol and the implementation of read on both the datanode and
the client. I will describe the changes in the next comment.
> The requirement is that this fix should reduce CPU usage and should not cause a regression in
any benchmarks. It might not improve the benchmarks, since most benchmarks are not CPU bound.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

