hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2758) Reduce memory copies when data is read from DFS
Date Fri, 15 Feb 2008 18:20:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569352#action_12569352 ]

Raghu Angadi commented on HADOOP-2758:
--------------------------------------

Note regarding the 'dfs -cat' numbers: these are end-to-end tests, and the numbers vary
depending on how many instances we run. As in any end-to-end test, there are multiple factors
that are not affected by this patch. This patch reduces the CPU consumed by the DataNode while
serving data. That reduction cannot be read directly from the 'dfs -cat' numbers.

I have _semi-directly_ calculated that the DataNode with the patch takes *35-45% of the CPU it
used to take before*. This calculation uses the 9/10 ratio from HADOOP-2144. 'top' on my dev box
truncates the summed-up CPU to 99.9 (unlike on the machine used in HADOOP-2144); otherwise we
could directly compare the CPU taken by the DataNode instead of calculating it indirectly.

Sameer asked me to compare a single instance of 'dfs -cat' with the regular shell 'cat'. I will
add those numbers in the next comment.

> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
>                 Key: HADOOP-2758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2758
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2758.patch
>
>
> Currently the datanode and client parts of DFS perform multiple copies of data on the 'read
> path' (i.e. the path from storage on the datanode to the user buffer on the client). This jira
> reduces these copies by enhancing the data read protocol and the implementation of read on
> both the datanode and the client. I will describe the changes in the next comment.
> The requirement is that this fix should reduce the CPU used and should not cause a regression
> in any benchmarks. It might not improve the benchmarks, since most benchmarks are not CPU bound.
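
To make the idea of "copies on the read path" concrete, here is a minimal Java sketch. It is
not taken from the HADOOP-2758 patch, and it does not claim to show what the patch does: the
class ReadPathSketch and both method names are hypothetical, and FileChannel.transferTo is used
only as one well-known JDK way to avoid staging data in a user-space buffer. The first method
reads file data into an intermediate buffer and then writes it out; the second asks the OS to
move the bytes to the destination channel, which may skip that intermediate copy when the
destination is a socket or file channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not code from Hadoop.
public class ReadPathSketch {

    // Copy through a user-space buffer: storage -> buffer -> destination.
    // Every byte is copied into 'buf' and then copied out again.
    static long copyViaBuffer(Path src, WritableByteChannel dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
            long total = 0;
            while (in.read(buf) != -1) {
                buf.flip();                      // switch buffer from filling to draining
                while (buf.hasRemaining()) {
                    total += dst.write(buf);     // a write may be partial, so loop
                }
                buf.clear();                     // ready for the next read
            }
            return total;
        }
    }

    // Let the channel implementation move the bytes. Depending on the OS and
    // the destination channel type, this can avoid the intermediate buffer.
    static long copyViaTransferTo(Path src, WritableByteChannel dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ)) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                long n = in.transferTo(pos, size - pos, dst);
                if (n <= 0) {
                    break;                       // nothing transferred; stop rather than spin
                }
                pos += n;
            }
            return pos;
        }
    }
}
```

Both methods return the number of bytes delivered, so they can be swapped for one another and
compared under 'top' in the same way as the measurements discussed above. Whether the second
form actually saves CPU depends on the platform and on the destination channel.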

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

