hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2758) Reduce memory copies when data is read from DFS
Date Wed, 13 Feb 2008 07:06:10 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568452#action_12568452 ]

Raghu Angadi commented on HADOOP-2758:

With a preliminary patch that removes extra copies on the datanode while reading a block, the
results are promising.
Ran 4 instances of 'dfs -cat 5GbFile > /dev/null', similar to the tests in HADOOP-2144.
All the blocks are local.

branch-0.16 : ~4 min. CPU bound. User CPU is 3 times the kernel CPU.
trunk + patch : ~3 min. Disk bound. User CPU is 2 times the kernel CPU. Not much CPU
was left idle (~10-20%).

Also from HADOOP-2144, datanode CPU is around 0.9 times DFSClient CPU. Even after ignoring
idle CPU in the second test, the datanode with the patch takes less than half of the CPU it
took before. This includes both user and kernel CPU taken by the datanode. Assuming kernel CPU
is the same in both cases, the user CPU taken by the datanode in the second test would be much
less than half (maybe closer to 1/3rd).
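
The estimate above can be reproduced with a little arithmetic. The sketch below is only one reading of the numbers, under these assumptions (none of them stated explicitly in the comment): CPU is measured in minutes of a fully busy machine, so the runs cost about 4 and 3 units; the DFSClient's CPU is unchanged by the patch; and the datanode's kernel CPU is a hypothetical 0.4 units in both runs. The class and method names are made up for illustration.

```java
public class DatanodeCpuEstimate {

    /** Datanode CPU before the patch, given total CPU and the datanode:client ratio. */
    public static double datanodeBefore(double totalBefore, double dnToClient) {
        return totalBefore * dnToClient / (1.0 + dnToClient);
    }

    /** Datanode CPU after the patch, assuming the client's CPU did not change. */
    public static double datanodeAfter(double totalBefore, double totalAfter,
                                       double dnToClient) {
        double clientCpu = totalBefore / (1.0 + dnToClient); // assumption: same in both runs
        return totalAfter - clientCpu;
    }

    /** Ratio of datanode user CPU (after / before), with the same kernel CPU in both runs. */
    public static double userRatio(double dnBefore, double dnAfter, double dnKernel) {
        return (dnAfter - dnKernel) / (dnBefore - dnKernel);
    }

    public static void main(String[] args) {
        // ~4 min CPU bound vs ~3 min (idle CPU ignored); datanode = 0.9 x client.
        double before = datanodeBefore(4.0, 0.9);
        double after = datanodeAfter(4.0, 3.0, 0.9);
        System.out.printf("datanode total CPU, after/before: %.2f%n", after / before);

        // 0.4 is a hypothetical datanode kernel CPU figure, not a measurement.
        System.out.printf("datanode user CPU, after/before: %.2f%n",
                userRatio(before, after, 0.4));
    }
}
```

With these inputs the total ratio comes out just under one half, and the user-only ratio lands near one third. Counting the 10-20% idle CPU in the second run would lower both ratios further.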

> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>                 Key: HADOOP-2758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2758
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.17.0
> Currently the datanode and client parts of DFS perform multiple copies of data on the 'read
> path' (i.e. the path from storage on the datanode to the user buffer on the client). This jira
> reduces these copies by enhancing the data read protocol and the implementation of read on both
> the datanode and the client. I will describe the changes in the next comment.
> The requirement is that this fix should reduce CPU used and should not cause a regression in
> any benchmarks. It might not improve the benchmarks since most benchmarks are not CPU bound.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
