From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Date: Fri, 29 Feb 2008 20:15:52 -0800 (PST)
Subject: [jira] Issue Comment Edited: (HADOOP-2758) Reduce memory copies when data is read from DFS

[ https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574066#action_12574066 ]

rangadi edited comment on HADOOP-2758 at 2/29/08 8:15 PM:
---------------------------------------------------------------
Edit: typos

More comparisons. I hope this shows the improvements.

Test: run *6* instances of 'cat 5GbFile > /dev/null' on a single-node cluster. All the blocks are located on local disks (RAID0, I think).

The HDFS tests include *constant costs*: client CPU, and kernel CPU not spent on behalf of user processes. Client CPU is at least as much as DataNode CPU, so DataNode CPU is at most half of the CPU-bound total. This implies that a 25% improvement in wall-clock time corresponds to more than a 50% improvement in DataNode CPU.

||Test || Bound By || Run1 || Run2 || Run3 || Avg || % of Trunk || Note ||
| Trunk | CPU | 355 | 332 | 346 | 344 | 100% | |
| Trunk + patch | CPU | 225 | 213 | 228 | 222 | 65% | |
| cat command | Disk IO | 185 | 83 | 105 | 124 | 36% | Not really comparable |

Even 21 instances of 'cat allBlocksForFile > /dev/null' were not CPU bound; 'cat' uses virtually zero CPU in user space.
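The Avg and percentage columns, and the "25% of wall-clock time means more than 50% of DataNode CPU" reasoning, can be checked with a small Java program. This is only a sketch under stated assumptions: the class and method names are hypothetical, the percentages are computed from the rounded averages (which appears to be how the table's 65% and 36% were obtained), and the DataNode bound assumes the run is CPU bound, client CPU is at least DataNode CPU, and only DataNode CPU changes.

```java
// Hypothetical helper; the inputs are the Run1..Run3 values from the table above.
public class ReadBenchmark {

    // Mean of the runs, rounded to a whole number like the "Avg" column.
    static long roundedAverage(double... runs) {
        double sum = 0;
        for (double r : runs) {
            sum += r;
        }
        return Math.round(sum / runs.length);
    }

    // Value as a percentage of the baseline (Trunk), like the "% of Trunk" column.
    static double percentOf(long value, long baseline) {
        return 100.0 * value / baseline;
    }

    // Assumed model: total CPU T = client CPU + DataNode CPU + other kernel CPU,
    // client CPU >= DataNode CPU, wall-clock time tracks T, and only DataNode CPU changes.
    // Then DataNode CPU < T / 2, so the saving (before - after) is more than
    // 2 * (before - after) / before of the original DataNode CPU.
    static double minDataNodeSaving(double before, double after) {
        return 2.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        long trunk = roundedAverage(355, 332, 346);
        long patched = roundedAverage(225, 213, 228);
        long cat = roundedAverage(185, 83, 105);

        System.out.printf("Trunk:         avg %d (100%%)%n", trunk);
        System.out.printf("Trunk + patch: avg %d (%.0f%%)%n", patched, percentOf(patched, trunk));
        System.out.printf("cat command:   avg %d (%.0f%%)%n", cat, percentOf(cat, trunk));
        System.out.printf("DataNode CPU saving > %.0f%% (lower bound under the assumed model)%n",
                100 * minDataNodeSaving(trunk, patched));
    }
}
```

With a hypothetical 25% wall-clock improvement, `minDataNodeSaving(100, 75)` gives 0.5, which is the 50% figure quoted in the comment.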
> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
> Key: HADOOP-2758
> URL: https://issues.apache.org/jira/browse/HADOOP-2758
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.17.0
>
> Attachments: HADOOP-2758.patch, HADOOP-2758.patch, HADOOP-2758.patch, HADOOP-2758.patch, HADOOP-2758.patch, HADOOP-2758.patch
>
>
> Currently the datanode and client parts of DFS perform multiple copies of data on the 'read path' (i.e. the path from storage on the datanode to the user buffer on the client). This jira reduces these copies by enhancing the data read protocol and the implementation of read on both the datanode and the client. I will describe the changes in the next comment.
> The requirement is that this fix should reduce the CPU used and should not cause a regression in any benchmark. It might not improve the benchmarks, since most benchmarks are not CPU bound.
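As a generic illustration of removing one copy on the client side of the read path, the Java sketch below reads large requests directly into the caller's array instead of staging them in an internal buffer and then calling `System.arraycopy`, similar in spirit to what `java.io.BufferedInputStream` does for large reads. This is NOT the HADOOP-2758 patch: the class `DirectReadStream` and its `bytesCopied` counter are hypothetical, and the sketch does not model the read-protocol changes on the datanode.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical buffered stream: small reads go through an internal buffer,
// large reads bypass it so the data lands in the user's array with no extra copy.
public class DirectReadStream extends InputStream {
    private final InputStream in;
    private final byte[] buf;
    private int pos = 0;     // index of the next unread byte in buf
    private int count = 0;   // number of valid bytes in buf
    long bytesCopied = 0;    // bytes moved by the extra buffer-to-user copy (for illustration)

    public DirectReadStream(InputStream in, int bufSize) {
        this.in = in;
        this.buf = new byte[bufSize];
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int avail = count - pos;
        if (avail > 0) {
            // Drain bytes already buffered; this path pays for one extra copy.
            int n = Math.min(avail, len);
            System.arraycopy(buf, pos, b, off, n);
            pos += n;
            bytesCopied += n;
            return n;
        }
        if (len >= buf.length) {
            // Buffer empty and request large: read straight into the user's array.
            return in.read(b, off, len);
        }
        // Small request: refill the internal buffer, then copy out what was asked for.
        int filled = in.read(buf, 0, buf.length);
        if (filled == -1) {
            return -1;
        }
        count = filled;
        int n = Math.min(filled, len);
        System.arraycopy(buf, 0, b, off, n);
        pos = n;
        bytesCopied += n;
        return n;
    }
}
```

The design choice is the usual trade-off: buffering keeps many small reads cheap, while letting large reads skip the buffer removes a full copy of the data for the bulk-read case that the 'cat' tests exercise.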