Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: core-dev@hadoop.apache.org
Message-ID: <26650991.1220879927636.JavaMail.jira@brutus>
Date: Mon, 8 Sep 2008 06:18:47 -0700 (PDT)
From: "Jothi Padmanabhan (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Issue Comment Edited: (HADOOP-3514) Reduce seeks during shuffle, by inline crcs
In-Reply-To: <1777511993.1212934184996.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[ https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629136#action_12629136 ]

jothipn edited comment on HADOOP-3514 at 9/8/08 6:17 AM:
-------------------------------------------------------------------

The above patch fixes the problem observed when running with the native lzo library for map output compression. The problem was in the IFileInputStream read method, which is required to return the byte read as an integer. A simple assignment of
{code}
int result = byte
{code}
does not work, because the byte is interpreted as a signed byte and so the assigned integer has the wrong value. Instead, the result has to be assigned as
{code}
int result = (byte & 0xFF)
{code}
which correctly assigns the byte to the integer.

The following are the performance improvements observed with this patch when running the loadgen program with 60 reduce copiers, 100 http threads, and 450 task trackers, using the following command line (thanks to Runping for identifying the problem configuration that best illustrates the utility of this patch):
{noformat}
bin/hadoop jar hadoop-0.19.0-dev-test.jar loadgen \
-D test.randomtextwrite.bytes_per_map=$((240*1024)) \
-D test.randomtextwrite.total_bytes=$((200*1024*100000)) \
-D mapred.compress.map.output=false \
-r 2200 \
-outKey org.apache.hadoop.io.Text \
-outValue org.apache.hadoop.io.Text \
-outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
-outdir fakeout
{noformat}

The patch showed an overall improvement of about 5%, with about 10% improvement in shuffle.

              Trunk   Patch
Map Time       6:10    6:04
Reduce Time   17:11   16:22
Overall       23:21   22:26

was (Author: jothipn):
The above patch fixes the problem observed when running with the native lzo library for map output compression. The problem was in the IFileInputStream read method, which is required to return the byte read as an integer. A simple assignment of
{code}
int result = byte
{code}
does not work, because the byte is interpreted as a signed byte and so the assigned integer has the wrong value.
Instead, the result has to be assigned as
{code}
int result = (byte & 0xFF)
{code}
which correctly assigns the byte to the integer.

The following are the performance improvements observed with this patch when running the loadgen program with 60 reduce copiers, 100 http threads, and 450 task trackers, using the following command line:
bin/hadoop jar hadoop-0.19.0-dev-test.jar loadgen \
-D test.randomtextwrite.bytes_per_map=$((240*1024)) \
-D test.randomtextwrite.total_bytes=$((200*1024*100000)) \
-D mapred.compress.map.output=false \
-r 2200 \
-outKey org.apache.hadoop.io.Text \
-outValue org.apache.hadoop.io.Text \
-outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
-outdir fakeout

The patch showed an overall improvement of about 5%, with about 10% improvement in shuffle.

              Trunk   Patch
Map Time       6:10    6:04
Reduce Time   17:11   16:22
Overall       23:21   22:26

> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
>                 Key: HADOOP-3514
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3514
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.19.0
>
>         Attachments: hadoop-3514-v1.patch, hadoop-3514-v10.patch, hadoop-3514-v11.patch, hadoop-3514-v12.patch, hadoop-3514-v2.patch, hadoop-3514-v3.patch, hadoop-3514-v4.patch, hadoop-3514-v5.patch, hadoop-3514-v6.patch, hadoop-3514-v7.patch, hadoop-3514-v8.patch, hadoop-3514-v9.patch, hadoop-3514.patch
>
>
> The number of seeks can be reduced by half in the iFile if we move the crc into the iFile rather than having a separate file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
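
The signed-byte issue described in the comment can be reproduced in isolation. The following is a minimal sketch, not the code from the patch: the class UnsignedByteRead and the helpers readSigned and readUnsigned are hypothetical names chosen for illustration, and the buffer stands in for whatever IFileInputStream reads from. It shows that widening a Java byte to int sign-extends values 0x80 through 0xFF, and that masking with 0xFF yields the 0..255 range that InputStream.read() is specified to return. Note that an unmasked 0xFF becomes -1, which callers of read() treat as end of stream.

```java
// Hypothetical illustration class; not part of Hadoop.
final class UnsignedByteRead {

    // Buggy variant: the implicit byte -> int widening sign-extends,
    // so bytes 0x80..0xFF come back as negative integers.
    public static int readSigned(byte[] buf, int pos) {
        return buf[pos];
    }

    // Fixed variant: masking with 0xFF keeps only the low 8 bits,
    // giving a value in 0..255 as InputStream.read() requires.
    public static int readUnsigned(byte[] buf, int pos) {
        return buf[pos] & 0xFF;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0x7F, (byte) 0x80, (byte) 0xFF};
        for (int i = 0; i < data.length; i++) {
            System.out.println(readSigned(data, i) + " vs " + readUnsigned(data, i));
        }
        // prints:
        // 127 vs 127
        // -128 vs 128
        // -1 vs 255
    }
}
```

Data with few high-bit bytes would rarely hit the buggy path, which may be why the problem only surfaced with compressed map output; that reading is an inference, not something the comment states.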