From: "Hadoop QA (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Sat, 1 Mar 2008 00:45:51 -0800 (PST)
Message-ID: <1225106629.1204361151190.JavaMail.jira@brutus>
Subject: [jira] Commented: (HADOOP-910) Reduces can do merges for the on-disk map output files in parallel with their copying
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm

[ 
https://issues.apache.org/jira/browse/HADOOP-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574083#action_12574083 ]

Hadoop QA commented on HADOOP-910:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376894/HADOOP-910.patch
against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included -1. The patch doesn't appear to include any new or modified tests.
                       Please justify why no tests are needed for this patch.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

    core tests -1. The patch failed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1882/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1882/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1882/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1882/console

This message is automatically generated.

> Reduces can do merges for the on-disk map output files in parallel with their copying
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-910
>                 URL: https://issues.apache.org/jira/browse/HADOOP-910
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: HADOOP-910-review.patch, HADOOP-910.patch, HADOOP-910.patch, HADOOP-910.patch
>
>
> Proposal to extend the parallel in-memory-merge/copying, that is being done as part of HADOOP-830, to the on-disk files.
> Today, the Reduces dump the map output files to disk and the final merge happens only after all the map outputs have been collected. It might make sense to parallelize this part. That is, whenever a Reduce has collected io.sort.factor number of segments on disk, it initiates a merge of those and creates one big segment. If the rate of copying is faster than the merge, we can probably have multiple threads doing parallel merges of independent sets of io.sort.factor number of segments. If the rate of copying is not as fast as merge, we stand to gain a lot - at the end of copying of all the map outputs, we will be left with a small number of segments for the final merge (which hopefully will feed the reduce directly (via the RawKeyValueIterator) without having to hit the disk for writing additional output segments).
> If the disk bandwidth is higher than the network bandwidth, we have a good story, I guess, to do such a thing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
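
The merge-while-copying idea in the quoted description can be illustrated with a small sketch. This is not the code from HADOOP-910.patch and not Hadoop's API: it is a single-threaded Python model in which in-memory sorted lists stand in for on-disk segments, and the names `merge_segments`, `collect_with_intermediate_merges`, `final_merge`, and `sort_factor` (standing in for io.sort.factor) are hypothetical. The proposal would run the intermediate merges in parallel with copying; here they run inline for clarity.

```python
import heapq


def merge_segments(segments):
    """K-way merge of already-sorted segments into one sorted segment."""
    return list(heapq.merge(*segments))


def collect_with_intermediate_merges(map_outputs, sort_factor):
    """Model of a reduce collecting map outputs (hypothetical helper).

    Every time sort_factor unmerged segments have arrived, they are merged
    into one bigger segment. Returns the segments left for the final merge.
    """
    pending = []  # freshly copied segments, not yet merged
    merged = []   # bigger segments produced by intermediate merges
    for segment in map_outputs:  # each iteration models one finished copy
        pending.append(segment)
        if len(pending) == sort_factor:
            merged.append(merge_segments(pending))
            pending = []
    return merged + pending


def final_merge(segments):
    """Lazy final merge; the iterator can feed the reduce directly."""
    return heapq.merge(*segments)


# Ten sorted map outputs, merged four at a time while "copying".
map_outputs = [[i, i + 10, i + 20] for i in range(10)]
remaining = collect_with_intermediate_merges(map_outputs, sort_factor=4)
print(len(remaining))  # 4 segments left for the final merge instead of 10
```

In this model the final merge sees two merged segments plus two leftover small ones, so it stays within one pass of `sort_factor` inputs and can stream keys to the reduce without writing further intermediate segments, which is the benefit the description hopes for.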