From: "Chris Douglas (JIRA)"
To: core-dev@hadoop.apache.org
Date: Wed, 13 Aug 2008 19:17:44 -0700 (PDT)
Subject: [jira] Updated: (HADOOP-3446) The reduce task should not flush the in memory file system before starting the reducer
Message-ID: <252568237.1218680264403.JavaMail.jira@brutus>
In-Reply-To: <1115045238.1211824682958.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3446:
----------------------------------

    Attachment: 3446-0.patch

I tested this on a 100-node cluster (98 tasktrackers) using sort. Given 300 MB of data per node, a sufficiently large io.sort.mb and fs.inmemory.size.mb, io.sort.spill.percent=1.0, fs.inmemory.merge.threshold=0, and mapred.inmem.usage=1.0, each reduce took an average of 121 seconds when reading from disk vs. 79 seconds when merging and reducing from memory. While the sort with the patch finished the job in 8 minutes instead of 9, both runs had slow tasktrackers that skewed the running times. The patch also includes similar changes to MapTask, letting the soft limits on the record and serialization buffers be configured separately.

> The reduce task should not flush the in memory file system before starting the reducer
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3446
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3446
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Critical
>         Attachments: 3446-0.patch
>
>
> In the case where the entire reduce input fits in RAM, we currently force the input to disk and re-read it before giving it to the reducer. It would be much better to merge from the ramfs and any spills to feed the reducer its input.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
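
[Editor's note: for readers who want to try the settings named in the comment, they map onto an ordinary JobConf. The sketch below is only illustrative: the property names are copied verbatim from the comment, while the class name and the concrete values (300 MB buffers) are assumptions for illustration, not the exact configuration used in the reported test or anything shipped in the attached patch.]

    // Sketch only: applies the tuning knobs named in the comment above to a
    // JobConf before job submission. Property names come from the comment;
    // the values here are illustrative placeholders.
    import org.apache.hadoop.mapred.JobConf;

    public class InMemReduceTuning {
        public static JobConf tune(JobConf conf) {
            conf.set("io.sort.mb", "300");                // map-side sort buffer, in MB (assumed size)
            conf.set("fs.inmemory.size.mb", "300");       // reduce-side in-memory fs size, in MB (assumed size)
            conf.set("io.sort.spill.percent", "1.0");     // spill only when the sort buffer is completely full
            conf.set("fs.inmemory.merge.threshold", "0"); // don't trigger merges on segment count
            conf.set("mapred.inmem.usage", "1.0");        // keep reduce inputs in memory as long as they fit
            return conf;
        }
    }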