hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3446) The reduce task should not flush the in memory file system before starting the reducer
Date Thu, 14 Aug 2008 02:17:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3446:
----------------------------------

    Attachment: 3446-0.patch

I tested this on a 100-node cluster (98 tasktrackers) using sort, with 300MB/node of data,
a sufficiently large io.sort.mb and fs.inmemory.size.mb, io.sort.spill.percent=1.0,
fs.inmemory.merge.threshold=0, and mapred.inmem.usage=1.0. Each reduce took an average of
121 seconds when reading from disk vs. 79 seconds when merging and reducing from memory.
The sort with the patch finished the job in 8 minutes instead of 9, but both runs had slow
tasktrackers that threw off the running time.
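
To put the per-reduce numbers in perspective, here is a small back-of-the-envelope
calculation. It uses only the two averages reported above (121 and 79 seconds); the variable
names are made up for illustration, and it says nothing about the whole-job time, which the
slow tasktrackers skewed.

```python
# Average seconds per reduce, as reported in the message above.
disk_seconds = 121     # reduce input flushed to disk and re-read
memory_seconds = 79    # reduce input merged and reduced from memory

saved = disk_seconds - memory_seconds      # seconds saved per reduce
reduction = saved / disk_seconds           # fraction of reduce time saved
speedup = disk_seconds / memory_seconds    # relative speedup factor

print(f"saved {saved}s per reduce ({reduction:.1%} less time, {speedup:.2f}x)")
```

That works out to roughly a third less time per reduce for this particular sort run.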

This also includes some similar changes to MapTask, letting the record and serialization buffer
soft limits be configured separately.

> The reduce task should not flush the in memory file system before starting the reducer
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3446
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3446
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Critical
>         Attachments: 3446-0.patch
>
>
> In the case where the entire reduce inputs fit in ram, we currently force the input to
> disk and re-read it before giving it to the reducer. It would be much better if we merged
> from the ramfs and any spills to feed the reducer its input.
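
The idea in the description above can be sketched as a k-way merge over sorted in-memory
segments and sorted on-disk spills, without writing the in-memory data out first. This is
only an illustration in Python, not the code in 3446-0.patch or Hadoop's actual merger: the
function names (read_spill, merged_reduce_input) and the tab-separated spill format are
assumptions made for the example.

```python
import heapq
import os
import tempfile


def read_spill(path):
    """Yield (key, value) records from a sorted spill file (hypothetical
    format: one tab-separated record per line)."""
    with open(path) as f:
        for line in f:
            key, value = line.rstrip("\n").split("\t", 1)
            yield key, value


def merged_reduce_input(in_memory_segments, spill_paths):
    """Lazily merge sorted in-memory segments with sorted on-disk spills.

    The in-memory segments are read where they are; nothing is flushed
    to disk before the merge starts.
    """
    streams = [iter(segment) for segment in in_memory_segments]
    streams += [read_spill(path) for path in spill_paths]
    # heapq.merge performs a streaming k-way merge of already-sorted inputs.
    return heapq.merge(*streams, key=lambda record: record[0])


with tempfile.TemporaryDirectory() as tmp:
    # One spill that was already written to disk, sorted by key.
    spill = os.path.join(tmp, "spill0.txt")
    with open(spill, "w") as f:
        f.write("b\t2\nd\t4\n")

    # Two sorted segments still held in memory.
    ram_segments = [[("a", "1"), ("c", "3")], [("e", "5")]]

    merged = list(merged_reduce_input(ram_segments, [spill]))

print(merged)
```

The reducer would consume the merged stream directly, so data that already fits in memory
is never written out and re-read.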

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

