hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-939) No-sort optimization
Date Fri, 26 Jan 2007 01:55:49 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467728 ]

Doug Cutting commented on HADOOP-939:

> 14 (more than half) are unavoidable. 

Make that 15: the ones associated with the input and output.  That leaves 12 associated
with sort & reduce.  9 of those could be eliminated when the input is largely pre-sorted and
each reduce can be placed on the same rack as the vast majority of its input, cutting the
sort/reduce overhead from 12 out of 27 to 3 out of 18.
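
As a quick check of the arithmetic above, here is a small sketch in Python. The counts (27, 15, 9) come from the comment; the message does not say what is being counted, so the code treats them as generic "cost units", and the variable names are illustrative only.

```python
from fractions import Fraction

# Counts taken from the comment; the unit is not stated in the message,
# so these are treated as abstract "cost units" (an assumption).
TOTAL = 27           # all units in the original accounting
UNAVOIDABLE = 15     # units tied to input and output
ELIMINATED = 9       # units saved when input is pre-sorted and reduces are rack-local

sort_reduce = TOTAL - UNAVOIDABLE                   # 12 units for sort & reduce
remaining_sort_reduce = sort_reduce - ELIMINATED    # 3 units left after the optimization
new_total = TOTAL - ELIMINATED                      # 18 units overall

before = Fraction(sort_reduce, TOTAL)               # 12/27
after = Fraction(remaining_sort_reduce, new_total)  # 3/18

print(f"before: {sort_reduce}/{TOTAL} = {float(before):.1%}")
print(f"after:  {remaining_sort_reduce}/{new_total} = {float(after):.1%}")
# before: 12/27 = 44.4%
# after:  3/18 = 16.7%
```

In other words, under these numbers the sort/reduce share would drop from a bit under half of the total to about one sixth.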

> No-sort optimization
> --------------------
>                 Key: HADOOP-939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-939
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Doug Judd
> There should be a way to tell the mapred framework that the output of the map() phase
> will already be sorted.  The Reduce phase can just merge the intermediate files together
> without
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
