hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-3594) Guaranteeing that combiner is called at least once
Date Mon, 30 Jun 2008 17:43:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-3594.

       Resolution: Won't Fix
    Fix Version/s:     (was: 0.19.0)

Olga's concern was that HADOOP-3226 changed the semantics of combiners in an incompatible
way. I've updated the release note on HADOOP-3226 to call out the semantic change and to
point out the deprecated configuration attribute that disables the additional calls to the combiner.

> Guaranteeing that combiner is called at least once
> --------------------------------------------------
>                 Key: HADOOP-3594
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3594
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Olga Natkovich
> In 0.18, Hadoop decides how many times to call the combiner on both the map and reduce
> sides. The number of invocations can be anywhere between 0 and N.
> While multiple invocations can be useful, not invoking the combiner at all can have
> serious consequences for a class of functions called algebraic
> (http://classweb.gmu.edu/kersch/inft864/Readings/Shoshani/DataCube/DataCubeTechReport.pdf).
> The main properties of such functions are that the intermediate and final computations
> differ and that the first invocation transforms the data into a different form. The most
> common example is the AVERAGE function. While it is possible to work around this issue by
> annotating each tuple, it would be much easier and faster if Hadoop always guaranteed at
> least one invocation.
> Not having this guarantee will break all sorts of existing combiners.
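
To illustrate the "annotating each tuple" workaround mentioned in the description, here is a
minimal sketch in plain Java. It does not use the Hadoop API: the class AverageSketch, its
Partial type, and the map/combine/reduce/run methods are hypothetical stand-ins for the
corresponding job stages. The assumption is that the map side already emits each value in the
intermediate (sum, count) form, so the combiner takes and returns the same type and the result
no longer depends on whether it runs 0, 1, or N times.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hadoop API: plain Java standing in for map, combine and reduce.
final class AverageSketch {

    /** Partial aggregate: the "annotated tuple" carrying both sum and count. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Map side: each raw value is emitted directly in the intermediate form. */
    static Partial map(double value) {
        return new Partial(value, 1);
    }

    /** Combiner: Partial in, Partial out, so it is safe to apply any number of times. */
    static Partial combine(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** Reducer: merges whatever partials arrive, then computes the final average. */
    static double reduce(List<Partial> partials) {
        Partial total = combine(partials);
        return total.sum / total.count; // assumes at least one input value
    }

    /** Simulates a framework that applies the combiner combinerPasses times on chunks of two. */
    static double run(double[] values, int combinerPasses) {
        List<Partial> current = new ArrayList<>();
        for (double v : values) {
            current.add(map(v));
        }
        for (int pass = 0; pass < combinerPasses; pass++) {
            List<Partial> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += 2) {
                int end = Math.min(i + 2, current.size());
                next.add(combine(current.subList(i, end)));
            }
            current = next;
        }
        return reduce(current);
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 10};
        for (int passes = 0; passes <= 3; passes++) {
            System.out.println(passes + " combiner passes -> " + run(values, passes));
        }
    }
}
```

In this sketch every run yields the same average regardless of the number of combiner passes.
The cost is the one the description points out: each record carries an extra count field even
when no combining takes place.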

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
