hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3586) keep combiner backward compatible with earlier versions of hadoop
Date Wed, 18 Jun 2008 01:04:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3586:

    Attachment: 3586-0.patch

This patch adds a property that will restore the previous behavior, which runs the combiner
once for each partition in each spill. It will also run the combiner for single records whose
serialized size exceeds the buffer size.

The property is marked as deprecated and will be removed in a future release.

> keep combiner backward compatible with earlier versions of hadoop
> -----------------------------------------------------------------
>                 Key: HADOOP-3586
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3586
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Chris Douglas
>            Priority: Blocker
>             Fix For: 0.18.0
>         Attachments: 3586-0.patch
> In Hadoop 0.16 and earlier, the combiner was guaranteed to run once and only once for each
map. In 0.17 this compatibility was slightly broken: the combiner does not run if a single <K,V>
occupies the entire sort buffer. In 0.18, this changes further: the combiner can
be called multiple times on both the map and reduce sides.
> This breaks Pig's current implementation of the combiner and it is not easy to fix in
a short period of time.
> We would like to ask for a way for an application to request backward-compatible
behavior for some period of time until it can adjust to the new behavior.
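
To illustrate why the number of combiner invocations can matter, here is a minimal
sketch in plain Python. It is not Hadoop or Pig code. All names (combine_sum,
combine_count, apply_combiner, reduce_sum) are hypothetical, and the "passes"
parameter only simulates the combiner being skipped (0), run once (1), or run
repeatedly (2). The job in the sketch is assumed to sum per-key values in the reducer.

```python
from collections import defaultdict


def combine_sum(key, values):
    # Input and output are both partial sums, so this combiner
    # gives the same final result however many times it runs.
    return [(key, sum(values))]


def combine_count(key, values):
    # Turns raw records into a record count. The reducer below then
    # sums those counts, so this is only correct if it runs exactly once.
    return [(key, len(values))]


def apply_combiner(records, combiner, passes):
    # Group records by key and run the combiner `passes` times.
    # passes=0 models the combiner being skipped entirely.
    for _ in range(passes):
        grouped = defaultdict(list)
        for key, value in records:
            grouped[key].append(value)
        records = [out for key, vals in grouped.items()
                   for out in combiner(key, vals)]
    return records


def reduce_sum(records):
    # Reducer: sum whatever values arrive for each key.
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


records = [("a", 5), ("a", 7)]

# Same answer for 0, 1, or 2 combiner passes.
results_safe = [reduce_sum(apply_combiner(records, combine_sum, n))
                for n in (0, 1, 2)]

# Answer depends on how many times the combiner ran.
results_fragile = [reduce_sum(apply_combiner(records, combine_count, n))
                   for n in (0, 1, 2)]
```

In this sketch, results_safe is the same for every pass count, while results_fragile
is only the intended record count (2) when the combiner runs exactly once. A combiner
written against the "once and only once" guarantee can be in the second category.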

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
