crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-286) ability to specify a different function for combiner & reducer
Date Fri, 25 Oct 2013 20:58:32 GMT


Gabriel Reid commented on CRUNCH-286:

I was just thinking about this one again, and started coming back around to Josh's idea
of making it possible for a DoFn to see which context it's running in. What I'm thinking
is that we could introduce a multi-phase CombineFn implementation that automatically
selects the underlying CombineFn to run based on the context it's running in, something
like this:
MultiPhaseCombineFn<K,V>(CombineFn<K,V> mapPhaseCombineFn, CombineFn<K,V>

This would give us the same functionality as the approach here, but wouldn't require changing
the interface of PGroupedTable. It would also avoid adding more direct links to MapReduce
in the PCollection API (not something I'm that worried about, but still maybe worth considering).
I'm definitely ok with this approach too, but just wanted to put the other approach out there
to see if anyone has any other thoughts on it.
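
To make the idea concrete, here is a minimal, self-contained sketch of the delegation
pattern. It does not use the real Crunch API: SimpleCombineFn, Phase and setPhase are
hypothetical stand-ins, and how a real DoFn would find out which phase it is running in
is exactly the open question above, so the sketch just sets the phase by hand.

```java
import java.util.Arrays;

public class MultiPhaseCombineSketch {

    /** Hypothetical marker for the phase a function is running in (not a Crunch type). */
    enum Phase { COMBINE, REDUCE }

    /** Hypothetical, simplified stand-in for a CombineFn: folds the values of one key. */
    interface SimpleCombineFn<V> {
        V combine(Iterable<V> values);
    }

    /** Wraps two functions and delegates to one of them based on the current phase. */
    static final class MultiPhaseCombineFn<V> implements SimpleCombineFn<V> {
        private final SimpleCombineFn<V> combinePhaseFn;
        private final SimpleCombineFn<V> reducePhaseFn;
        private Phase phase = Phase.COMBINE;

        MultiPhaseCombineFn(SimpleCombineFn<V> combinePhaseFn, SimpleCombineFn<V> reducePhaseFn) {
            this.combinePhaseFn = combinePhaseFn;
            this.reducePhaseFn = reducePhaseFn;
        }

        /** Assumption: something outside the function tells it which phase it is in. */
        void setPhase(Phase phase) {
            this.phase = phase;
        }

        @Override
        public V combine(Iterable<V> values) {
            SimpleCombineFn<V> delegate = (phase == Phase.COMBINE) ? combinePhaseFn : reducePhaseFn;
            return delegate.combine(values);
        }
    }

    static long sum(Iterable<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    /** Example pairing: plain sum in the combine phase, sum capped at a limit in the reduce phase. */
    static MultiPhaseCombineFn<Long> sumThenCappedSum(long cap) {
        SimpleCombineFn<Long> combinePhaseFn = values -> sum(values);
        SimpleCombineFn<Long> reducePhaseFn = values -> Math.min(sum(values), cap);
        return new MultiPhaseCombineFn<>(combinePhaseFn, reducePhaseFn);
    }

    public static void main(String[] args) {
        MultiPhaseCombineFn<Long> fn = sumThenCappedSum(100L);

        fn.setPhase(Phase.COMBINE);
        long partial = fn.combine(Arrays.asList(40L, 70L));
        System.out.println("combine phase: " + partial); // prints "combine phase: 110"

        fn.setPhase(Phase.REDUCE);
        long result = fn.combine(Arrays.asList(partial, 20L));
        System.out.println("reduce phase: " + result); // prints "reduce phase: 100"
    }
}
```

The caller only ever sees a single combine function, which is why this shape would not
need a new method on PGroupedTable.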

> ability to specify a different function for combiner & reducer
> --------------------------------------------------------------
>                 Key: CRUNCH-286
>                 URL:
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stefan De Smit
>            Assignee: Josh Wills
>            Priority: Minor
>         Attachments: 0001-add-combineValues-method-with-2-function-arguments.patch, 0002-.patch
> Extend PGroupedTable with an extra combineValues function that accepts 2 functions: 1
> for the combiner phase, 1 for the reducer phase.
> This way, different algorithms can be applied.

This message was sent by Atlassian JIRA
