hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
Date Thu, 17 Jan 2013 20:02:17 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556536#comment-13556536 ]

Chris Douglas commented on MAPREDUCE-4808:

Asokan, the concern is that breaking an API, even one marked unstable, is an incompatible
change. Since the pluggable shuffle is particularly useful for frameworks, breaking this contract
could require patching/validation/rewrite of plugin and optimizer code in projects that invest
in it (Hive, Pig, etc.). Moreover, if we wanted to change the default {{Shuffle}} to a different
implementation, then user/framework code would perform badly, or break, unless we exposed
this implementation-specific mechanism in the _new_ impl. So it's fair to press for use cases,
to ensure the API is _sufficient_ and that the abstraction could apply to most {{Shuffle}} implementations.

Personally, I'm ambivalent about exposing this as an API and am +1 on the patch overall (mostly
because I like the {{MapOutput}} refactoring). The user can always configure the current {{Shuffle}},
which is exactly how frameworks would handle this until they port/specialize their efficient
{{MergeManager}} plugin.

As a compromise, would it make sense to just add a protected {{createMergeManager}} method
to the {{Shuffle}}? The user still needs to configure their custom {{Shuffle}} impl now, but
that's better than the inevitable future where they configure both. It also makes the
{{MergeManager}}'s tie to this {{Shuffle}} implementation explicit.
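
To make the suggested hook concrete, here is a minimal sketch of a protected factory method on the shuffle. Everything in it is a simplified, hypothetical stand-in: these are not the real Hadoop {{Shuffle}} or {{MergeManager}} signatures, and {{DefaultMergeManager}}, {{FrameworkShuffle}}, {{describe}} and {{run}} are names invented for illustration.

```java
// Hypothetical stand-in for the merge abstraction; NOT the real Hadoop interface.
interface MergeManager {
    String describe();
}

// Hypothetical default merge implementation.
class DefaultMergeManager implements MergeManager {
    @Override
    public String describe() {
        return "default merge";
    }
}

// Hypothetical stand-in for the default Shuffle.
class Shuffle {
    // Protected factory method: the single extension point a subclass overrides.
    protected MergeManager createMergeManager() {
        return new DefaultMergeManager();
    }

    // The rest of the shuffle only depends on the MergeManager interface.
    public final String run() {
        MergeManager merger = createMergeManager();
        return "shuffle with " + merger.describe();
    }
}

// A framework's custom Shuffle (assumed name) that swaps in its own merge logic.
class FrameworkShuffle extends Shuffle {
    @Override
    protected MergeManager createMergeManager() {
        return () -> "framework-specific merge";
    }
}

class ShuffleSketch {
    public static void main(String[] args) {
        System.out.println(new Shuffle().run());          // shuffle with default merge
        System.out.println(new FrameworkShuffle().run()); // shuffle with framework-specific merge
    }
}
```

In this shape the custom merge logic can only be reached by subclassing and configuring one particular shuffle, so its dependence on that implementation is visible in the type hierarchy rather than hidden behind a second configuration knob.
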
> Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
> ----------------------------------------------------------------------------------
>                 Key: MAPREDUCE-4808
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>            Assignee: Mariappan Asokan
>         Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch,
> mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch,
> Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate
> implementations to be able to reuse portions of the default implementation.
> This would come with the strong caveat that these classes are LimitedPrivate and Unstable.
