hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
Date Mon, 19 Nov 2012 18:34:02 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500476#comment-13500476 ]

Alejandro Abdelnur commented on MAPREDUCE-2454:
-----------------------------------------------

Following up on Arun's concern about 'passing a shuffle down to the merger', after spending
some extra time looking at the code with and without the patch:

I agree with Asokan's arguments on why the shuffle ought to be passed to the merger as in
the latest patch. 

It is a clear separation of concerns: the shuffle only shuffles data, without having to be
aware of how that data is handled afterwards.
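
To illustrate the separation described above, here is a minimal sketch. All names in it
(ShuffleMergeSketch, Shuffle, Merger, SortingMerger, runReduceInput) are hypothetical and
are not the interfaces from the patch or from Hadoop; records are plain strings for brevity.
It only shows the shape of the idea: the shuffle fetches data, and the merger it is handed
to decides what to do with that data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names throughout; none of these are Hadoop or patch classes.
class ShuffleMergeSketch {

    /** Copy-only side: fetches map output segments and knows nothing about merging. */
    interface Shuffle {
        List<List<String>> fetchSegments();
    }

    /** Pluggable side: pulls what it needs from the shuffle it is handed. */
    interface Merger {
        List<String> merge(Shuffle shuffle);
    }

    /** Stand-in for a default merger: concatenate all segments, then sort. */
    static class SortingMerger implements Merger {
        @Override
        public List<String> merge(Shuffle shuffle) {
            List<String> all = new ArrayList<>();
            for (List<String> segment : shuffle.fetchSegments()) {
                all.addAll(segment);
            }
            Collections.sort(all);
            return all;
        }
    }

    /** The shuffle is passed to the merger; swapping the merger leaves the shuffle untouched. */
    static List<String> runReduceInput(Shuffle shuffle, Merger merger) {
        return merger.merge(shuffle);
    }
}
```

In this sketch an external sorter would be just another Merger implementation, and the
Shuffle implementation would not need to change to accommodate it.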

The patch does not change the end behavior of the shuffle-merge phase, so it does not break
any existing MR application. Nor can it break any existing Hadoop plugin (all of this was
hardcoded and could not be replaced).

Also, the change does not preclude implementing things like a push shuffle in the future.

Regarding Arun's suggestion:

bq. It's trivial to return an iterator from a copy-only shuffle which is backed by a blocking
shuffle which waits till any (not all) key/value pairs have been shuffled over the network.

This would require changes in the shuffle, which could significantly increase the scope of
work of this JIRA. On the other hand, the latest patch does not modify the Shuffle.
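
For reference, the kind of iterator Arun describes could look roughly like the sketch below.
This is only an illustration under stated assumptions: the class BlockingShuffleIterator and
its methods offerRecord and finish are hypothetical, records are plain strings, and a
java.util.concurrent.BlockingQueue stands in for the network copy. It is not code from the
patch or from the existing Shuffle.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical class; a queue stands in for data arriving over the network.
class BlockingShuffleIterator implements Iterator<String> {
    // End-of-stream sentinel, compared by identity so no real record can match it.
    private static final String END = new String("END");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private String next; // record fetched by hasNext() but not yet returned

    /** Called by the (hypothetical) copier thread for each shuffled record. */
    void offerRecord(String record) {
        queue.add(record);
    }

    /** Called by the copier thread once all map outputs have been copied. */
    void finish() {
        queue.add(END);
    }

    /** Blocks until at least one record is available or the shuffle has finished. */
    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                next = END; // treat interruption as end of input in this sketch
            }
        }
        return next != END;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String record = next;
        next = null;
        return record;
    }
}
```

The consumer can start iterating as soon as any record has arrived, but the producer side
(offerRecord and finish here) is exactly the part that would have to be wired into the
shuffle itself.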

My take here is along the lines of Arun's comment:

bq. I've spent sometime thinking about this - and I feel we can do something far simpler to
address Syncsort's goal of plugging in your proprietary sort while mitigating risk to MR itself....How
about this: I feel we could accomplish both goals by something very simple..

Echoing Arun, we are mitigating risk while enabling the desired functionality.

> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
>            Reporter: Mariappan Asokan
>            Assignee: Mariappan Asokan
>            Priority: Minor
>              Labels: features, performance, plugin, sort
>         Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java,
MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch,
mapreduce-2454-new-test.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454-protection-change.patch,
mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external
sorter plugins both on the Map and Reduce sides.

