hadoop-mapreduce-issues mailing list archives

From "Mariappan Asokan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR
Date Wed, 07 Nov 2012 17:33:16 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariappan Asokan updated MAPREDUCE-2454:
----------------------------------------

    Attachment: mapreduce-2454.patch

Hi Arun,
  Thank you very much for making time to talk with me during Strata
2012.  Here is the list of items we discussed and how I followed up on each in the new patch.

* With YARN, different MR data processing engines can co-exist in addition to the sort/merge
done after map and before reduce.  With this in mind, I am calling the sort plugin interface
on the map side {{PostMapProcessor}}.
Similarly, the merge done on the reduce side will be abstracted as {{PreReduceProcessor}}.
* The {{PostMapProcessor}} can simply extend the existing {{MapOutputCollector}} with an {{initialize()}}
method.  The current {{MapOutputBuffer}}
in MapTask.java will implement this interface and serve as the default implementation.
* On the reduce side, my suggestion is to define {{PreReduceProcessor}} based on methods already
available in {{MergeManager}} class.  With minimal changes, this will allow {{MergeManager}}
to implement {{PreReduceProcessor}}.
* There is a concern about exposing some APIs as public.  Since the revised patch is much
smaller than the one submitted before (one fourth of the original patch size), the chance of
breaking anything is minimized.  Also, I expect that only a handful of developers will write
plugins.  I have annotated all the exposed APIs to indicate that they are not stable
and that there is a risk in using them.  Plugin developers should keep up with changes in
the exposed APIs; core Hadoop developers need not worry about maintaining backward compatibility.
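
To illustrate the shape of the map-side proposal above, here is a minimal, self-contained sketch.  It is not taken from the attached patch.  The {{MapOutputCollector}} shown is a simplified stand-in for the one in MapTask.java, and the method signatures, the {{initialize(int)}} parameter, and the {{SortingPostMapProcessor}} class are all assumptions made for illustration only.

```java
import java.io.IOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the existing collector contract (assumed signatures).
interface MapOutputCollector<K, V> {
    void collect(K key, V value, int partition) throws IOException;
    void flush() throws IOException;
    void close() throws IOException;
}

// Hypothetical plugin interface: the collector contract plus an initialize() hook.
// The parameter list of initialize() is an assumption, not the patch's signature.
interface PostMapProcessor<K, V> extends MapOutputCollector<K, V> {
    void initialize(int numPartitions) throws IOException;
}

// Toy in-memory implementation playing the role of a default sorter.
// A real implementation would spill to disk; this one only sorts per partition.
class SortingPostMapProcessor<K extends Comparable<K>, V>
        implements PostMapProcessor<K, V> {

    private List<List<Map.Entry<K, V>>> buckets;

    @Override
    public void initialize(int numPartitions) {
        // One bucket of records per reduce partition.
        buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    @Override
    public void collect(K key, V value, int partition) throws IOException {
        if (buckets == null) {
            throw new IOException("initialize() was not called");
        }
        buckets.get(partition).add(new SimpleEntry<>(key, value));
    }

    @Override
    public void flush() {
        // Sort each partition by key, as a post-map sort step would.
        for (List<Map.Entry<K, V>> bucket : buckets) {
            bucket.sort(Map.Entry.comparingByKey());
        }
    }

    @Override
    public void close() {
        buckets = null;
    }

    // Accessor used only to inspect the result in this sketch.
    List<Map.Entry<K, V>> partition(int index) {
        return buckets.get(index);
    }
}
```

A framework-side caller would call {{initialize()}} once, {{collect()}} for every map output record, and {{flush()}} followed by {{close()}} at the end of the map task.  A plugin would supply its own class in place of {{SortingPostMapProcessor}}.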

The revised patch can be easily integrated with the shuffle plugin.

I repeatedly ran the terasort benchmark on a cluster with 55 nodes.  The performance difference
with and without the patch was negligible (plus or minus 1%).

I would like to receive feedback from you and other developers who are watching this Jira.
 In the meantime, I am writing a new test for the plugin.

Thanks.
-- Asokan

                
> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
>            Reporter: Mariappan Asokan
>            Assignee: Mariappan Asokan
>            Priority: Minor
>              Labels: features, performance, plugin, sort
>         Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java,
MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454.patch, mapreduce-2454.patch,
mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external
sorter plugins both on the Map and Reduce sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
