hadoop-mapreduce-issues mailing list archives

From "Mariappan Asokan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
Date Sun, 14 Oct 2012 22:31:06 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475913#comment-13475913 ]

Mariappan Asokan commented on MAPREDUCE-2454:
---------------------------------------------

Hi Arun,
  Thanks for your feedback.  Though I am confident in my contribution (I have been running
Terasort without any problems on data sizes of 2 TB on a small cluster), I understand your
concerns about the size of the patch.  I can think of the following sub-steps, each addressed
in a separate Jira:

* Refactor {{Task.java}} so that the classes {{ValuesIterator}}, {{CombinerRunner}}, {{CombineValuesIterator}}, and
{{CombineOutputCollector}} can be moved out into separate files.

* Refactor {{MapOutput.java}} to create {{InMemoryMapOutput}} and {{OnDiskMapOutput}} classes.

* Refactor {{Shuffle.java}} and {{MergeManager.java}} to decouple shuffle from merge.  This
should also allow shuffle itself to be made pluggable.  As part of this decoupling there will be a small
change to {{ReduceTask.java}}, since {{ReduceTask}} will then instantiate both the {{Shuffle}} and {{MergeManager}}
objects.

* Refactor {{MapTask.java}} so that the map-side sort code is moved to a
new file {{MapSort.java}}.  Introduce {{SortinRecordWriter}} and {{SortoutRecordReader}} classes
as part of this refactoring.

* Refactor {{ReduceTask.java}} so that the merge-related code is moved to a new file {{ReduceSort.java}}.

* Define corresponding interfaces for the {{MapSort}} and {{ReduceSort}} classes and make their
implementations pluggable.
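The map-side part of the last two items could be pictured roughly as follows. This is only a minimal Java sketch under stated assumptions: {{MapSort}} here is a hypothetical interface, and {{InMemoryMapSort}}, the {{write}}/{{sortedRecords}} methods and the {{example.map.sort.class}} configuration key are illustrative names, not Hadoop APIs; a plain {{Map<String, String>}} stands in for the job configuration. It shows a sort contract (records in, sorted records out, standing in for the proposed {{SortinRecordWriter}} and {{SortoutRecordReader}}), a default in-memory implementation, and reflection-based loading of a user-supplied class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class MapSortSketch {

    /** Hypothetical map-side sort contract: unsorted records go in, sorted records come out. */
    public interface MapSort<K, V> {
        void init(Comparator<? super K> keyComparator);

        void write(K key, V value);                // plays the role of the proposed SortinRecordWriter

        Iterator<Map.Entry<K, V>> sortedRecords(); // plays the role of the proposed SortoutRecordReader
    }

    /** Default in-memory implementation; a plugin could spill to disk or hand records to an external sorter. */
    public static class InMemoryMapSort<K, V> implements MapSort<K, V> {
        private final List<Map.Entry<K, V>> buffer = new ArrayList<>();
        private Comparator<? super K> keyComparator;

        @Override
        public void init(Comparator<? super K> keyComparator) {
            this.keyComparator = keyComparator;
        }

        @Override
        public void write(K key, V value) {
            buffer.add(Map.entry(key, value));
        }

        @Override
        public Iterator<Map.Entry<K, V>> sortedRecords() {
            // ArrayList.sort is stable, so records with equal keys keep their arrival order.
            buffer.sort((a, b) -> keyComparator.compare(a.getKey(), b.getKey()));
            return buffer.iterator();
        }
    }

    /** Instantiates the class named by the hypothetical config key, defaulting to InMemoryMapSort. */
    @SuppressWarnings("unchecked")
    public static <K, V> MapSort<K, V> load(Map<String, String> conf) {
        String name = conf.getOrDefault("example.map.sort.class", InMemoryMapSort.class.getName());
        try {
            return (MapSort<K, V>) Class.forName(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load map-side sorter " + name, e);
        }
    }

    /** Feeds three records through the default sorter and renders the sorted output. */
    public static String demo() {
        MapSort<String, Integer> sorter = load(Map.of());
        sorter.init(Comparator.<String>naturalOrder());
        sorter.write("pear", 1);
        sorter.write("apple", 2);
        sorter.write("fig", 3);
        StringBuilder out = new StringBuilder();
        Iterator<Map.Entry<String, Integer>> it = sorter.sortedRecords();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            out.append(e.getKey()).append('=').append(e.getValue()).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // apple=2 fig=3 pear=1
    }
}
```

An external sorter would implement the same interface and be selected by setting the configuration key to its class name; the calling code that uses {{write}} and {{sortedRecords}} would not need to change.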
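For the reduce side, once shuffle and merge are decoupled, a pluggable merge might look like the following minimal Java sketch. It assumes that shuffle delivers each map output as an already sorted segment; {{ReduceSort}} here is a hypothetical interface, {{HeapMerge}} is an illustrative name, and the key-only segments are a simplification (real map outputs carry values and serialized bytes). The default implementation is an ordinary k-way merge over a {{java.util.PriorityQueue}}, not necessarily how {{MergeManager}} performs its merges.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

public class ReduceMergeSketch {

    /** Hypothetical reduce-side contract: shuffle hands over sorted segments, the plugin merges them. */
    public interface ReduceSort<K> {
        Iterator<K> merge(List<Iterator<K>> sortedSegments, Comparator<? super K> cmp);
    }

    /** Current head of one segment plus the rest of that segment. */
    private static final class Head<K> {
        final K key;
        final Iterator<K> rest;

        Head(K key, Iterator<K> rest) {
            this.key = key;
            this.rest = rest;
        }
    }

    /** Default k-way merge: a min-heap keyed on each non-empty segment's current head. */
    public static class HeapMerge<K> implements ReduceSort<K> {
        @Override
        public Iterator<K> merge(List<Iterator<K>> sortedSegments, Comparator<? super K> cmp) {
            PriorityQueue<Head<K>> heap = new PriorityQueue<>((a, b) -> cmp.compare(a.key, b.key));
            for (Iterator<K> seg : sortedSegments) {
                if (seg.hasNext()) {
                    heap.add(new Head<>(seg.next(), seg));
                }
            }
            return new Iterator<K>() {
                @Override
                public boolean hasNext() {
                    return !heap.isEmpty();
                }

                @Override
                public K next() {
                    Head<K> smallest = heap.poll();
                    if (smallest == null) {
                        throw new NoSuchElementException();
                    }
                    // Refill the heap from the segment just consumed, if it has more records.
                    if (smallest.rest.hasNext()) {
                        heap.add(new Head<>(smallest.rest.next(), smallest.rest));
                    }
                    return smallest.key;
                }
            };
        }
    }

    /** Merges four small sorted segments (one of them empty) and returns the result. */
    public static List<Integer> demo() {
        List<Iterator<Integer>> segments = List.of(
                List.of(1, 4, 7).iterator(),
                List.of(2, 5, 8).iterator(),
                List.of(3, 6, 9, 10).iterator(),
                List.<Integer>of().iterator());
        ReduceSort<Integer> merger = new HeapMerge<>();
        List<Integer> out = new ArrayList<>();
        merger.merge(segments, Comparator.<Integer>naturalOrder()).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

Because the merge only sees iterators over sorted segments, a plugin could replace {{HeapMerge}} with an external merger without touching the shuffle code.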

How does the above sequence of changes sound to you?  I can raise a separate Jira for each
one.  If you wish, we can keep these changes in a separate branch before merging them into trunk.

If you have other suggestions, please let me know.

Thanks again.

-- Asokan

                
> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
>            Reporter: Mariappan Asokan
>            Assignee: Mariappan Asokan
>            Priority: Minor
>              Labels: features, performance, plugin, sort
>         Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java,
> MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454.patch, mapreduce-2454.patch,
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
> mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external
> sorter plugins both on the Map and Reduce sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
