hadoop-mapreduce-issues mailing list archives

From "Mariappan Asokan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
Date Wed, 25 May 2011 15:22:48 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039157#comment-13039157 ]

Mariappan Asokan commented on MAPREDUCE-2454:
---------------------------------------------

Thanks for all the comments I have received so far.

Below I highlight the changes and additions made.  Please look at the attached
patch file MR-2454-trunkPatchPreview.gz for details.  The patch file reflects
the changes made against the trunk revision.  I will post a patch file for the
MR-279 branch later, once I have the build working.

I would like to receive feedback from all developers, especially those who
worked on the following files:

Task.java, MapTask.java, and ReduceTask.java.

We will start testing the changes once I receive the feedback.

h4. OVERALL

* The interfaces SortPlugin, MapSortPlugin, and ReduceSortPlugin were added to
facilitate sort plugin implementations.

* The framework code was refactored to implement DefaultSortPlugin,
DefaultMapSortPlugin, and DefaultReduceSortPlugin.

* The shuffle code was decoupled from the framework's merge so that shuffle can
be used by other sort plugin implementations.

* An implementation of an external sort plugin was added under the
contrib/sortplugin directory.  It uses the Unix sort command to sort the keys.
This is a work in progress.  Only the UnixMapSortPlugin is currently implemented.

* New interfaces SortinRecordWriter and SortoutRecordReader were introduced.
The sort plugins will provide implementations of these interfaces.
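
To make the last point concrete, here is a minimal sketch of how a sort plugin
might provide SortinRecordWriter and SortoutRecordReader.  The method
signatures, the two-method SortPlugin contract, and the InMemorySortPlugin
class are assumptions made for illustration only; the actual definitions are
in the attached patch and may look quite different.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Assumed shape: the framework writes records into the sorter through this.
interface SortinRecordWriter<K, V> {
    void write(K key, V value);
    void close();
}

// Assumed shape: the framework reads records back in sorted key order.
interface SortoutRecordReader<K, V> {
    boolean hasNext();
    Map.Entry<K, V> next();
}

// Assumed contract: a plugin hands out one writer and one reader.
interface SortPlugin<K, V> {
    SortinRecordWriter<K, V> getWriter();
    SortoutRecordReader<K, V> getReader();
}

// Hypothetical plugin that buffers everything in memory and sorts on close().
final class InMemorySortPlugin<K, V> implements SortPlugin<K, V> {
    private final List<Map.Entry<K, V>> records = new ArrayList<>();
    private final Comparator<K> comparator;
    private boolean closed = false;

    InMemorySortPlugin(Comparator<K> comparator) {
        this.comparator = comparator;
    }

    @Override
    public SortinRecordWriter<K, V> getWriter() {
        return new SortinRecordWriter<K, V>() {
            @Override
            public void write(K key, V value) {
                if (closed) {
                    throw new IllegalStateException("writer already closed");
                }
                records.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
            }

            @Override
            public void close() {
                // List.sort is stable, so records with equal keys keep insertion order.
                records.sort((a, b) -> comparator.compare(a.getKey(), b.getKey()));
                closed = true;
            }
        };
    }

    @Override
    public SortoutRecordReader<K, V> getReader() {
        if (!closed) {
            throw new IllegalStateException("close the writer before reading");
        }
        final Iterator<Map.Entry<K, V>> it = records.iterator();
        return new SortoutRecordReader<K, V>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public Map.Entry<K, V> next() {
                return it.next();
            }
        };
    }
}
```

In this sketch the caller only ever sees the writer and the reader, so an
external sorter (such as one backed by the Unix sort command) could be swapped
in without the caller changing.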

h5. Task.java

* Several useful static classes that were present in this file were taken out,
and separate files containing the corresponding public classes were created
under hadoop/mapreduce/task.

h5. MapTask.java

* The sort code in MapOutputBuffer class was taken out of this file and it will
live in DefaultMapSortPlugin.java under hadoop/mapreduce/task/map.

h5. ReduceTask.java

* Sort related code was taken out of this file and it will live in
DefaultReduceSortPlugin.java under hadoop/mapreduce/task/reduce.

* Helper methods to create ShuffleRunner and MergeManager instances were added.

h5. Shuffle.java
h5. MergeManager.java

* A new interface ShuffleRunner was introduced and Shuffle will implement this
interface.

* A new interface ShuffleCallback was introduced and will be implemented by
MergeManager.

* The Shuffle class will be dealing with shuffle only.  The MergeManager will no
longer be instantiated by Shuffle.

* An implementation of the ShuffleCallback interface will be passed to the
run() method.

* MergeManager which implements ShuffleCallback will be instantiated by
DefaultReduceSortPlugin.

* Any other reduce sort plugin implementation will need to implement the
ShuffleCallback interface outside the framework.

* The Shuffle class will no longer take <K, V> as generic parameters.
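
The decoupling described in this section can be sketched roughly as follows.
The method names (run, mapOutputFetched) and all Sketch* classes are
hypothetical stand-ins chosen for illustration; they are not the signatures
used in the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Assumed callback: whoever owns the merge receives each fetched map output.
interface ShuffleCallback {
    void mapOutputFetched(String mapId, byte[] data);
}

// Assumed runner: performs the shuffle only and reports through the callback.
interface ShuffleRunner {
    void run(ShuffleCallback callback);
}

// Stand-in for Shuffle.  It has no <K, V> parameters because it only moves bytes.
final class SketchShuffle implements ShuffleRunner {
    private final List<String> mapIds;

    SketchShuffle(List<String> mapIds) {
        this.mapIds = mapIds;
    }

    @Override
    public void run(ShuffleCallback callback) {
        for (String mapId : mapIds) {
            // A real shuffle would fetch over the network; here we fake the payload.
            byte[] data = ("output-of-" + mapId).getBytes(StandardCharsets.UTF_8);
            callback.mapOutputFetched(mapId, data);
        }
    }
}

// Stand-in for MergeManager: it implements the callback and records what arrived.
final class SketchMergeManager implements ShuffleCallback {
    final List<String> received = new ArrayList<>();

    @Override
    public void mapOutputFetched(String mapId, byte[] data) {
        received.add(mapId + ":" + data.length);
    }
}

// Stand-in for DefaultReduceSortPlugin: it, not the shuffle, creates the merge manager.
final class SketchReduceSortPlugin {
    SketchMergeManager shuffleAndCollect(ShuffleRunner shuffle) {
        SketchMergeManager merger = new SketchMergeManager();
        shuffle.run(merger);
        return merger;
    }
}
```

Because the shuffle depends only on the callback interface, another reduce
sort plugin could pass its own ShuffleCallback implementation in place of the
merge manager.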

h5. Fetcher.java
h5. ShuffleScheduler.java
h5. EventFetcher.java

* The classes in these files will no longer be taking <K, V> as generic
parameters.

* Fetcher will receive a ShuffleCallback object as opposed to a
MergeManager instance.

* Fetcher will delegate the responsibility of copying shuffled data to one of
the concrete implementations of MapOutput.

h5. MapOutput.java

* The code in this file was refactored so that the class MapOutput will be
abstract with the concrete implementations OnDiskMapOutput and
InMemoryMapOutput created from the original MapOutput.java and Fetcher.java.
These concrete implementations will be used by MergeManager.

* The MapOutput class will no longer be burdened with carrying unrelated
information (MEMORY, DISK, and WAIT).

* Any other reduce sort plugin implementation will need to provide a concrete
implementation of the MapOutput class outside the framework.
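
As a rough illustration of the abstract MapOutput with in-memory and on-disk
implementations, consider the sketch below.  The shuffle(InputStream, int)
and size() methods, and all Sketch* class names, are assumptions made for this
example and do not reflect the actual patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Stand-in for the abstract MapOutput: it knows its map id and how to accept data.
abstract class SketchMapOutput {
    private final String mapId;

    SketchMapOutput(String mapId) {
        this.mapId = mapId;
    }

    String getMapId() {
        return mapId;
    }

    // Assumed signature: copy exactly 'length' bytes of shuffled data from 'in'.
    abstract void shuffle(InputStream in, int length) throws IOException;

    abstract long size() throws IOException;

    // Shared helper that copies exactly 'length' bytes or fails.
    protected static void copy(InputStream in, OutputStream out, int length)
            throws IOException {
        byte[] buf = new byte[4096];
        int remaining = length;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n < 0) {
                throw new IOException("unexpected end of stream");
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
    }
}

// Stand-in for InMemoryMapOutput: keeps the shuffled bytes on the heap.
final class SketchInMemoryMapOutput extends SketchMapOutput {
    private byte[] data = new byte[0];

    SketchInMemoryMapOutput(String mapId) {
        super(mapId);
    }

    @Override
    void shuffle(InputStream in, int length) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(length);
        copy(in, out, length);
        data = out.toByteArray();
    }

    @Override
    long size() {
        return data.length;
    }
}

// Stand-in for OnDiskMapOutput: streams the shuffled bytes to a file.
final class SketchOnDiskMapOutput extends SketchMapOutput {
    private final Path file;

    SketchOnDiskMapOutput(String mapId, Path file) {
        super(mapId);
        this.file = file;
    }

    @Override
    void shuffle(InputStream in, int length) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            copy(in, out, length);
        }
    }

    @Override
    long size() throws IOException {
        return Files.size(file);
    }
}

// Fetcher-like helper: it delegates the copy to whichever implementation it is given.
final class SketchFetcher {
    static void fetch(SketchMapOutput target, byte[] payload) throws IOException {
        target.shuffle(new ByteArrayInputStream(payload), payload.length);
    }
}
```

The fetcher in this sketch never checks where the data ends up, which is the
point of moving that decision into the concrete MapOutput subclasses.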


> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: KeyValueIterator.java, MR-2454-trunkPatchPreview.gz, MapOutputSorter.java, MapOutputSorterAbstract.java, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
