hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
Date Fri, 06 May 2011 20:31:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030121#comment-13030121 ]

Owen O'Malley commented on MAPREDUCE-2454:
------------------------------------------

The map output key and value types are controlled by the application, not the framework. A
plugin that can only sort Text objects isn't general-purpose enough. Even streaming caused
users a lot of trouble by requiring UTF-8 encoding of the data.

The only acceptable solution would be to define this API and refactor the current code into
a default plugin.

I hadn't thought enough about the combiner. It requires an inversion of control, since the
combiner is started by the spill rather than by the framework.

{code:title=SortPlugin}
package org.apache.hadoop.mapreduce.task;

import java.io.IOException;

public abstract class SortPlugin {

  public interface CombinerCallback {
    /** Called once for each partition of the map output */
    void runCombiner(RawRecordReader reader,
                     RawRecordWriter writer
                    ) throws IOException, InterruptedException;
  }

  /** Called once in map task for collector to gather
      output coming from map. */
  public abstract RawRecordWriter createRawRecordWriter()
    throws IOException, InterruptedException;

  /** Called once in the map task, if there is a combiner. */
  public abstract void registerCombinerCallback(CombinerCallback callback)
    throws IOException, InterruptedException;

  /** Called once in the reduce task for iterator to provide
      input to the reduce. */ 
  public abstract RawRecordReader createRawRecordReader() 
    throws IOException, InterruptedException;
}
{code}
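
To make the inversion of control concrete, here is a rough sketch of how a plugin might drive the combiner from its own spill. Everything beyond the SortPlugin shape above is an assumption made for illustration: the RawRecordReader and RawRecordWriter method signatures, the InMemorySortPlugin and SortPluginDemo classes, the single in-memory partition, and the collapsing of the map and reduce sides into one object with no shuffle in between. The package declaration is dropped so the sketch compiles on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins: the proposal does not define these two types.
interface RawRecordWriter {
  void write(byte[] key, byte[] value) throws IOException;
  void close() throws IOException, InterruptedException;
}

interface RawRecordReader {
  boolean next() throws IOException;  // advance; false once input is exhausted
  byte[] getKey();
  byte[] getValue();
}

// Same shape as the proposed API.
abstract class SortPlugin {
  public interface CombinerCallback {
    void runCombiner(RawRecordReader reader, RawRecordWriter writer)
        throws IOException, InterruptedException;
  }
  public abstract RawRecordWriter createRawRecordWriter() throws IOException, InterruptedException;
  public abstract void registerCombinerCallback(CombinerCallback callback)
      throws IOException, InterruptedException;
  public abstract RawRecordReader createRawRecordReader() throws IOException, InterruptedException;
}

// Assumption: one partition, everything in memory, a single spill on close().
class InMemorySortPlugin extends SortPlugin {
  private final List<byte[][]> buffer = new ArrayList<>();
  private List<byte[][]> spilled = new ArrayList<>();
  private CombinerCallback combiner;  // null when the job has no combiner

  @Override public RawRecordWriter createRawRecordWriter() {
    return new RawRecordWriter() {
      @Override public void write(byte[] key, byte[] value) {
        buffer.add(new byte[][] {key, value});
      }
      @Override public void close() throws IOException, InterruptedException {
        spill();
      }
    };
  }

  @Override public void registerCombinerCallback(CombinerCallback callback) {
    this.combiner = callback;
  }

  @Override public RawRecordReader createRawRecordReader() {
    return readerOver(spilled);
  }

  // The plugin, not the framework, decides when the combiner runs.
  private void spill() throws IOException, InterruptedException {
    buffer.sort((a, b) -> Arrays.compareUnsigned(a[0], b[0]));  // raw byte order
    List<byte[][]> out = new ArrayList<>();
    if (combiner == null) {
      out.addAll(buffer);
    } else {
      combiner.runCombiner(readerOver(buffer), writerInto(out));
    }
    spilled = out;
    buffer.clear();
  }

  private static RawRecordReader readerOver(List<byte[][]> records) {
    Iterator<byte[][]> it = records.iterator();
    return new RawRecordReader() {
      private byte[][] current;
      @Override public boolean next() {
        if (!it.hasNext()) return false;
        current = it.next();
        return true;
      }
      @Override public byte[] getKey() { return current[0]; }
      @Override public byte[] getValue() { return current[1]; }
    };
  }

  private static RawRecordWriter writerInto(List<byte[][]> sink) {
    return new RawRecordWriter() {
      @Override public void write(byte[] key, byte[] value) { sink.add(new byte[][] {key, value}); }
      @Override public void close() { }
    };
  }
}

class SortPluginDemo {
  // Hypothetical combiner: sums one-byte counts over runs of equal keys.
  static SortPlugin.CombinerCallback summingCombiner() {
    return (reader, writer) -> {
      byte[] key = null;
      int sum = 0;
      while (reader.next()) {
        if (key != null && !Arrays.equals(key, reader.getKey())) {
          writer.write(key, new byte[] {(byte) sum});
          sum = 0;
        }
        key = reader.getKey();
        sum += reader.getValue()[0];
      }
      if (key != null) writer.write(key, new byte[] {(byte) sum});
    };
  }

  static String run(SortPlugin.CombinerCallback combiner) throws IOException, InterruptedException {
    InMemorySortPlugin plugin = new InMemorySortPlugin();
    if (combiner != null) plugin.registerCombinerCallback(combiner);
    RawRecordWriter out = plugin.createRawRecordWriter();
    for (String k : new String[] {"b", "a", "b"}) {
      out.write(k.getBytes(StandardCharsets.UTF_8), new byte[] {1});
    }
    out.close();  // triggers the spill, which in turn calls the combiner
    StringBuilder sb = new StringBuilder();
    RawRecordReader in = plugin.createRawRecordReader();
    while (in.next()) {
      if (sb.length() > 0) sb.append(',');
      sb.append(new String(in.getKey(), StandardCharsets.UTF_8)).append('=').append(in.getValue()[0]);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run(summingCombiner()));
  }
}
```

In this sketch the plugin only ever sees raw bytes, so it stays independent of the application's key and value types, and the framework's sole hook into the spill is the callback it registered beforehand.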

> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: KeyValueIterator.java, MapOutputSorter.java, MapOutputSorterAbstract.java, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins on both the Map and Reduce sides.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
