hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
Date Fri, 06 May 2011 20:31:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030121#comment-13030121 ]

Owen O'Malley commented on MAPREDUCE-2454:

The map output key and value types are controlled by the application, not the framework. A
plugin that can only sort Text objects isn't general purpose enough. Even streaming created
a lot of trouble for the users by requiring UTF-8 encoding of the data. 
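To illustrate what "general purpose" could mean here, the following is a minimal sketch, not part of the proposal: a sorter that only ever sees serialized bytes and delegates ordering to a comparator supplied by the application. The names ByteRangeComparator and RawSortSketch are hypothetical; the comparator shape is loosely modeled on Hadoop's RawComparator, but is declared locally so the example stands alone.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a raw-bytes comparator; not the proposed API. */
interface ByteRangeComparator {
  int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen);
}

final class RawSortSketch {
  /** Unsigned lexicographic byte order: one ordering an application might supply. */
  static final ByteRangeComparator LEXICOGRAPHIC = (a, aOff, aLen, b, bOff, bLen) -> {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int x = a[aOff + i] & 0xff;
      int y = b[bOff + i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return aLen - bLen;
  };

  /** Sorts serialized keys without knowing which type they encode. */
  static List<byte[]> sort(List<byte[]> keys, ByteRangeComparator cmp) {
    List<byte[]> out = new ArrayList<>(keys);  // leave the caller's list untouched
    out.sort((k1, k2) -> cmp.compare(k1, 0, k1.length, k2, 0, k2.length));
    return out;
  }

  public static void main(String[] args) {
    List<byte[]> keys = new ArrayList<>();
    keys.add(new byte[] {(byte) 0xff, 0x01});          // not valid UTF-8
    keys.add(new byte[] {0x00, 0x7f});
    keys.add("abc".getBytes(StandardCharsets.UTF_8));  // text is just one case
    for (byte[] k : sort(keys, LEXICOGRAPHIC)) {
      System.out.println(Arrays.toString(k));
    }
  }
}
```

Because the sorter never decodes the keys, it places no encoding requirement on the data; the ordering lives entirely in the comparator.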

The only acceptable solution would be to define this API and refactor the current code into
a default plugin.

I hadn't thought enough about the combiner. It requires an inversion of control, since the
combiner is started by the spill.

package org.apache.hadoop.mapreduce.task;

import java.io.IOException;

public abstract class SortPlugin {

  public interface CombinerCallback {
    /** Called once for each partition of the map output. */
    void runCombiner(RawRecordReader reader,
                     RawRecordWriter writer
                    ) throws IOException, InterruptedException;
  }

  /** Called once in the map task to create the collector that
      gathers the output coming from the map. */
  public abstract RawRecordWriter createRawRecordWriter()
    throws IOException, InterruptedException;

  /** Called once in the map task, if there is a combiner. */
  public abstract void registerCombinerCallback(CombinerCallback callback)
    throws IOException, InterruptedException;

  /** Called once in the reduce task to create the iterator that
      provides input to the reduce. */
  public abstract RawRecordReader createRawRecordReader()
    throws IOException, InterruptedException;
}
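
The inversion of control can be sketched roughly as follows. This is a toy under stated assumptions, not the proposed implementation: the comment does not define RawRecordReader or RawRecordWriter, so the shapes below are hypothetical and use String keys and int values instead of serialized bytes to keep the example short. ToySortPlugin, SpillDemo, collect, and spillLog are likewise invented for illustration. The point is only that the plugin, not the caller, decides when a spill happens, and invokes the registered callback once per non-empty partition at that moment.

```java
import java.io.IOException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified shapes; real ones would carry raw bytes.
interface RawRecordReader {
  boolean next() throws IOException;
  String key();
  int value();
}

interface RawRecordWriter {
  void write(String key, int value) throws IOException;
}

interface CombinerCallback {
  void runCombiner(RawRecordReader reader, RawRecordWriter writer)
      throws IOException, InterruptedException;
}

final class ToySortPlugin {
  private final int spillThreshold;
  private final List<List<Map.Entry<String, Integer>>> buffers = new ArrayList<>();
  private final List<String> spillLog = new ArrayList<>();  // stands in for spill files
  private CombinerCallback combiner;                        // null if no combiner
  private int buffered;

  ToySortPlugin(int numPartitions, int spillThreshold) {
    this.spillThreshold = spillThreshold;
    for (int p = 0; p < numPartitions; p++) {
      buffers.add(new ArrayList<>());
    }
  }

  void registerCombinerCallback(CombinerCallback callback) {
    this.combiner = callback;
  }

  /** Map side: buffer one record; the plugin alone decides when to spill. */
  void collect(int partition, String key, int value)
      throws IOException, InterruptedException {
    buffers.get(partition).add(new AbstractMap.SimpleImmutableEntry<>(key, value));
    if (++buffered >= spillThreshold) {
      spill();
    }
  }

  private void spill() throws IOException, InterruptedException {
    for (int p = 0; p < buffers.size(); p++) {
      List<Map.Entry<String, Integer>> buf = buffers.get(p);
      if (buf.isEmpty()) {
        continue;
      }
      buf.sort(Map.Entry.comparingByKey());
      final int partition = p;
      RawRecordWriter out = (k, v) -> spillLog.add(partition + ":" + k + "=" + v);
      if (combiner == null) {
        for (Map.Entry<String, Integer> e : buf) {
          out.write(e.getKey(), e.getValue());
        }
      } else {
        combiner.runCombiner(readerOver(buf), out);  // the plugin calls back
      }
      buf.clear();
    }
    buffered = 0;
  }

  List<String> spillLog() {
    return spillLog;
  }

  private static RawRecordReader readerOver(List<Map.Entry<String, Integer>> sorted) {
    final Iterator<Map.Entry<String, Integer>> it = sorted.iterator();
    return new RawRecordReader() {
      private Map.Entry<String, Integer> cur;
      public boolean next() {
        if (!it.hasNext()) {
          return false;
        }
        cur = it.next();
        return true;
      }
      public String key() { return cur.getKey(); }
      public int value() { return cur.getValue(); }
    };
  }
}

final class SpillDemo {
  /** Sums the values of equal adjacent keys; relies on sorted input. */
  static final CombinerCallback SUM = (reader, writer) -> {
    String key = null;
    int sum = 0;
    while (reader.next()) {
      if (key != null && !key.equals(reader.key())) {
        writer.write(key, sum);
        sum = 0;
      }
      key = reader.key();
      sum += reader.value();
    }
    if (key != null) {
      writer.write(key, sum);
    }
  };

  static List<String> run() {
    try {
      ToySortPlugin plugin = new ToySortPlugin(2, 4);
      plugin.registerCombinerCallback(SUM);
      plugin.collect(0, "b", 1);
      plugin.collect(1, "x", 1);
      plugin.collect(0, "a", 1);
      plugin.collect(0, "b", 1);  // fourth record reaches the threshold and spills
      return plugin.spillLog();
    } catch (IOException | InterruptedException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(run());  // prints [0:a=1, 0:b=2, 1:x=1]
  }
}
```

The map-side caller never invokes the combiner directly; it only registers it. That is why a callback, rather than a method the framework calls at a fixed point, seems to fit the spill-driven design described above.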

> Allow external sorter plugin for MR
> -----------------------------------
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: KeyValueIterator.java, MapOutputSorter.java, MapOutputSorterAbstract.java,
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external
> sorter plugins on both the Map and Reduce sides.

