hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
Date Thu, 05 May 2011 15:47:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029386#comment-13029386 ]

Owen O'Malley commented on MAPREDUCE-2454:
------------------------------------------

Actually, I think I made a mistake in pushing the objects into the interface, especially since
I plan to change the serialization layer. I think it would be better to do:

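{code title=RawRecordWriter.java}
package org.apache.hadoop.mapreduce.task;

import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.mapreduce.TaskAttemptContext;
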
public abstract class RawRecordWriter implements Closeable {
  /**
   * Called once at start of processing
   */
  public abstract void initialize(TaskAttemptContext context
                                  ) throws IOException, InterruptedException;

  /**
   * Called once per record. The key and value will be copied before write returns.
   */
  public abstract void write(int partition, ByteBuffer key, ByteBuffer value
                             ) throws IOException, InterruptedException;

  /**
   * Called once at task finish or failure.
   */
  public abstract void close() throws IOException;
}
{code}
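
To make the copy requirement in the write contract concrete, here is a minimal standalone sketch. CopyingWriterSketch, its Record type, and the copy helper are hypothetical names used only for illustration; the class does not extend RawRecordWriter or depend on Hadoop, and it simply keeps records in memory.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, not a Hadoop class: collects written records in memory. */
class CopyingWriterSketch {
  /** One stored record; key and value are private copies of the caller's bytes. */
  public static final class Record {
    public final int partition;
    public final byte[] key;
    public final byte[] value;

    public Record(int partition, byte[] key, byte[] value) {
      this.partition = partition;
      this.key = key;
      this.value = value;
    }
  }

  private final List<Record> records = new ArrayList<>();

  /** Copies the remaining bytes; duplicate() leaves the caller's position untouched. */
  public static byte[] copy(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.duplicate().get(out);
    return out;
  }

  /** Copies key and value before returning, so the caller may reuse its buffers. */
  public void write(int partition, ByteBuffer key, ByteBuffer value) {
    records.add(new Record(partition, copy(key), copy(value)));
  }

  public List<Record> records() {
    return records;
  }
}
```

Because the copy happens inside write, a caller can overwrite or refill its buffers as soon as the call returns without affecting the stored record.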

For the Reduce side, we could just use the RawKeyValueIterator, but I suspect we'll be in
better shape if we do something similar:

{code title=RawRecordReader.java}
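package org.apache.hadoop.mapreduce.task;

import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.mapreduce.TaskAttemptContext;
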
public abstract class RawRecordReader implements Closeable {
  /**
   * Called once at start of processing
   */
  public abstract void initialize(TaskAttemptContext context
                                  ) throws IOException, InterruptedException;

  /**
   * Advance to the next record. Returns false when there are no more records.
   */
  public abstract boolean next() throws IOException, InterruptedException;

  /**
   * Provides the ByteBuffer with the key. The ByteBuffer may be reused after each call to
   * next.
   */
  public abstract ByteBuffer getKey() throws IOException, InterruptedException;

  /**
   * Provides the ByteBuffer with the value. The ByteBuffer may be reused after each call to
   * next.
   */
  public abstract ByteBuffer getValue() throws IOException, InterruptedException;

  /**
   * Called once at task finish or failure.
   */
  public abstract void close() throws IOException;  
}
{code}
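
As a rough illustration of how a consumer might drive the next/getKey loop when the buffer is reused, here is a standalone sketch. ReusingReaderSketch and drain are hypothetical names, the reader serves keys from an in-memory array rather than from a real sort, and it assumes every key fits in 64 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, not a Hadoop class: serves keys from an in-memory array. */
class ReusingReaderSketch {
  private final String[] keys;
  // A single buffer refilled on every next(); assumes keys fit in 64 bytes.
  private final ByteBuffer keyBuf = ByteBuffer.allocate(64);
  private int index = -1;

  public ReusingReaderSketch(String... keys) {
    this.keys = keys;
  }

  /** Advances to the next record; returns false when there are no more records. */
  public boolean next() {
    if (index + 1 >= keys.length) {
      return false;
    }
    index++;
    keyBuf.clear();
    keyBuf.put(keys[index].getBytes(StandardCharsets.UTF_8));
    keyBuf.flip();
    return true;
  }

  /** Returns the shared buffer; its contents change on the next call to next(). */
  public ByteBuffer getKey() {
    return keyBuf;
  }

  /** Consumer loop: decodes each key before advancing, since the buffer is reused. */
  public static List<String> drain(ReusingReaderSketch reader) {
    List<String> out = new ArrayList<>();
    while (reader.next()) {
      ByteBuffer key = reader.getKey();
      out.add(StandardCharsets.UTF_8.decode(key.duplicate()).toString());
    }
    return out;
  }
}
```

The point of the sketch is the ordering inside drain: anything the consumer wants to keep has to be extracted from the buffer before the following call to next.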

This has a couple of advantages:
* The plugin gets the TaskAttemptContext and the configuration.
* Serialization stays part of MapReduce instead of the sort library.

> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: KeyValueIterator.java, MapOutputSorter.java, MapOutputSorterAbstract.java, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external
> sorter plugins both on the Map and Reduce sides.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
