hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1230) Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes
Date Wed, 30 Jul 2008 15:49:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618396#action_12618396 ]

Owen O'Malley commented on HADOOP-1230:
---------------------------------------

{quote}
Q1: Why does the string version of the Context.getCounter() method need an int id?
{quote}

That was a slip-up. The old interface looked like that. I'll fix it.

{quote}
Q2: Would I be able to have a subclass of the Context that supports multiple outputs (i.e. via
the MultipleOutputs class)?
{quote}

One advantage of making Mapper a base class instead of an interface is that an intermediate
subclass can add methods. I'd suggest something like:

{code}
class MultipleOutputMapper extends Mapper {
  // private state, including a reference to the outer context
  // collect to a named output, in addition to the normal context.collect()
  <K,V> void collect(String dest, K key, V value) throws IOException { ... }
}
{code}

then the user's mapper can extend MultipleOutputMapper and get the additional collect method.
Does that make sense? It would also be possible to have MultipleOutputMapper wrap the Context
in one that includes the additional method, but the map method would need to downcast, which
seems less user-friendly.
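To make the idea concrete, here is a minimal, self-contained sketch of the pattern. The Context, Mapper, and MultipleOutputMapper below are simplified stand-ins for the proposed API (not the actual Hadoop classes), and the output() accessor is hypothetical; the point is only that the user's mapper inherits the extra collect method instead of downcasting the context.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the proposed Context: collects String key/value pairs.
class Context {
    final Map<String, String> out = new HashMap<>();
    void collect(String key, String value) { out.put(key, value); }
}

// Simplified stand-in for the proposed Mapper base class.
class Mapper {
    void map(Context context) throws IOException { }
}

// The intermediate base class suggested above: it adds a named-output
// collect() so user code never has to downcast the Context.
class MultipleOutputMapper extends Mapper {
    private final Map<String, Map<String, String>> namedOutputs = new HashMap<>();

    protected void collect(String dest, String key, String value) throws IOException {
        namedOutputs.computeIfAbsent(dest, d -> new HashMap<>()).put(key, value);
    }

    // Hypothetical accessor, just so the sketch is observable.
    Map<String, String> output(String dest) {
        return namedOutputs.getOrDefault(dest, new HashMap<String, String>());
    }
}

// A user's mapper extends MultipleOutputMapper and gains the extra method.
class MyMapper extends MultipleOutputMapper {
    @Override
    void map(Context context) throws IOException {
        context.collect("k1", "v1");    // normal output via the context
        collect("side", "k2", "v2");    // named side output, no downcast needed
    }
}
```

Because both the normal and the named-output calls are ordinary methods on the user's own class hierarchy, no cast appears anywhere in user code.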

{quote}
C1: Have you considered instead having a single Context having an InContext and an OutContext
where the IN contains incoming stuff (key, values, splits, jobconf, etc.) and the OUT is used
for the output stuff (collect).
{quote}

Fundamentally, the map and reduce inputs are the same and are handled by the TaskAttemptContext.
The ReduceContext just provides the utility function getValues() to iterate through the values
for the current key. I think it would be more confusing to have separate input, state, and output
contexts.
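As a rough, self-contained illustration of that hierarchy: apart from TaskAttemptContext, ReduceContext, and getValues(), which the comment above names, every detail here (String keys and values, the constructors) is a stand-in, not the actual Hadoop API.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for TaskAttemptContext: handles the input-side state
// that map and reduce share (current key, progress reporting).
class TaskAttemptContext {
    private final String key;
    TaskAttemptContext(String key) { this.key = key; }
    String getKey() { return key; }
    void progress() { /* would report liveness to the framework */ }
}

// ReduceContext only adds the utility to iterate the values for the current key.
class ReduceContext extends TaskAttemptContext {
    private final List<String> values;
    ReduceContext(String key, List<String> values) {
        super(key);
        this.values = values;
    }
    Iterable<String> getValues() { return values; }
}
```

One class hierarchy covers both tasks; the reduce side differs only by the one extra accessor, rather than by a parallel set of input/state/output contexts.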

> Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1230
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1230
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: context-objs-2.patch, context-objs-3.patch, context-objs.patch
>
>
> This is a big change, but it will future-proof our APIs. To maintain backwards compatibility,
> I'd suggest that we move over to a new package name (org.apache.hadoop.mapreduce) and deprecate
> the old interfaces and package. Basically, it will replace:
> package org.apache.hadoop.mapred;
> public interface Mapper extends JobConfigurable, Closeable {
>   void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter)
>     throws IOException;
> }
> with:
> package org.apache.hadoop.mapreduce;
> public interface Mapper extends Closeable {
>   void map(MapContext context) throws IOException;
> }
> where MapContext has methods like getKey(), getValue(), collect(Key, Value), progress(),
> etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

