hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1230) Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes
Date Fri, 25 Jul 2008 16:23:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616935#action_12616935 ]

Owen O'Malley commented on HADOOP-1230:
---------------------------------------

Doug proposed checking the code in as we work on this patch, because it isn't called by the
rest of the code and will be far easier to review. So the new API is in src/mapred/org/apache/hadoop/mapreduce.
Notable changes since the last patch:
  * The Mapper and Reducer have a new format that combines the MapRunnable with Mapper and
introduces a similar style for Reducer. Their templating is now much easier to understand
and use.
  * The Mapper and Reducer base classes are now the identity functions.
  * I've split the context object into a tree where the lower ones inherit from the one above.
    * JobContext - information about the job
    * TaskAttemptContext - information about the task
    * TaskInputOutputContext - add input and output methods for the task
    * MapperContext and ReducerContext provide the specific methods for each
  * I added Job, which is how the user sets up, submits, waits for jobs, and gets status.
Job also allows killing the job or tasks.
  * I split the lib directory into parts for in, map, reduce, partition, out to give a little
hierarchy.
  * I filled in {Text,SequenceFile}{In,Out}putFormat to make sure that I had the interfaces
right.
  * I changed the input methods to match the serialization factory interfaces.
  * JobConf goes away, to be replaced by Configuration. The getter methods in JobConf mostly go
to JobContext. The setter methods mostly go to Job.
  * A word count example is included. That would clearly be moved to the example source tree
when we are doing the final commit.
  * I removed the number of mappers and replaced it with a max split size. The old model was
very confusing to explain.
  * I used all new attribute names so that we don't have collisions with the old attributes.
  * In the Mapper, the Mapper owns the input key and value, which made the multi-threaded
mapper easier to do. I need a similar scheme in the ReduceContext.getValues.
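
To make the context tree and the identity Mapper described above concrete, here is a minimal, self-contained sketch. It is not the code from the patch: only the names JobContext, TaskAttemptContext, TaskInputOutputContext, MapperContext and Mapper come from the list above, and getKey(), getValue() and collect() come from the issue description. The fields, the next() method, the run() loop, the in-memory input, and the ContextSketch wrapper are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ContextSketch {

    /** Job-level information (the field is hypothetical). */
    static class JobContext {
        private final String jobName;
        JobContext(String jobName) { this.jobName = jobName; }
        String getJobName() { return jobName; }
    }

    /** Inherits from JobContext and adds information about one task attempt. */
    static class TaskAttemptContext extends JobContext {
        private final String taskAttemptId;
        TaskAttemptContext(String jobName, String taskAttemptId) {
            super(jobName);
            this.taskAttemptId = taskAttemptId;
        }
        String getTaskAttemptId() { return taskAttemptId; }
    }

    /** Adds input and output methods; next() is an assumed method name. */
    abstract static class TaskInputOutputContext<KIN, VIN, KOUT, VOUT>
            extends TaskAttemptContext {
        TaskInputOutputContext(String jobName, String taskAttemptId) {
            super(jobName, taskAttemptId);
        }
        abstract boolean next();
        abstract KIN getKey();
        abstract VIN getValue();
        abstract void collect(KOUT key, VOUT value);
    }

    /** Map-side context backed by in-memory collections (hypothetical). */
    static class MapperContext<KIN, VIN, KOUT, VOUT>
            extends TaskInputOutputContext<KIN, VIN, KOUT, VOUT> {
        private final Iterator<Map.Entry<KIN, VIN>> input;
        private Map.Entry<KIN, VIN> current;
        final List<String> output = new ArrayList<>();

        MapperContext(String jobName, String taskAttemptId, Map<KIN, VIN> records) {
            super(jobName, taskAttemptId);
            this.input = records.entrySet().iterator();
        }
        @Override boolean next() {
            if (!input.hasNext()) return false;
            current = input.next();
            return true;
        }
        @Override KIN getKey() { return current.getKey(); }
        @Override VIN getValue() { return current.getValue(); }
        @Override void collect(KOUT key, VOUT value) { output.add(key + "=" + value); }
    }

    /** Base class is the identity function; run() plays the MapRunnable role. */
    static class Mapper<KIN, VIN, KOUT, VOUT> {
        @SuppressWarnings("unchecked")
        protected void map(KIN key, VIN value, MapperContext<KIN, VIN, KOUT, VOUT> context) {
            context.collect((KOUT) key, (VOUT) value);
        }
        public void run(MapperContext<KIN, VIN, KOUT, VOUT> context) {
            while (context.next()) {
                map(context.getKey(), context.getValue(), context);
            }
        }
    }

    /** Runs the identity Mapper over the given records and returns "key=value" strings. */
    static List<String> runIdentity(Map<String, String> records) {
        MapperContext<String, String, String, String> context =
                new MapperContext<>("sketch-job", "attempt_0", records);
        new Mapper<String, String, String, String>().run(context);
        return context.output;
    }

    public static void main(String[] args) {
        Map<String, String> records = new LinkedHashMap<>();
        records.put("a", "1");
        records.put("b", "2");
        System.out.println(runIdentity(records));
    }
}
```

In this sketch a user who only wants per-record logic overrides map(), while one who wants to control the whole loop (for example a multi-threaded mapper) overrides run().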

Missing:
  * I need an interface to query jobs that were submitted by another process. Probably a
JobTracker class is the best bet that provides query options and returns Jobs.
  * I didn't move TaskCompletionEvents yet.

> Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1230
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1230
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: context-objs-2.patch, context-objs-3.patch, context-objs.patch
>
>
> This is a big change, but it will future-proof our APIs. To maintain backwards compatibility,
> I'd suggest that we move over to a new package name (org.apache.hadoop.mapreduce) and deprecate
> the old interfaces and package. Basically, it will replace:
> package org.apache.hadoop.mapred;
> public interface Mapper extends JobConfigurable, Closeable {
>   void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter) throws IOException;
> }
> with:
> package org.apache.hadoop.mapreduce;
> public interface Mapper extends Closeable {
>   void map(MapContext context) throws IOException;
> }
> where MapContext has the methods like getKey(), getValue(), collect(Key, Value), progress(),
> etc.
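
As a rough illustration of the single-argument style proposed in the description, the sketch below defines a Mapper whose map() receives only a context and pulls the key and value from it. The generic type parameters, the TokenCountMapper and InMemoryContext classes, the mapLine() helper, and the ProposedMapperSketch wrapper are hypothetical; only Mapper, MapContext, Closeable, and the method names getKey(), getValue(), collect() and progress() come from the description.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ProposedMapperSketch {

    /** Context carrying the current record plus the output and progress hooks. */
    interface MapContext<KIN, VIN, KOUT, VOUT> {
        KIN getKey();
        VIN getValue();
        void collect(KOUT key, VOUT value) throws IOException;
        void progress();
    }

    /** Mapper in the proposed style: a single context parameter. */
    interface Mapper<KIN, VIN, KOUT, VOUT> extends Closeable {
        void map(MapContext<KIN, VIN, KOUT, VOUT> context) throws IOException;
    }

    /** Hypothetical mapper that emits (token, 1) for every token in the value. */
    static class TokenCountMapper implements Mapper<Long, String, String, Integer> {
        @Override
        public void map(MapContext<Long, String, String, Integer> context) throws IOException {
            for (String token : context.getValue().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    context.collect(token, 1);
                }
            }
            context.progress();
        }
        @Override
        public void close() { /* nothing to release in this sketch */ }
    }

    /** Hypothetical in-memory context holding exactly one record. */
    static class InMemoryContext implements MapContext<Long, String, String, Integer> {
        private final Long key;
        private final String value;
        final List<String> output = new ArrayList<>();
        int progressCalls = 0;

        InMemoryContext(Long key, String value) {
            this.key = key;
            this.value = value;
        }
        @Override public Long getKey() { return key; }
        @Override public String getValue() { return value; }
        @Override public void collect(String k, Integer v) { output.add(k + "=" + v); }
        @Override public void progress() { progressCalls++; }
    }

    /** Maps one line of text and returns the collected "token=1" strings. */
    static List<String> mapLine(String line) {
        InMemoryContext context = new InMemoryContext(0L, line);
        try (TokenCountMapper mapper = new TokenCountMapper()) {
            mapper.map(context);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return context.output;
    }

    public static void main(String[] args) {
        System.out.println(mapLine("to be or not to be"));
    }
}
```

Because everything the mapper needs arrives through the context, new methods could be added to MapContext later without changing the map() signature, which is the future-proofing argument made in the description.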

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

