hadoop-mapreduce-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
Date Thu, 05 Nov 2009 18:48:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774027#action_12774027 ]

Doug Cutting commented on MAPREDUCE-1183:
-----------------------------------------

This would be a nice API for Java.

How would we implement this? Would we serialize these to the splits file? To a new per-job file? In a parameter to the job-submission RPC?
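
For the Java case, a minimal sketch of what producing such a blob might look like, regardless of where it ultimately lands (splits file, per-job file, or submission RPC). The ComponentBlobs name is made up for illustration; only the java.io serialization calls are real:

{noformat}
// Hypothetical sketch: serialize a job component into a blob the framework
// could ship to tasks. No such helper exists in Hadoop today; this only
// illustrates plain java.io serialization of a Serializable component.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ComponentBlobs {
  public static byte[] toBlob(Serializable component) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(bytes);
    try {
      out.writeObject(component);  // a Mapper, Reducer, InputFormat, ...
    } finally {
      out.close();
    }
    return bytes.toByteArray();    // destination is the open question above:
                                   // splits file, per-job file, or RPC param
  }
}
{noformat}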

Long-term, it would be nice if job submissions could easily be made by non-Java applications. So a job submission might specify a TaskRunner implementation name, plus one or more blobs that are consumed by that TaskRunner to implement map, reduce, partition, inputformat, outputformat, etc. JavaTaskRunner might use Java serialization to create its blobs, while a PythonTaskRunner and CTaskRunner might do something else. The TaskRunners would all be implemented in Java, but would provide the glue for other native MapReduce APIs. If we agree that this is the sort of long-term architecture we seek, should we add it now?
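
To make the shape of that concrete, a rough sketch of what such an interface might look like. TaskRunner and JavaTaskRunner do not exist in Hadoop today; every name below is hypothetical:

{noformat}
// Hypothetical sketch of the long-term architecture suggested above: a job
// submission names a TaskRunner implementation plus opaque blobs that only
// that runner knows how to interpret.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.Map;

public interface TaskRunner {
  /** Reconstitute and run a task from language-specific blobs, e.g. keyed
   *  by "map", "reduce", "partition", "inputformat", "outputformat". */
  void runTask(String taskType, Map<String, byte[]> blobs) throws IOException;
}

// A Java runner might deserialize its blobs with Java serialization; a
// PythonTaskRunner or CTaskRunner would interpret the same bytes however
// its runtime requires. All runners are themselves implemented in Java.
class JavaTaskRunner implements TaskRunner {
  public void runTask(String taskType, Map<String, byte[]> blobs)
      throws IOException {
    try {
      ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(blobs.get(taskType)));
      Object component = in.readObject();  // e.g. the user's Mapper instance
      in.close();
      // ... hand the component to the task's execution loop ...
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    }
  }
}
{noformat}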


> Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1183
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> Currently the Map-Reduce framework uses Configuration to pass information about the various aspects of a job such as Mapper, Reducer, InputFormat, OutputFormat, OutputCommitter, etc., and application developers use the org.apache.hadoop.mapreduce.Job.set*Class APIs to set them at job-submission time:
> {noformat}
> job.setMapperClass(IdentityMapper.class);
> job.setReducerClass(IdentityReducer.class);
> job.setInputFormatClass(TextInputFormat.class);
> job.setOutputFormatClass(TextOutputFormat.class);
> ...
> {noformat}
> The proposal is that we move to a model where end-users interact with org.apache.hadoop.mapreduce.Job via actual objects which are then serialized by the framework:
> {noformat}
> job.setMapper(new IdentityMapper());
> job.setReducer(new IdentityReducer());
> job.setInputFormat(new TextInputFormat("in"));
> job.setOutputFormat(new TextOutputFormat("out"));
> ...
> {noformat}
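
As an illustration of what the quoted proposal would enable, a hedged sketch of a mapper whose configuration travels as ordinary serialized instance fields rather than Configuration strings. GrepMapper and the job.setMapper(...) call are hypothetical; only the org.apache.hadoop.mapreduce types are real:

{noformat}
// Hypothetical sketch: under the proposal, a mapper's configuration is
// carried by its own serialized fields, so it needs no no-arg constructor
// and no stringly-typed Configuration lookups in setup().
import java.io.IOException;
import java.io.Serializable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GrepMapper extends Mapper<LongWritable, Text, Text, LongWritable>
    implements Serializable {
  private final String pattern;  // serialized along with the instance

  public GrepMapper(String pattern) {
    this.pattern = pattern;
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (value.toString().contains(pattern)) {
      context.write(value, new LongWritable(1));
    }
  }
}

// Under the proposed API the configured instance, pattern and all, would be
// serialized by the framework at submission time:
//   job.setMapper(new GrepMapper("ERROR"));
{noformat}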

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

