hadoop-mapreduce-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
Date Thu, 05 Nov 2009 05:24:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773783#action_12773783 ]

Tom White commented on MAPREDUCE-1183:
--------------------------------------

+1 This is a nicer API for users, I think.

The only reason I can think of not to serialize mappers and reducers is that users will
be forced to think about how they are serialized. This may be simply a matter of adding
"implements Serializable" (particularly for stateless mappers and reducers), so maybe it's
not a big burden (and consistency is important).
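
To illustrate the stateless case, here is a minimal sketch using plain Java serialization. The LineMapper interface and UpperCaseMapper class are hypothetical stand-ins, not Hadoop's Mapper API, and the proposal does not say which serialization mechanism the framework would use; java.io serialization is assumed here only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Locale;

class SerializableMapperSketch {
    /** Hypothetical stand-in for a mapper contract; this is not Hadoop's Mapper class. */
    interface LineMapper extends Serializable {
        String map(String line);
    }

    /** A stateless mapper: being Serializable (via LineMapper) is all it needs. */
    static class UpperCaseMapper implements LineMapper {
        private static final long serialVersionUID = 1L;

        @Override
        public String map(String line) {
            return line.toUpperCase(Locale.ROOT);
        }
    }

    /** Roughly what a client could do at job-submission time: turn the object into bytes. */
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Roughly what a task could do before running: rebuild the object from bytes. */
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LineMapper original = new UpperCaseMapper();
        LineMapper copy = (LineMapper) deserialize(serialize(original));
        System.out.println(copy.map("hello"));
    }
}
```

A mapper that holds fields would additionally need every field to be serializable (or marked transient), which is the part users would have to think about.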

bq. An application which needs a very small amount of state in the Mapper/Reducer (say a small
map of metadata) is forced to use DistributedCache

Alternatively, you can store a small amount of state in the configuration, which is generally
easier.
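
A rough sketch of what that could look like: a small metadata map is flattened into one string value and parsed back on the task side. A plain java.util.Map stands in for the job configuration so the example runs without Hadoop on the classpath; with Hadoop, the string would go through Configuration.set(String, String) and Configuration.get(String). The property name and the "key=value,key=value" encoding are assumptions made up for this example, and the encoding assumes keys and values contain neither ',' nor '='.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class ConfigStateSketch {
    /** Hypothetical property name, chosen only for this example. */
    static final String METADATA_KEY = "example.mapper.metadata";

    /** Flattens the map to "k1=v1,k2=v2" with keys in sorted order. */
    static String encode(Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(metadata).entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Parses the string produced by encode() back into a map. */
    static Map<String, String> decode(String encoded) {
        Map<String, String> metadata = new TreeMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return metadata;
        }
        for (String pair : encoded.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("malformed entry: " + pair);
            }
            metadata.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Stand-in for the job configuration (string keys to string values).
        Map<String, String> conf = new HashMap<>();

        // Client side: store the state before submitting the job.
        Map<String, String> metadata = new TreeMap<>();
        metadata.put("country", "US");
        metadata.put("version", "3");
        conf.put(METADATA_KEY, encode(metadata));

        // Task side: read it back, for example during mapper setup.
        Map<String, String> restored = decode(conf.get(METADATA_KEY));
        System.out.println(restored);
    }
}
```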

Also, the new API for TextInputFormat and TextOutputFormat could take a varargs list of paths
in the constructor, or use the builder pattern.
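
Both options might look roughly like the sketch below. TextInputSketch is a hypothetical class invented for this example; it is not the existing TextInputFormat, and the method names (builder, addPath, build, getPaths) are assumptions rather than a proposed final API.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class InputFormatApiSketch {
    /** Hypothetical input-format-like class that carries its input paths as state. */
    static class TextInputSketch implements Serializable {
        private static final long serialVersionUID = 1L;

        // ArrayList is itself Serializable, so the paths travel with the object.
        private final ArrayList<String> paths;

        /** Option 1: a varargs list of paths in the constructor. */
        TextInputSketch(String... paths) {
            if (paths.length == 0) {
                throw new IllegalArgumentException("at least one input path is required");
            }
            this.paths = new ArrayList<>(Arrays.asList(paths));
        }

        List<String> getPaths() {
            return Collections.unmodifiableList(paths);
        }

        /** Option 2: the builder pattern. */
        static Builder builder() {
            return new Builder();
        }

        static class Builder {
            private final List<String> paths = new ArrayList<>();

            Builder addPath(String path) {
                paths.add(path);
                return this;
            }

            TextInputSketch build() {
                return new TextInputSketch(paths.toArray(new String[0]));
            }
        }
    }

    public static void main(String[] args) {
        TextInputSketch viaVarargs = new TextInputSketch("in1", "in2");
        TextInputSketch viaBuilder = TextInputSketch.builder()
                .addPath("in1")
                .addPath("in2")
                .build();
        System.out.println(viaVarargs.getPaths());
        System.out.println(viaBuilder.getPaths());
    }
}
```

The varargs form is shorter for the common case; a builder leaves room for further optional settings without multiplying constructors.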


> Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1183
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> Currently the Map-Reduce framework uses Configuration to pass information about the various
> aspects of a job such as Mapper, Reducer, InputFormat, OutputFormat, OutputCommitter etc.
> and application developers use org.apache.hadoop.mapreduce.Job.set*Class APIs to set them
> at job-submission time:
> {noformat}
> Job.setMapperClass(IdentityMapper.class);
> Job.setReducerClass(IdentityReducer.class);
> Job.setInputFormatClass(TextInputFormat.class);
> Job.setOutputFormatClass(TextOutputFormat.class);
> ...
> {noformat}
> The proposal is that we move to a model where end-users interact with org.apache.hadoop.mapreduce.Job
> via actual objects which are then serialized by the framework:
> {noformat}
> Job.setMapper(new IdentityMapper());
> Job.setReducer(new IdentityReducer());
> Job.setInputFormat(new TextInputFormat("in"));
> Job.setOutputFormat(new TextOutputFormat("out"));
> ...
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

