hadoop-mapreduce-issues mailing list archives

From Iván de Prado (Commented) (JIRA) <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
Date Wed, 07 Mar 2012 10:34:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224197#comment-13224197 ]

Iván de Prado commented on MAPREDUCE-1183:
------------------------------------------

We have an implementation of this in Pangool (http://pangool.net). It could serve as a proof of concept and as a source of ideas for this ticket.

We are successfully using the DistributedCache to send serialized instances of the mappers, reducers, comparators and input/output formats. You can see a usage example here: http://pangool.net/userguide/TupleMrBuilder.html
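A minimal sketch of the pattern (the InstanceShipper class and its method names are illustrative, not Pangool's actual API): at submission time the instance is Java-serialized to a file and registered in the DistributedCache; at task-setup time the local cache copy is located and the instance deserialized back.

{noformat}
// Illustrative helper, not Pangool's actual API: ship a Serializable
// instance to the tasks through the DistributedCache.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InstanceShipper {

  // At job-submission time: serialize the instance to a file in the
  // job's file system and register that file in the DistributedCache.
  public static void ship(Serializable instance, Path path, Configuration conf)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    ObjectOutputStream out = new ObjectOutputStream(fs.create(path));
    try {
      out.writeObject(instance);
    } finally {
      out.close();
    }
    DistributedCache.addCacheFile(path.toUri(), conf);
  }

  // At task-setup time: locate the local cache copy by file name and
  // deserialize the instance back.
  public static Object load(String fileName, Configuration conf)
      throws IOException, ClassNotFoundException {
    Path[] cached = DistributedCache.getLocalCacheFiles(conf);
    if (cached != null) {
      for (Path local : cached) {
        if (local.getName().equals(fileName)) {
          ObjectInputStream in =
              new ObjectInputStream(new FileInputStream(local.toString()));
          try {
            return in.readObject();
          } finally {
            in.close();
          }
        }
      }
    }
    throw new IOException("Instance file not in DistributedCache: " + fileName);
  }
}
{noformat}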

To support legacy classes, we have created wrapper classes that receive the class to be instantiated in the constructor. This keeps the new API compatible with code written the old way.
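For example, a hypothetical wrapper for legacy class-based mappers could look like this (the MapperClassWrapper name is made up for illustration): only the Class object travels through Java serialization, and the real mapper is re-created reflectively on the task side, exactly as the old class-based API does.

{noformat}
// Illustrative wrapper, not Pangool's actual class: adapts a legacy,
// class-based Mapper to an instance-based job-submission API.
import java.io.IOException;
import java.io.Serializable;

import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.ReflectionUtils;

public class MapperClassWrapper<KIN, VIN, KOUT, VOUT>
    extends Mapper<KIN, VIN, KOUT, VOUT> implements Serializable {

  // Only the Class object is serialized; java.lang.Class is Serializable.
  private final Class<? extends Mapper<KIN, VIN, KOUT, VOUT>> mapperClass;

  // The delegate is rebuilt on the task side, so it is not serialized.
  private transient Mapper<KIN, VIN, KOUT, VOUT> delegate;

  public MapperClassWrapper(
      Class<? extends Mapper<KIN, VIN, KOUT, VOUT>> mapperClass) {
    this.mapperClass = mapperClass;
  }

  @Override
  public void run(Context context) throws IOException, InterruptedException {
    // Instantiate the wrapped mapper reflectively and hand it the task.
    delegate = ReflectionUtils.newInstance(mapperClass, context.getConfiguration());
    delegate.run(context);
  }
}
{noformat}

Usage would then be symmetric with the instance-based case, e.g. setting new MapperClassWrapper(MyOldMapper.class) wherever an instance is expected.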

Something else that would be useful is making all Writable classes implement Serializable. This is needed if you want to create reusable instances in your mapper/reducer up front, without having to instantiate them in the setup() method.
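To illustrate what that change would enable (a sketch only, since Text and IntWritable do not implement Serializable today): with Serializable Writables, a mapper like the following could be Java-serialized with its reusable fields already initialized; as things stand, such fields would have to be transient and rebuilt in setup().

{noformat}
// Sketch only: this relies on Text/IntWritable being Serializable,
// which they are not today. With the proposed change, the mapper and
// its pre-built fields could be serialized as-is.
import java.io.IOException;
import java.io.Serializable;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable>
    implements Serializable {

  // Reusable output objects, created once at construction time instead
  // of in setup().
  private final Text word = new Text();
  private final IntWritable one = new IntWritable(1);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String token : value.toString().split("\\s+")) {
      word.set(token);
      context.write(word, one);
    }
  }
}
{noformat}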
                
> Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1183
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.24.0
>
>
> Currently the Map-Reduce framework uses Configuration to pass information about the various aspects of a job such as Mapper, Reducer, InputFormat, OutputFormat, OutputCommitter etc., and application developers use the org.apache.hadoop.mapreduce.Job.set*Class APIs to set them at job-submission time:
> {noformat}
> Job.setMapperClass(IdentityMapper.class);
> Job.setReducerClass(IdentityReducer.class);
> Job.setInputFormatClass(TextInputFormat.class);
> Job.setOutputFormatClass(TextOutputFormat.class);
> ...
> {noformat}
> The proposal is that we move to a model where end-users interact with org.apache.hadoop.mapreduce.Job via actual objects which are then serialized by the framework:
> {noformat}
> Job.setMapper(new IdentityMapper());
> Job.setReducer(new IdentityReducer());
> Job.setInputFormat(new TextInputFormat("in"));
> Job.setOutputFormat(new TextOutputFormat("out"));
> ...
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
