hadoop-mapreduce-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
Date Fri, 06 Nov 2009 19:16:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774377#action_12774377 ]

Doug Cutting commented on MAPREDUCE-1183:

> I'm concerned we do not understand the others enough to design the interfaces well enough

That's a risk.  All I'm suggesting is that, as we alter job submission here, we should keep
in mind where we'd like to go.  In particular, we should try not to change it incompatibly
more than once.  Currently we support specification by class name.  If we add another Java-specific
mechanism now, we'll have to support it too going forward.  If we know we'll change it again
soon, then that will be three mechanisms we'll need to support for a time.  Perhaps that's
tolerable, but two would be better.  This may be an opportunity to "future-proof" job submission,
or it may not be.

For example, perhaps your implementation will use the existing mechanism to provide the new
capability, e.g., it will set a job's mapper to JavaSerializedMapper, that will then look
on the classpath for a particular file that contains the serialized mapper.  In this case
we can argue that the underlying mechanism isn't changed and all's well.  On the other hand,
if we were to add new job properties that the framework uses in preference to the existing
properties, then we should think more carefully about what these are and whether they're steps
in the direction we want job configurations to take.  Does that make sense?
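The wrapper mechanism sketched above can be illustrated in plain Java. Everything here is a hypothetical stand-in (the class names `SerializedMapperLoader` and `WordMapper`, and the temp-file lookup in place of a real classpath resource), not Hadoop API: the client serializes the configured mapper object to a file shipped with the job, and a wrapper class on the task side deserializes it, leaving the existing class-name mechanism unchanged underneath.

```java
import java.io.*;

// Hypothetical sketch, not Hadoop API: a wrapper that loads a serialized
// mapper from a well-known location, so the framework still only sees a
// class name (the wrapper's) while users configure an object.
public class SerializedMapperLoader {

    // Stand-in for a user-written mapper; must be Serializable so the
    // client can ship the configured instance as bytes.
    static class WordMapper implements Serializable {
        private static final long serialVersionUID = 1L;
        final String separator;
        WordMapper(String separator) { this.separator = separator; }
        String[] map(String line) { return line.split(separator); }
    }

    // Client side: serialize the configured mapper instance to a file
    // that would be placed on the job's classpath.
    static void save(Object mapper, File f) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(mapper);
        }
    }

    // Task side: the wrapper reconstructs the mapper from the shipped
    // bytes before the framework invokes it.
    static Object load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(f))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("mapper", ".ser");
        f.deleteOnExit();
        save(new WordMapper(","), f);  // client: configure with an object
        WordMapper restored = (WordMapper) load(f);  // task: rebuild it
        System.out.println(String.join("|", restored.map("a,b,c")));
    }
}
```

Under this scheme the serialized instance carries its configuration (here, `separator`) with it, which is exactly what a class-name-only mechanism cannot do.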

> Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
> -----------------------------------------------------------------------------
>                 Key: MAPREDUCE-1183
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
> Currently the Map-Reduce framework uses Configuration to pass information about the various aspects of a job such as Mapper, Reducer, InputFormat, OutputFormat, OutputCommitter etc., and application developers use the org.apache.hadoop.mapreduce.Job.set*Class APIs to set them at job-submission time:
> {noformat}
> job.setMapperClass(IdentityMapper.class);
> job.setReducerClass(IdentityReducer.class);
> job.setInputFormatClass(TextInputFormat.class);
> job.setOutputFormatClass(TextOutputFormat.class);
> ...
> {noformat}
> The proposal is that we move to a model where end-users interact with org.apache.hadoop.mapreduce.Job via actual objects which are then serialized by the framework:
> {noformat}
> job.setMapper(new IdentityMapper());
> job.setReducer(new IdentityReducer());
> job.setInputFormat(new TextInputFormat("in"));
> job.setOutputFormat(new TextOutputFormat("out"));
> ...
> {noformat}
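The object-based API proposed above implies the framework must serialize whatever the user passes in. A minimal sketch of that idea in plain Java follows; the `ObjectJob` class and its `set`/`get` methods are invented for illustration and are not the proposed Hadoop interface. It stores each component as a serialized byte array, which is roughly what the framework would ship to tasks.

```java
import java.io.*;
import java.util.*;

// Illustrative sketch, not the Hadoop API: a job object that accepts
// component *instances* and stores them serialized, as the proposal
// suggests, instead of storing class names in a Configuration.
public class ObjectJob {
    private final Map<String, byte[]> components = new HashMap<>();

    // Serialize the component at submission time; this is where a
    // non-Serializable mapper would fail fast, on the client.
    void set(String role, Serializable component) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(component);
        }
        components.put(role, buf.toByteArray());
    }

    // Deserialize on the task side, recovering the configured instance.
    Object get(String role) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(components.get(role)))) {
            return in.readObject();
        }
    }

    // Stand-in for a user component; real mappers would carry state.
    static class IdentityMapper implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    public static void main(String[] args) throws Exception {
        ObjectJob job = new ObjectJob();
        job.set("mapper", new IdentityMapper());
        System.out.println(job.get("mapper").getClass().getSimpleName());
    }
}
```

Note that this is the alternative Doug flags for scrutiny: new serialized job properties used in preference to the existing class-name properties, rather than a wrapper layered on top of them.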

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
