hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
Date Fri, 12 Feb 2010 23:59:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833251#action_12833251 ]

Owen O'Malley commented on MAPREDUCE-326:

This is not a risk since the API is marked "unstable" so we retain the freedom to change it
in any way we like. 

That is completely untrue. It is marked unstable as a warning to users. I would reject any incompatible
change to the API. We should mark the base classes (Mapper, Reducer, InputFormat, OutputFormat,
Partitioner, RecordReader, RecordWriter, *Context) as stable. Pig is already
using the new APIs, and so are a lot of other business-critical applications.

You can think of this change as a refactor to make the MapReduce shuffle more accessible to
framework developers

My suggestion does it far more easily and with no application disruption.

Furthermore, this *does* double the interface, which is already extremely wide.

the new (context objects) MapReduce API is implemented in library code

But it is the primary interface that our users use. If you are proposing that we make this
the "real" interface, then I've already expressed my -1. Users don't want to think in bytes.


> The lowest level map-reduce APIs should be byte oriented
> --------------------------------------------------------
>                 Key: MAPREDUCE-326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
>         Attachments: MAPREDUCE-326-api.patch, MAPREDUCE-326.pdf
> As discussed here:
> https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
> The templates, serializers and other complexities that allow map-reduce to use arbitrary
> types complicate the design and lead to lots of object creates and other overhead that a byte
> oriented design would not suffer.  I believe the lowest level implementation of hadoop map-reduce
> should have byte string oriented APIs (for keys and values).  This API would be more performant,
> simpler and more easily cross language.
> The existing API could be maintained as a thin layer on top of the leaner API.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
