hadoop-mapreduce-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
Date Thu, 04 Feb 2010 23:17:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829844#action_12829844 ]

Tom White commented on MAPREDUCE-326:

> I don't see what this abstraction is buying you over using ByteBuffer and a Serializer
> that knows how to use it.

I can see a few benefits of a low-level binary API for framework developers:
* The new and old APIs could both be written on top of it, effecting a cleaner separation in the …
* It would be easier to write powerful new APIs on top of the lower-level API. E.g. MAPREDUCE-1183
would have more scope to deviate from the existing (high-level) API than if it were written
on top of the high-level API.
* It would be easier to optimize certain operations (and perhaps provide them in a library):
an identity mapper or reducer could just copy bytes; unused fields in records could be skipped
over (and not be deserialized).
* We could rewrite Pipes to pass the bytes read in the map/reduce directly to the Pipes process.
Currently Pipes has to do extra round trips. For example, the Pipes Mapper in Java has to
convert the records into Java types, then serialize them when sending them to the Pipes process.
Even in the case of BytesWritable there is an extra copy that could be avoided.
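To make the identity-mapper and field-skipping points concrete, here is a minimal sketch of what a byte-level map contract might look like. `RawMapper`, `RawCollector`, and the length-prefixed record layout are hypothetical, invented purely for illustration; they are not part of Hadoop's API. (Hadoop's closest existing byte-level hook is `RawComparator`, which compares serialized keys without deserializing them.)

```java
import java.nio.ByteBuffer;

/** Hypothetical sink for serialized key/value bytes (not a Hadoop interface). */
interface RawCollector {
    void collect(ByteBuffer key, ByteBuffer value);
}

/** Hypothetical lowest-level mapper: it only ever sees serialized bytes. */
interface RawMapper {
    void map(ByteBuffer key, ByteBuffer value, RawCollector out);
}

/** Identity mapper: forwards the input bytes untouched; nothing is deserialized. */
class IdentityRawMapper implements RawMapper {
    @Override
    public void map(ByteBuffer key, ByteBuffer value, RawCollector out) {
        out.collect(key, value);
    }
}

/**
 * Emits only the first field of each value. Assumes a hypothetical record layout of
 * length-prefixed fields, [int length][bytes][int length][bytes]...; the remaining
 * fields are never read, let alone decoded into objects.
 */
class FirstFieldRawMapper implements RawMapper {
    @Override
    public void map(ByteBuffer key, ByteBuffer value, RawCollector out) {
        ByteBuffer v = value.duplicate(); // independent position; caller's buffer is untouched
        int len = v.getInt();             // big-endian length prefix of the first field
        ByteBuffer first = v.slice();     // view starting at the first field's bytes, no copy
        first.limit(len);                 // restrict the view to the first field only
        out.collect(key, first);
    }
}
```

Both mappers hand downstream code views of the input buffer (`duplicate()`/`slice()`) rather than freshly built objects, which is where the saving over deserialize-then-reserialize would come from.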

> The lowest level map-reduce APIs should be byte oriented
> --------------------------------------------------------
>                 Key: MAPREDUCE-326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
> As discussed here:
> https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
> The templates, serializers and other complexities that allow map-reduce to use arbitrary
> types complicate the design and lead to lots of object creation and other overhead that a
> byte-oriented design would not suffer. I believe the lowest-level implementation of Hadoop
> map-reduce should have byte-string-oriented APIs (for keys and values). This API would be
> more performant, simpler, and easier to use across languages.
> The existing API could be maintained as a thin layer on top of the leaner API.
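The "thin layer" idea could look roughly like the adapter below: a typed mapper is wrapped so that the framework only ever calls the byte-level `map`, and decoding/encoding happens at the edges. This is a sketch under assumptions: `RawMapper`, `Codec`, `TypedMapper`, and `TypedOverRaw` are hypothetical names, and the two small codecs merely stand in for Hadoop's real `Serializer`/`Deserializer` machinery.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;

/** Hypothetical byte-level mapper: the only contract the framework would need. */
interface RawMapper {
    void map(ByteBuffer key, ByteBuffer value, BiConsumer<ByteBuffer, ByteBuffer> out);
}

/** Hypothetical encode/decode pair for one type. */
interface Codec<T> {
    T decode(ByteBuffer bytes);
    ByteBuffer encode(T value);
}

/** A typed, user-facing mapper in the spirit of the existing high-level API. */
interface TypedMapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value, BiConsumer<K2, V2> out);
}

/** Thin adapter: decode the input bytes, run the typed mapper, re-encode its output. */
class TypedOverRaw<K1, V1, K2, V2> implements RawMapper {
    private final TypedMapper<K1, V1, K2, V2> mapper;
    private final Codec<K1> inKey;
    private final Codec<V1> inValue;
    private final Codec<K2> outKey;
    private final Codec<V2> outValue;

    TypedOverRaw(TypedMapper<K1, V1, K2, V2> mapper, Codec<K1> inKey, Codec<V1> inValue,
                 Codec<K2> outKey, Codec<V2> outValue) {
        this.mapper = mapper;
        this.inKey = inKey;
        this.inValue = inValue;
        this.outKey = outKey;
        this.outValue = outValue;
    }

    @Override
    public void map(ByteBuffer key, ByteBuffer value, BiConsumer<ByteBuffer, ByteBuffer> out) {
        mapper.map(inKey.decode(key), inValue.decode(value),
                (k, v) -> out.accept(outKey.encode(k), outValue.encode(v)));
    }
}

/** UTF-8 string codec; decodes from a duplicate so the caller's position is unchanged. */
class Utf8Codec implements Codec<String> {
    public String decode(ByteBuffer bytes) {
        return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
    }
    public ByteBuffer encode(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }
}

/** 4-byte big-endian int codec. */
class IntCodec implements Codec<Integer> {
    public Integer decode(ByteBuffer bytes) {
        return bytes.duplicate().getInt();
    }
    public ByteBuffer encode(Integer value) {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.putInt(value);
        b.flip();
        return b;
    }
}
```

In this arrangement, object creation is paid only by jobs that opt into typed keys and values; a job written directly against `RawMapper` never pays it.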

This message is automatically generated by JIRA.