hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
Date Tue, 16 Feb 2010 18:41:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834370#action_12834370 ]

Owen O'Malley commented on MAPREDUCE-326:

> This seems very similar to what I suggested yesterday. Is there a notable difference?

Yes. To implement yours, a serializer needs to write into a ByteBuffer so that it can hand
it to the framework. So yours is a complicated way of implementing my write(int, ByteBuffer,
ByteBuffer). The advantages of my RawKeyValueOutputStream are that:
* the serializer can write directly to it without first putting its output into a ByteBuffer
* the mapper doesn't need to pre-declare the sizes of the key and value.
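
To make the contrast concrete, here is a rough Java sketch of the two shapes. It is
only an illustration, not the API in the attached patch: the meaning of the int
(assumed here to be the partition), the endKey()/endValue() methods, and the
Collector class are all hypothetical names invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RawWriteSketch {

    /** Shape 1: the caller hands over complete key and value buffers. */
    interface RawWriter {
        // Assumption: the int is the target partition.
        void write(int partition, ByteBuffer key, ByteBuffer value);
    }

    /** Shape 2 (hypothetical methods): bytes are streamed, boundaries are marked afterwards. */
    abstract static class RawKeyValueOutputStream extends OutputStream {
        abstract void endKey();
        abstract void endValue(int partition);
    }

    /** In-memory stand-in for the framework that accepts both shapes. */
    static class Collector extends RawKeyValueOutputStream implements RawWriter {
        final List<String> records = new ArrayList<>();
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        private byte[] key = new byte[0];

        @Override
        public void write(int partition, ByteBuffer key, ByteBuffer value) {
            records.add(partition + ":" + text(key) + "=" + text(value));
        }

        @Override
        public void write(int b) {
            pending.write(b); // streamed bytes accumulate until a boundary is marked
        }

        @Override
        void endKey() {
            key = pending.toByteArray();
            pending.reset();
        }

        @Override
        void endValue(int partition) {
            byte[] value = pending.toByteArray();
            pending.reset();
            write(partition, ByteBuffer.wrap(key), ByteBuffer.wrap(value));
        }

        private static String text(ByteBuffer b) {
            // duplicate() so decoding does not move the caller's position
            return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
        }
    }

    static List<String> demo() {
        Collector out = new Collector();
        // Shape 1: sizes are known up front because the buffers are already filled.
        out.write(0, ByteBuffer.wrap("k1".getBytes(StandardCharsets.UTF_8)),
                     ByteBuffer.wrap("v1".getBytes(StandardCharsets.UTF_8)));
        try {
            // Shape 2: the serializer just writes; no sizes are declared in advance.
            out.write("k2".getBytes(StandardCharsets.UTF_8));
            out.endKey();
            out.write("v2".getBytes(StandardCharsets.UTF_8));
            out.endValue(1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.records;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0:k1=v1, 1:k2=v2]
    }
}
```

In this sketch the stream shape is layered on the buffer shape purely for brevity; a
real implementation would presumably write straight into the framework's buffer.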

If the goal is to help non-Java frameworks, the best choice is write(int, ByteBuffer, ByteBuffer),
because they can just pass the DirectByteBuffers that they read from the underlying stream.
If the goal is to enable other object serialization models, some variant of the RawKeyValueOutputStream
makes sense, because they can use the stream to serialize the objects. Since I think that

> Many folks do seem to have expressed interest in this approach.
I disagree. They all have goals and none of them are solved by adding new abstraction levels.
* Joydeep said he wants sort on output, which is being addressed elsewhere.
* Chris Dyer wants efficient pipes, which only needs the raw write.
* Eric14 is primarily motivated by simplifying APIs and avoiding buffer copies, which argues
against adding new levels of abstraction.

I'm not against adding the new method into MapContext and a raw map/reduce api in a contrib
module. That will let us build experience with it. I am very against adding a new level of
abstraction at this point.

> The lowest level map-reduce APIs should be byte oriented
> --------------------------------------------------------
>                 Key: MAPREDUCE-326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
>         Attachments: MAPREDUCE-326-api.patch, MAPREDUCE-326.pdf
> As discussed here:
> https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
> The templates, serializers and other complexities that allow map-reduce to use arbitrary
> types complicate the design and lead to lots of object creates and other overhead that a byte
> oriented design would not suffer.  I believe the lowest level implementation of hadoop map-reduce
> should have byte string oriented APIs (for keys and values).  This API would be more performant,
> simpler and more easily cross language.
> The existing API could be maintained as a thin layer on top of the leaner API.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
