hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
Date Sat, 13 Feb 2010 19:52:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833445#action_12833445 ]

Owen O'Malley commented on MAPREDUCE-326:

@Tom: Your interface would be perfectly implementable on *top* of the context object API with
very little overhead or work. I haven't seen *any* motivation to introduce a new API
below the user-level ones. If you moved your API into o.a.h.mapreduce.lib.raw and marked it
unstable, framework writers could experiment with it.
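As a rough illustration of the layering described above, here is a minimal Java sketch. The interfaces Context, ContextMapper and RawMapper, and the classes RawMapperAdapter and RawOnContextDemo, are hypothetical simplified stand-ins written for this example (loosely shaped like o.a.h.mapreduce.Mapper's map(key, value, context)); they are not Hadoop classes. The point is that once a context-object API's key and value types are set to byte[], a byte-oriented mapper runs on it through plain delegation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a context-object API: the framework passes one
// record plus a Context that collects output. Not the real Hadoop classes.
interface Context<K, V> {
    void write(K key, V value);
}

interface ContextMapper<KI, VI, KO, VO> {
    void map(KI key, VI value, Context<KO, VO> context);
}

// Hypothetical byte-oriented interface: keys and values are opaque byte strings.
interface RawMapper {
    void map(byte[] key, byte[] value, Context<byte[], byte[]> out);
}

// Adapter: a RawMapper runs as a ContextMapper whose type parameters are all
// byte[]. It only delegates; no copies or conversions are made.
final class RawMapperAdapter implements ContextMapper<byte[], byte[], byte[], byte[]> {
    private final RawMapper delegate;

    RawMapperAdapter(RawMapper delegate) {
        this.delegate = delegate;
    }

    @Override
    public void map(byte[] key, byte[] value, Context<byte[], byte[]> context) {
        delegate.map(key, value, context);
    }
}

final class RawOnContextDemo {
    // Runs a raw mapper that swaps key and value; returns emitted pairs as "key=value".
    static List<String> run() {
        List<String> emitted = new ArrayList<>();
        Context<byte[], byte[]> ctx = (k, v) -> emitted.add(
                new String(k, StandardCharsets.UTF_8) + "=" + new String(v, StandardCharsets.UTF_8));
        RawMapper swap = (k, v, out) -> out.write(v, k);
        ContextMapper<byte[], byte[], byte[], byte[]> mapper = new RawMapperAdapter(swap);
        mapper.map("k1".getBytes(StandardCharsets.UTF_8), "v1".getBytes(StandardCharsets.UTF_8), ctx);
        return emitted;
    }
}
```

Calling RawOnContextDemo.run() yields a single pair, "v1=k1". The adapter adds one virtual call per record and nothing else, which is the sense in which such a layer costs "very little".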

@Chris Dyer: Pipes and streaming certainly need a major pass to clean up their performance,
although benchmarks have shown that for sort, which is the worst case, Pipes' performance
is comparable to Java's. Pipes *would* get much easier if it moved to the context object API.
Moving to Tom's API would offer no advantage over the context object API.

To avoid the needless serialization, Pipes applications should use SequenceFileAsBinaryInputFormat
(and SequenceFileAsBinaryOutputFormat). That said, to maximize compatibility with Java, Pipes
applications are allowed to use any input or output format.
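To make the "needless serialization" point concrete, here is a minimal Java sketch under stated assumptions. The record layout (each field as a 4-byte big-endian length followed by its bytes) and the class PassthroughDemo are hypothetical; this is not the real SequenceFile on-disk format, nor how Pipes actually frames records. forward(input, true) mimics a typed input format that turns each field into a Java object and back into bytes before handing it to the child process; forward(input, false) mimics a binary input format that copies the bytes straight through.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical record layout for illustration only, NOT the SequenceFile format.
final class PassthroughDemo {
    // Appends one field as a 4-byte big-endian length followed by its bytes.
    static void writeField(ByteArrayOutputStream out, byte[] field) {
        byte[] len = ByteBuffer.allocate(4).putInt(field.length).array();
        out.write(len, 0, len.length);
        out.write(field, 0, field.length);
    }

    // Builds an input buffer from string fields (helper for the example).
    static byte[] encode(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String f : fields) {
            writeField(out, f.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Forwards every field to the "child process" buffer.
    // decode == true: bytes -> Java object -> bytes (typed path).
    // decode == false: bytes are copied unchanged (binary path).
    static byte[] forward(byte[] input, boolean decode) {
        ByteBuffer in = ByteBuffer.wrap(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (in.hasRemaining()) {
            byte[] field = new byte[in.getInt()];
            in.get(field);
            if (decode) {
                String obj = new String(field, StandardCharsets.UTF_8); // deserialize
                field = obj.getBytes(StandardCharsets.UTF_8);            // serialize again
            }
            writeField(out, field);
        }
        return out.toByteArray();
    }
}
```

For valid UTF-8 input both paths emit identical bytes; the typed path just pays for an extra object and an extra decode/encode per field. For arbitrary binary data the String round trip is not even lossless, because malformed UTF-8 sequences are replaced during decoding.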

> The lowest level map-reduce APIs should be byte oriented
> --------------------------------------------------------
>                 Key: MAPREDUCE-326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
>         Attachments: MAPREDUCE-326-api.patch, MAPREDUCE-326.pdf
> As discussed here:
> https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
> The templates, serializers and other complexities that allow map-reduce to use arbitrary
> types complicate the design and lead to lots of object creation and other overhead that a
> byte-oriented design would not suffer. I believe the lowest level implementation of hadoop
> map-reduce should have byte-string-oriented APIs (for keys and values). This API would be
> more performant, simpler, and easier to use across languages.
> The existing API could be maintained as a thin layer on top of the leaner API.
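The reverse layering proposed in the issue description (a typed API kept as a thin layer over a byte-oriented core) can be sketched the same way. Everything here is hypothetical and written for illustration: ByteMapper and Emitter stand in for the lean byte API, Serde for a per-type serializer, and TypedMapper/TypedOutput for the user-facing typed API. The adapter deserializes each input record, calls the typed mapper, and serializes whatever it emits.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lean, byte-oriented core API.
interface Emitter {
    void emit(byte[] key, byte[] value);
}

interface ByteMapper {
    void map(byte[] key, byte[] value, Emitter out);
}

// Hypothetical serializer for one type.
interface Serde<T> {
    byte[] toBytes(T obj);
    T fromBytes(byte[] bytes);
}

// Hypothetical typed, user-facing API.
interface TypedOutput<K, V> {
    void collect(K key, V value);
}

interface TypedMapper<KI, VI, KO, VO> {
    void map(KI key, VI value, TypedOutput<KO, VO> out);
}

// Thin layer: runs a typed mapper on the byte API. Every record pays for a
// deserialize on the way in and a serialize on the way out.
final class TypedOnBytes<KI, VI, KO, VO> implements ByteMapper {
    private final TypedMapper<KI, VI, KO, VO> mapper;
    private final Serde<KI> kin;
    private final Serde<VI> vin;
    private final Serde<KO> kout;
    private final Serde<VO> vout;

    TypedOnBytes(TypedMapper<KI, VI, KO, VO> mapper,
                 Serde<KI> kin, Serde<VI> vin, Serde<KO> kout, Serde<VO> vout) {
        this.mapper = mapper;
        this.kin = kin;
        this.vin = vin;
        this.kout = kout;
        this.vout = vout;
    }

    @Override
    public void map(byte[] key, byte[] value, Emitter out) {
        mapper.map(kin.fromBytes(key), vin.fromBytes(value),
                (k, v) -> out.emit(kout.toBytes(k), vout.toBytes(v)));
    }
}

final class TypedOnBytesDemo {
    static final Serde<String> UTF8 = new Serde<String>() {
        public byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
        public String fromBytes(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
    };
    static final Serde<Integer> INT32 = new Serde<Integer>() {
        public byte[] toBytes(Integer i) { return ByteBuffer.allocate(4).putInt(i).array(); }
        public Integer fromBytes(byte[] b) { return ByteBuffer.wrap(b).getInt(); }
    };

    // Typed mapper emitting (word, word length), driven through the byte API.
    static List<String> run() {
        TypedMapper<String, String, String, Integer> lengths = (k, v, out) -> out.collect(v, v.length());
        ByteMapper raw = new TypedOnBytes<>(lengths, UTF8, UTF8, UTF8, INT32);
        List<String> seen = new ArrayList<>();
        raw.map(UTF8.toBytes("line1"), UTF8.toBytes("hello"),
                (k, v) -> seen.add(UTF8.fromBytes(k) + "=" + INT32.fromBytes(v)));
        return seen;
    }
}
```

TypedOnBytesDemo.run() returns one pair, "hello=5". In this arrangement the serialization cost lives only in the typed layer, so byte-oriented clients (for example, other languages) never pay it.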

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
