hadoop-mapreduce-issues mailing list archives

From "Jeff Hammerbacher (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator
Date Thu, 28 Jan 2010 01:04:36 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805732#action_12805732 ]

Jeff Hammerbacher commented on MAPREDUCE-1126:
----------------------------------------------

bq. Especially for frameworks written on top of MapReduce, less restrictive interfaces here would surely be fertile ground for performance improvements.

bq. Writing wrappers can be irritating, but for the MR API, I'd rather make it easier on common cases and users than on advanced uses and framework authors.

Great points, Chris. Yahoo! has stated that a significant majority of their MapReduce jobs
are written in Pig, and Facebook says the same of Hive. Among our many customers at Cloudera,
it's far more common to target the MapReduce execution engine with a higher-level language
than with the Java API. What you propose as the common case, then, appears to be uncommon
in practice. Perhaps we should adjust our design criteria to match the usage data reported
by the project's users?

Thanks,
Jeff

> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch,
>                      MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should use the Serialization
> API to create key comparators.  This would permit, e.g., Avro-based comparators to be used,
> permitting efficient sorting of complex data types without having to write a RawComparator
> in Java.
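
The issue description proposes that the serialization, rather than a separately configured Java class, supply the key comparator. What follows is a minimal Java sketch of that idea under stated assumptions: `RawComparator`, `KeySerialization`, `IdNameSerialization`, and `sortSerialized` are hypothetical names defined here for illustration and are not Hadoop's actual Serialization or RawComparator API. The byte layout (a 4-byte big-endian id followed by the name's UTF-8 bytes) is likewise an arbitrary choice.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the idea in MAPREDUCE-1126; none of these types are Hadoop's real API.
final class SerializationComparatorSketch {

  // Hypothetical stand-in for a raw comparator: orders two serialized keys given as byte ranges.
  interface RawComparator {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
  }

  // Hypothetical serialization that encodes keys AND supplies a comparator for its own encoding.
  interface KeySerialization<T> {
    byte[] serialize(T key);
    T deserialize(byte[] bytes);
    RawComparator getRawComparator();
  }

  // Example key type, ordered by id and then by name.
  static final class Key {
    final int id;
    final String name;

    Key(int id, String name) {
      this.id = id;
      this.name = name;
    }

    @Override
    public String toString() {
      return id + ":" + name;
    }
  }

  // Assumed layout: 4-byte big-endian id, followed by the name's UTF-8 bytes.
  static final class IdNameSerialization implements KeySerialization<Key> {
    @Override
    public byte[] serialize(Key key) {
      byte[] name = key.name.getBytes(StandardCharsets.UTF_8);
      return ByteBuffer.allocate(4 + name.length).putInt(key.id).put(name).array();
    }

    @Override
    public Key deserialize(byte[] bytes) {
      int id = ByteBuffer.wrap(bytes).getInt();
      return new Key(id, new String(bytes, 4, bytes.length - 4, StandardCharsets.UTF_8));
    }

    @Override
    public RawComparator getRawComparator() {
      // The serialization knows its own layout, so it compares bytes without building Key objects:
      // the signed id first, then the name bytes in unsigned lexicographic order.
      return (b1, s1, l1, b2, s2, l2) -> {
        int id1 = ByteBuffer.wrap(b1, s1, 4).getInt();
        int id2 = ByteBuffer.wrap(b2, s2, 4).getInt();
        if (id1 != id2) {
          return Integer.compare(id1, id2);
        }
        return Arrays.compareUnsigned(b1, s1 + 4, s1 + l1, b2, s2 + 4, s2 + l2);
      };
    }
  }

  // Framework-side sort: asks the serialization for the comparator instead of
  // requiring the user to configure a hand-written comparator class.
  static <T> List<byte[]> sortSerialized(List<T> keys, KeySerialization<T> serialization) {
    RawComparator cmp = serialization.getRawComparator();
    List<byte[]> buffers = new ArrayList<>();
    for (T key : keys) {
      buffers.add(serialization.serialize(key));
    }
    buffers.sort((a, b) -> cmp.compare(a, 0, a.length, b, 0, b.length));
    return buffers;
  }
}
```

A plain byte-by-byte comparison of this layout would place negative ids after positive ones, which is why the sketched comparator decodes the signed id field before falling back to byte order. A schema-driven serialization such as Avro can make this kind of layout-aware comparison generically from its schema, which is the benefit the issue points to.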

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.