hadoop-mapreduce-issues mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator
Date Wed, 27 Jan 2010 19:39:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805625#action_12805625 ]

Ted Dunning commented on MAPREDUCE-1126:

   > [From Owen] My assertion is that leaving the type as the primary instrument of the user in defining the job is correct.
   > I haven't talked to any users that care about using a non-default serializer for a given type.

Pig would like to. For scalar types Pig uses Java String, Long, Integer, etc. But default
Java serialization is slow. So currently we convert these to and from Writables as we go across
the Map and Reduce boundaries to get the faster Writable serialization. If we could instead
define an alternate serializer and avoid these conversions it would make our code simpler
and should perform better.
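A minimal sketch of what a non-default serializer for Pig's scalar types could look like, assuming a hypothetical, simplified `ScalarSerializer` interface. Hadoop's real pluggable interfaces are `Serialization`, `Serializer` and `Deserializer` in `org.apache.hadoop.io.serializer`, and their shape differs. The point is only that a `java.lang.Long` can be written directly as 8 bytes, with no intermediate Writable wrapper and without the stream header and class descriptor that default Java serialization emits:

```java
import java.io.*;

// Hypothetical, simplified pluggable serializer (NOT Hadoop's actual interface).
interface ScalarSerializer<T> {
    void write(T value, DataOutput out) throws IOException;
    T read(DataInput in) throws IOException;
}

// Writes a java.lang.Long as 8 raw bytes: no Writable wrapper, no Java
// serialization stream header or class descriptor.
final class LongScalarSerializer implements ScalarSerializer<Long> {
    public void write(Long value, DataOutput out) throws IOException { out.writeLong(value); }
    public Long read(DataInput in) throws IOException { return in.readLong(); }
}

// Writes a java.lang.String as a 2-byte length plus modified UTF-8 (max 65535 bytes).
final class StringScalarSerializer implements ScalarSerializer<String> {
    public void write(String value, DataOutput out) throws IOException { out.writeUTF(value); }
    public String read(DataInput in) throws IOException { return in.readUTF(); }
}

final class ScalarRoundTrip {
    // Serializes one value with the given serializer into a byte array.
    static <T> byte[] toBytes(ScalarSerializer<T> s, T value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            s.write(value, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads one value back from a byte array produced by toBytes.
    static <T> T fromBytes(ScalarSerializer<T> s, byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return s.read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Size of the same value under default Java serialization, for comparison.
    static int javaSerializedSize(Serializable value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }
}
```

For example, `ScalarRoundTrip.toBytes(new LongScalarSerializer(), 42L)` yields 8 bytes, whereas `javaSerializedSize(42L)` is noticeably larger. Size is only a rough proxy for cost; any speed difference would need to be measured.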

I would like to. I would like to start using Avro for its greater expressive power as soon as
possible. I also can't change all of my legacy code right away, so I will have some code that
implements both Writable and Avro serialization. I need to be able to use Writable for old
code and Avro for new code.
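The requirement here is that the serialization be chosen per job, not inferred from the type alone. In Hadoop releases of this era, `SerializationFactory` picks the first serialization listed in `io.serializations` whose `accept()` returns true for the class, so a class that implements both Writable and an Avro interface always gets the same one. The sketch below contrasts that type-driven lookup with a job-level override. All names (`SerializationChoice`, `SerializationPicker`, the marker interfaces, `ClickEvent`) are hypothetical and do not correspond to Hadoop or Avro APIs:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical marker interfaces standing in for Writable and an Avro-generated record.
interface LegacyWritableLike {}
interface AvroLike {}

// A hypothetical record type that supports both encodings during a migration.
final class ClickEvent implements LegacyWritableLike, AvroLike {}

interface SerializationChoice {
    String name();
    boolean accepts(Class<?> c);
}

final class WritableChoice implements SerializationChoice {
    public String name() { return "writable"; }
    public boolean accepts(Class<?> c) { return LegacyWritableLike.class.isAssignableFrom(c); }
}

final class AvroChoice implements SerializationChoice {
    public String name() { return "avro"; }
    public boolean accepts(Class<?> c) { return AvroLike.class.isAssignableFrom(c); }
}

final class SerializationPicker {
    private final List<SerializationChoice> registered;

    SerializationPicker(List<SerializationChoice> registered) { this.registered = registered; }

    // Registration order mimics a config list: writable first, then avro.
    static SerializationPicker withDefaults() {
        return new SerializationPicker(
            Arrays.<SerializationChoice>asList(new WritableChoice(), new AvroChoice()));
    }

    // Type-driven lookup: the first registered serialization that accepts the class wins.
    SerializationChoice byType(Class<?> c) {
        for (SerializationChoice s : registered) {
            if (s.accepts(c)) return s;
        }
        throw new IllegalArgumentException("no serialization accepts " + c.getName());
    }

    // Job-driven lookup: an explicitly configured name overrides the type-driven default.
    SerializationChoice forJob(Class<?> c, String configuredName) {
        if (configuredName == null) return byType(c);
        for (SerializationChoice s : registered) {
            if (s.name().equals(configuredName)) {
                if (!s.accepts(c)) {
                    throw new IllegalArgumentException(configuredName + " cannot handle " + c.getName());
                }
                return s;
            }
        }
        throw new IllegalArgumentException("unknown serialization " + configuredName);
    }
}
```

With type-driven lookup alone, `ClickEvent` can only ever be written as Writable; the per-job override lets new jobs ask for `"avro"` while old jobs keep the default.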

> shuffle should use serialization to get comparator
> --------------------------------------------------
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch
> Currently the key comparator is defined as a Java class.  Instead we should use the Serialization
> API to create key comparators.  This would permit, e.g., Avro-based comparators to be used,
> permitting efficient sorting of complex data types without having to write a RawComparator
> in Java.
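The proposal is for the serialization itself to supply the comparator the shuffle uses, so keys can be sorted on their serialized bytes without hand-writing a `RawComparator`. A minimal sketch, assuming a hypothetical `ComparableSerialization` interface and a hypothetical `RawBytesComparator` that mirrors the argument shape of Hadoop's `RawComparator.compare(byte[], int, int, byte[], int, int)`. The encoding is a swapped-in order-preserving one for `long` keys (sign bit flipped, then big-endian), not Avro's binary format:

```java
import java.util.*;

// Hypothetical mirror of the argument shape of Hadoop's RawComparator.compare(...).
interface RawBytesComparator {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Hypothetical: a serialization that hands out a comparator for the bytes it writes.
interface ComparableSerialization<T> {
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
    RawBytesComparator rawComparator();
}

// Order-preserving encoding for long keys: flip the sign bit, then write big-endian.
// Unsigned byte-by-byte comparison of the result matches signed numeric order.
final class OrderedLongSerialization implements ComparableSerialization<Long> {
    public byte[] serialize(Long value) {
        long v = value ^ Long.MIN_VALUE; // negatives now sort below non-negatives
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) v;
            v >>>= 8;
        }
        return out;
    }

    public Long deserialize(byte[] bytes) {
        long v = 0;
        for (int i = 0; i < 8; i++) v = (v << 8) | (bytes[i] & 0xFF);
        return v ^ Long.MIN_VALUE; // undo the sign-bit flip
    }

    public RawBytesComparator rawComparator() {
        // Lexicographic comparison of unsigned bytes; shorter prefix sorts first.
        return (b1, s1, l1, b2, s2, l2) -> {
            int n = Math.min(l1, l2);
            for (int i = 0; i < n; i++) {
                int a = b1[s1 + i] & 0xFF;
                int b = b2[s2 + i] & 0xFF;
                if (a != b) return Integer.compare(a, b);
            }
            return Integer.compare(l1, l2);
        };
    }
}

final class RawSortDemo {
    // Sorts keys by their serialized bytes using only the serialization's raw comparator.
    static <T> List<T> sortViaBytes(ComparableSerialization<T> ser, List<T> keys) {
        List<byte[]> encoded = new ArrayList<>();
        for (T k : keys) encoded.add(ser.serialize(k));
        RawBytesComparator cmp = ser.rawComparator();
        encoded.sort((x, y) -> cmp.compare(x, 0, x.length, y, 0, y.length));
        List<T> out = new ArrayList<>();
        for (byte[] e : encoded) out.add(ser.deserialize(e));
        return out;
    }
}
```

Sorting `5, -3, 0, Long.MIN_VALUE, 42, -1` through `sortViaBytes` yields `Long.MIN_VALUE, -3, -1, 0, 5, 42`, with every comparison done on the encoded bytes. Avro's own binary encoding (zig-zag varints for longs) does not preserve order byte-wise, so an Avro-based comparator would instead have to interpret the bytes using the schema.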

This message is automatically generated by JIRA.
