hadoop-mapreduce-issues mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator
Date Tue, 26 Jan 2010 22:02:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805233#action_12805233 ]

Ted Dunning commented on MAPREDUCE-1126:

Isn't there a middle ground available (at least from the user's point of view)?

My thought would be that if the user specifies types in the current style, they would be limited
to Writables in the current fashion.  That could be marked as old-fashioned, but I wouldn't
necessarily deprecate it.  It does leave Writable in a privileged position relative to other
serialization frameworks, but it *is* in a privileged position since it existed first.

Alternatively, the user could specify a serialization-framework-specific configuration, much
as Doug suggests.  If any non-standard serialization is used, specifying a type should be
an error, and vice versa.  This should be easy to check.
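
A minimal sketch of the mutual-exclusion check described above, assuming the job configuration
can be viewed as a plain string map. The class name, the enum, and both property names are
hypothetical and are not taken from Hadoop or from any of the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical validator; nothing here is an actual Hadoop API. */
final class JobSerializationCheck {
    // Hypothetical property names, chosen only for illustration.
    static final String KEY_CLASS = "example.map.output.key.class";
    static final String KEY_SERIALIZATION = "example.map.output.key.serialization";

    enum Style { DEFAULT, OLD_STYLE_WRITABLE, NEW_STYLE_SERIALIZATION }

    /**
     * Returns which configuration style the job uses, and rejects a job
     * that mixes an old-style type with a non-standard serialization.
     */
    static Style validate(Map<String, String> conf) {
        boolean hasType = conf.containsKey(KEY_CLASS);
        boolean hasSerialization = conf.containsKey(KEY_SERIALIZATION);

        if (hasType && hasSerialization) {
            throw new IllegalArgumentException(
                "Specify either " + KEY_CLASS + " or " + KEY_SERIALIZATION + ", not both");
        }
        if (hasSerialization) {
            return Style.NEW_STYLE_SERIALIZATION;
        }
        if (hasType) {
            return Style.OLD_STYLE_WRITABLE;
        }
        return Style.DEFAULT;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY_CLASS, "org.example.MyWritable"); // old style only
        System.out.println(validate(conf));

        conf.put(KEY_SERIALIZATION, "avro"); // mixing both styles is rejected
        try {
            validate(conf);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Old-style jobs take the OLD_STYLE_WRITABLE path untouched, so existing programs would not need
to change under this assumption; only a job that mixes the two styles fails fast.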

From the user's point of view, they could use old-style job configuration or the new style
that Doug suggests.  I strongly prefer the new style, but I wouldn't be anxious to have to
change all my old-style programs.

Under the covers, almost anything could happen, but the important thing that would happen
is that if any special serialization is invoked, the job config would need to know about it,
which might affect many other components, such as the shuffle.

Is there any technical reason why this cannot be made to work?

Is there really any philosophical reason that old programs must be broken?

If no and no, why is there a problem here?  I think that this middle ground would satisfy
Owen's (and my own) needs for backwards compatibility as well as Doug's (and my own) desire
for flexibility for serialization.

> shuffle should use serialization to get comparator
> --------------------------------------------------
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch,
>                      MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch
>
> Currently the key comparator is defined as a Java class.  Instead we should use the Serialization
> API to create key comparators.  This would permit, e.g., Avro-based comparators to be used,
> permitting efficient sorting of complex data types without having to write a RawComparator
> in Java.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
