hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator
Date Sat, 16 Jan 2010 00:08:54 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801022#action_12801022 ]

Aaron Kimball commented on MAPREDUCE-1126:

Do you think it's okay to add methods like {{Job.setMapOutputKeySchema()}}, then? In the
limit, if another serialization framework (e.g., Hessian, Protobufs) comes into common use
in Hadoop, we would then need to add a mechanism for setting its serialization system-specific
metadata to {{Job}} as well. We factored out the InputFormat/OutputFormat-specific getters and
setters (cf. {{FileInputFormat.addInputPath()}}) a while back and deprecated {{JobConf.addInputPath()}};
factoring out the serialization-specific setters in the same way seems like a logical next step.

Furthermore, what specific framework dependencies are you referring to? The jobdata package
provides getters and setters that allow users to configure serialization system-specific metadata
keys and values, but they are put into well-defined "system wide" metadata locations (e.g.
{{JobContext.MAP_OUTPUT_KEY_METADATA}}) in the Configuration itself. The SerializerBase/DeserializerBase
classes are instantiated in JobConf without touching the {{jobdata}} package at all (they
rely only on the system-wide Configuration names).
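
To illustrate the separation described above, here is a minimal sketch. It assumes a plain
{{Map<String, String>}} standing in for the Hadoop Configuration. The key string, the
{{SchemaJobData}} helper, and the {{FrameworkSide}} reader are hypothetical names invented
for illustration; they are not the classes or values in the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataSketch {
    // Hypothetical key prefix. The real constant is JobContext.MAP_OUTPUT_KEY_METADATA;
    // its actual string value is not shown in this thread, so this one is made up.
    static final String MAP_OUTPUT_KEY_METADATA = "example.map.output.key.metadata";

    /** Stand-in for a serialization-specific helper living in a jobdata-style package. */
    static final class SchemaJobData {
        static void setMapOutputKeySchema(Map<String, String> conf, String schemaJson) {
            // The user-facing setter only writes entries under the system-wide prefix.
            conf.put(MAP_OUTPUT_KEY_METADATA + ".schema", schemaJson);
        }
    }

    /** Stand-in for framework code: it knows the system-wide prefix and nothing else. */
    static final class FrameworkSide {
        static Map<String, String> mapOutputKeyMetadata(Map<String, String> conf) {
            Map<String, String> metadata = new HashMap<>();
            String prefix = MAP_OUTPUT_KEY_METADATA + ".";
            for (Map.Entry<String, String> e : conf.entrySet()) {
                if (e.getKey().startsWith(prefix)) {
                    metadata.put(e.getKey().substring(prefix.length()), e.getValue());
                }
            }
            return metadata;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        SchemaJobData.setMapOutputKeySchema(conf, "\"string\"");
        // The framework side never references SchemaJobData.
        System.out.println(FrameworkSide.mapOutputKeyMetadata(conf));
    }
}
```

In this sketch the only thing the two sides share is the key name, which is the property
the paragraph above is claiming for the real code.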

The only dependency on {{jobdata}} classes in Job/JobContext/JobConf is to push down the now-deprecated
getter/setter methods that the user would call in legacy code; the framework itself no longer
makes any calls to {{JobConf.getMapOutputKeyClass()}}. It instead calls {{JobConf.getMapOutputKeySerializer()}}
and {{JobConf.getMapOutputKeyDeserializer()}} directly.
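
The push-down arrangement can be sketched roughly as follows. This is an illustration
under stated assumptions, not the patch itself: {{LegacyConf}} is a stand-in for JobConf,
{{ClassBasedJobData}} is a hypothetical jobdata-style helper, the key string is made up,
and the "serializer" is reduced to a descriptive string.

```java
import java.util.HashMap;
import java.util.Map;

public class PushDownSketch {
    // Hypothetical configuration key, invented for this sketch.
    static final String KEY_CLASS = "example.map.output.key.metadata.class";

    /** Stand-in for a jobdata-style helper that owns the class-based logic. */
    static final class ClassBasedJobData {
        static void setMapOutputKeyClass(Map<String, String> conf, Class<?> cls) {
            conf.put(KEY_CLASS, cls.getName());
        }
    }

    /** Stand-in for JobConf; not the real class. */
    static final class LegacyConf {
        final Map<String, String> conf = new HashMap<>();

        /** Legacy entry point: kept for old user code, delegates to the helper. */
        @Deprecated
        void setMapOutputKeyClass(Class<?> cls) {
            ClassBasedJobData.setMapOutputKeyClass(conf, cls);
        }

        /** Framework-facing accessor: consults only the configuration entry. */
        String mapOutputKeySerializerName() {
            String cls = conf.get(KEY_CLASS);
            if (cls == null) {
                throw new IllegalStateException("no map output key metadata set");
            }
            return "serializer for " + cls;
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        LegacyConf job = new LegacyConf();
        job.setMapOutputKeyClass(String.class); // what legacy user code would call
        System.out.println(job.mapOutputKeySerializerName());
    }
}
```

The deprecated setter is the only place that touches the helper; the framework-facing
accessor never does.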

> shuffle should use serialization to get comparator
> --------------------------------------------------
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch,
>                      MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch
>
> Currently the key comparator is defined as a Java class.  Instead we should use the Serialization
> API to create key comparators.  This would permit, e.g., Avro-based comparators to be used,
> permitting efficient sorting of complex data types without having to write a RawComparator
> in Java.
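
For readers unfamiliar with the idea in the issue description, the following is a minimal
sketch of a serialization supplying its own comparator over serialized bytes. The
{{KeySerialization}} interface and {{IntSerialization}} class are hypothetical and much
simpler than Hadoop's Serialization API or its {{RawComparator}} (which compares byte
ranges given offsets and lengths); the example encoding is a 4-byte big-endian int, not Avro.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RawComparatorSketch {
    /** Hypothetical, simplified serialization contract for this sketch only. */
    interface KeySerialization<T> {
        byte[] serialize(T value);
        Comparator<byte[]> rawComparator();
    }

    /** Example serialization: 4-byte big-endian two's-complement ints. */
    static final class IntSerialization implements KeySerialization<Integer> {
        @Override
        public byte[] serialize(Integer value) {
            return ByteBuffer.allocate(4).putInt(value).array(); // big-endian by default
        }

        @Override
        public Comparator<byte[]> rawComparator() {
            return (a, b) -> {
                // The first byte carries the sign, so compare it as a signed byte.
                int c = Byte.compare(a[0], b[0]);
                if (c != 0) {
                    return c;
                }
                // Remaining bytes are compared as unsigned magnitudes.
                for (int i = 1; i < 4; i++) {
                    c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
                    if (c != 0) {
                        return c;
                    }
                }
                return 0;
            };
        }
    }

    /** Sorts keys in serialized form; the keys are never deserialized for comparison. */
    static <T> List<byte[]> sortSerialized(KeySerialization<T> ser, List<T> keys) {
        List<byte[]> out = new ArrayList<>();
        for (T key : keys) {
            out.add(ser.serialize(key));
        }
        out.sort(ser.rawComparator());
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> sorted = sortSerialized(new IntSerialization(), Arrays.asList(256, -1, 255, -2, 0));
        for (byte[] raw : sorted) {
            System.out.println(ByteBuffer.wrap(raw).getInt()); // decode only for display
        }
    }
}
```

The sorting code depends only on the serialization object, so swapping in a different
serialization would change the comparator without any user-written comparator class.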

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
