hadoop-mapreduce-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1126) shuffle should use serialization to get comparator
Date Wed, 27 Jan 2010 23:40:35 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-1126:

    Attachment: MAPREDUCE-1126.patch

Here's a much-simplified patch. To show how it works with nested types, I've added an example
mapper with signature {{Mapper<LongWritable, Text, Utf8, Map<Utf8, Long>>}}, which
uses the generic Avro serialization for the intermediate key and value. It is configured as follows:

Schema keySchema = Schema.create(Schema.Type.STRING);
Schema valSchema = Schema.parse("{\"type\":\"map\", \"values\":\"long\"}");
AvroGenericData.setMapOutputKeySchema(job, keySchema);
AvroGenericData.setMapOutputValueSchema(job, valSchema);

This replaces the calls to job.setMapOutputKeyClass() and job.setMapOutputValueClass().
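
As a rough illustration of the idea behind this issue (the serialization, rather than a key class, supplying the sort order), here is a minimal self-contained sketch. It does not use the Hadoop or Avro APIs: SketchSerialization, Utf8StringSerialization and shuffleSort are hypothetical names invented for this example, and the patch itself may be structured quite differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SerializationComparatorSketch {

    /** Hypothetical stand-in: a serialization that also supplies its own raw sort order. */
    public interface SketchSerialization<T> {
        byte[] serialize(T value);

        /** Compares two serialized values without deserializing them. */
        Comparator<byte[]> rawComparator();
    }

    /** Hypothetical example: strings stored as UTF-8, ordered by unsigned byte value. */
    public static final class Utf8StringSerialization implements SketchSerialization<String> {
        @Override
        public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public Comparator<byte[]> rawComparator() {
            return (a, b) -> {
                int n = Math.min(a.length, b.length);
                for (int i = 0; i < n; i++) {
                    // Mask to compare bytes as unsigned values.
                    int diff = (a[i] & 0xff) - (b[i] & 0xff);
                    if (diff != 0) {
                        return diff;
                    }
                }
                // Equal prefix: the shorter value sorts first.
                return a.length - b.length;
            };
        }
    }

    /**
     * Sorts keys the way a shuffle might: serialize once, then order the
     * bytes with whatever comparator the serialization provides.
     */
    public static <T> List<byte[]> shuffleSort(SketchSerialization<T> serialization, List<T> keys) {
        List<byte[]> serialized = new ArrayList<>();
        for (T key : keys) {
            serialized.add(serialization.serialize(key));
        }
        serialized.sort(serialization.rawComparator());
        return serialized;
    }

    public static void main(String[] args) {
        List<byte[]> sorted = shuffleSort(new Utf8StringSerialization(),
                Arrays.asList("pear", "apple", "fig"));
        for (byte[] key : sorted) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

The point of the sketch is that shuffleSort never names a key class or a hand-written comparator; swapping in a different serialization changes both the byte format and the ordering together.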

I'm interested in hearing people's thoughts about this.

> shuffle should use serialization to get comparator
> --------------------------------------------------
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch,
>                      MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
> Currently the key comparator is defined as a Java class. Instead we should use the Serialization
> API to create key comparators. This would permit, e.g., Avro-based comparators to be used,
> permitting efficient sorting of complex data types without having to write a RawComparator
> in Java.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
