hadoop-mapreduce-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1462) Enable context-specific and stateful serializers in MapReduce
Date Fri, 12 Feb 2010 00:48:28 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-1462:
---------------------------------

    Attachment: MAPREDUCE-1462-mr.patch
                MAPREDUCE-1462-common.patch

To help illustrate the problem, I've created a demonstration patch that uses
the SerializationContext-based user API while retaining the Serialization code that exists
in common. (In fact, I had to change the Serialization code so that it can retain
its metadata in an instance variable.)

Here's what the configuration looks like for the user:

{code}
Schema keySchema = Schema.create(Schema.Type.STRING);
Schema valSchema = Schema.create(Schema.Type.LONG);
job.setSerialization(Job.SerializationContext.MAP_OUTPUT_KEY,
           new AvroGenericSerialization(keySchema));
job.setSerialization(Job.SerializationContext.MAP_OUTPUT_VALUE,
           new AvroGenericSerialization(valSchema));
{code}
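
The idea behind the configuration above can be illustrated without Hadoop or Avro on the classpath. The following is a minimal, self-contained sketch and not the code in the attached patches: `JobSketch`, `Serialization`, `setTypeSerialization`, and `resolve` are hypothetical names, and the rule that a context-specific serialization takes precedence over a type-based one is an assumption drawn from the issue description. It shows a serialization object that keeps its metadata (here a schema string) in an instance variable, registered per context.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model only; none of these types are the real Hadoop classes.
final class ContextSerializationSketch {

    /** The two contexts named in the message; a real API may define more. */
    enum SerializationContext { MAP_OUTPUT_KEY, MAP_OUTPUT_VALUE }

    /** Stateful serialization: metadata (e.g. a schema) lives in an instance field. */
    static final class Serialization {
        final String name;
        final String metadata; // null when the serialization needs no extra state

        Serialization(String name, String metadata) {
            this.name = name;
            this.metadata = metadata;
        }
    }

    /** Stand-in for a job that holds both kinds of registration. */
    static final class JobSketch {
        private final Map<SerializationContext, Serialization> byContext =
                new EnumMap<>(SerializationContext.class);
        private final Map<Class<?>, Serialization> byType = new HashMap<>();

        /** Context-specific registration, mirroring job.setSerialization(...). */
        void setSerialization(SerializationContext context, Serialization s) {
            byContext.put(context, s);
        }

        /** Type-based registration, standing in for the existing mechanism. */
        void setTypeSerialization(Class<?> type, Serialization s) {
            byType.put(type, s);
        }

        /** Assumed rule: prefer the context-specific entry, else fall back to the type. */
        Serialization resolve(SerializationContext context, Class<?> type) {
            Serialization s = byContext.get(context);
            return s != null ? s : byType.get(type);
        }
    }

    public static void main(String[] args) {
        JobSketch job = new JobSketch();
        job.setTypeSerialization(Long.class, new Serialization("type-based", null));
        job.setSerialization(SerializationContext.MAP_OUTPUT_KEY,
                new Serialization("avro-generic", "\"string\""));

        Serialization key = job.resolve(SerializationContext.MAP_OUTPUT_KEY, String.class);
        Serialization val = job.resolve(SerializationContext.MAP_OUTPUT_VALUE, Long.class);
        System.out.println(key.name + " " + key.metadata);
        System.out.println(val.name);
    }
}
```

In this sketch the map output key resolves to the context-specific entry and carries its schema with it, while the map output value, which has no context-specific entry, falls back to the type-based registration for `Long`.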

> Enable context-specific and stateful serializers in MapReduce
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-1462
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1462
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: task
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: h-1462.patch, MAPREDUCE-1462-common.patch, MAPREDUCE-1462-mr.patch
>
>
> Although the current serializer framework is powerful, within the context of a job it
> is limited to picking a single serializer for a given class. Additionally, Avro generic
> serialization can make use of additional configuration/state such as the schema. (Most
> other serialization frameworks, including Writable, Jute/Record IO, Thrift, Avro Specific,
> and Protocol Buffers, only need the object's class name to deserialize the object.)
>
> With the goal of keeping the easy things easy and maintaining backwards compatibility,
> we should be able to allow applications to use context-specific (e.g. map output key)
> serializers in addition to the current type-based ones that handle the majority of the
> cases. Furthermore, we should be able to support serializer-specific configuration/metadata
> in a type-safe manner without cluttering up the base API with a lot of new methods that
> will confuse new users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

