hadoop-common-dev mailing list archives

From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1986) Add support for a general serialization mechanism for Map Reduce
Date Wed, 24 Oct 2007 12:03:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537294 ]

Vivek Ratan commented on HADOOP-1986:

>> The factory can keep that around. So, if the deserializer depends on the type of the
instance passed in, then the deserializer your factory builds should include the class and
create an instance of it when the instance is null. Java Serialization would not need to do
this, but Thrift would. I'm trying to avoid client code that differs depending on the serializer.
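
To make the quoted proposal concrete, here is a minimal Java sketch, assuming a hypothetical `Deserializer<T>` whose `deserialize(in, reuse)` fills `reuse` when it is non-null and otherwise creates a new instance. The factory captures the concrete class and instantiates it through its no-argument constructor, so client code can always pass `null`. `Deserializer`, `SelfReading`, `Point`, and `ReflectiveDeserializerFactory` are illustrative names, not Hadoop's API.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical contract: fill 'reuse' when it is non-null, otherwise return a new instance.
interface Deserializer<T> {
    T deserialize(DataInputStream in, T reuse) throws IOException;
}

// Hypothetical interface for types that read their own fields (Thrift/Record I/O style).
interface SelfReading {
    void readFields(DataInputStream in) throws IOException;
}

// Example type with the no-argument constructor that reflective creation needs.
class Point implements SelfReading {
    int x;
    int y;

    public Point() {}

    @Override
    public void readFields(DataInputStream in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}

// The factory remembers the concrete class, so clients may always pass null.
final class ReflectiveDeserializerFactory {
    private ReflectiveDeserializerFactory() {}

    static <T extends SelfReading> Deserializer<T> forClass(Class<T> cls) {
        return (in, reuse) -> {
            T target = reuse;
            if (target == null) {
                try {
                    // Create a fresh instance of the remembered class.
                    target = cls.getDeclaredConstructor().newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new IOException("cannot instantiate " + cls.getName(), e);
                }
            }
            target.readFields(in);
            return target;
        };
    }
}
```

A client would call `ReflectiveDeserializerFactory.forClass(Point.class).deserialize(in, null)` whatever the serializer; the trade-off is that each deserializer is bound to exactly one class.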

This might be difficult if you have a serializer that can handle lots of classes. Take the
example of Record I/O. Every class that can be serialized inherits from Record. There is
only one serializer, the one for Record I/O, but it can handle any Record class (and there is
an unbounded number of such classes). You may want to create a singleton Record I/O serializer
to handle more than one class that inherits from Record, and it won't know which class to
instantiate when deserializing (or it will have to keep track of a huge number of classes). I understand that you're
trying to avoid extra client code, but you may end up unnecessarily complicating the platform
code. Furthermore, conceptually you do want the client to distinguish between serializers
that create objects and those that expect the client to create them. This is less relevant
in Java, with its automatic memory management, but in other languages you do want to make it
explicit who is responsible for memory management.
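
The singleton problem can be sketched as follows. This is a simplified stand-in for a Record I/O base class, not the real Hadoop class: one shared `RecordDeserializer` (a hypothetical name) serves every subclass by calling the target's own `readFields`, which works when the caller supplies the instance but leaves it nothing to instantiate when the caller passes `null`.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Simplified stand-in for a Record I/O base class (not the real Hadoop class).
abstract class Record {
    abstract void readFields(DataInputStream in) throws IOException;
}

class Employee extends Record {
    String name;

    @Override
    void readFields(DataInputStream in) throws IOException {
        name = in.readUTF();
    }
}

class Department extends Record {
    int id;

    @Override
    void readFields(DataInputStream in) throws IOException {
        id = in.readInt();
    }
}

// One shared deserializer for every Record subclass, present and future.
final class RecordDeserializer {
    static final RecordDeserializer INSTANCE = new RecordDeserializer();

    private RecordDeserializer() {}

    Record deserialize(DataInputStream in, Record target) throws IOException {
        if (target == null) {
            // Nothing here says which of the unbounded set of subclasses to construct.
            throw new IllegalArgumentException("caller must supply the Record to fill");
        }
        // Virtual dispatch runs the subclass's own field-reading logic.
        target.readFields(in);
        return target;
    }
}
```

Making the singleton accept `null` would require it to carry a registry of every possible subclass, which is the extra platform complexity the comment warns about.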

Serializers that create their own objects and pass them back to the client are, in many ways,
fundamentally different from those that expect clients to pass in an object to deserialize into.
The former expect deserialized objects to have a no-argument constructor, and the objects
are usually simple wrappers around data. In the latter case, the objects are usually much more
than simple wrappers around member variables, and their constructors can be quite complicated.
What I'm saying is that these two types of serializers are different enough (and you will
rarely, if ever, see a serializer that supports both) that you don't want to hide that
difference in your common serializer interface. I think a client will either always pass in
objects that it constructs itself, or always get back new objects from the serializer;
I don't think it will mix these calls with the same serializer. So I think it's fine, and
desirable, for clients to make explicitly different calls to the two types of serializers.
In fact, most clients will likely be written explicitly for one of these two kinds of
serializers, given that a client will probably use the same platform for both serialization
and deserialization.
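
A minimal sketch of keeping the two styles explicit, assuming two separate hypothetical interfaces rather than one common one: `FillingDeserializer` for serializers that fill a caller-constructed object, and `CreatingDeserializer` for serializers that return new objects. Java Serialization (`ObjectInputStream.readObject`) naturally fits the second style. None of these names come from Hadoop.

```java
import java.io.DataInput;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.Serializable;

// Style 1: the client owns construction and passes in the object to fill.
interface FillingDeserializer<T> {
    void deserializeInto(DataInput in, T target) throws IOException;
}

// Style 2: the serializer constructs and returns a new object.
interface CreatingDeserializer<T> {
    T deserializeNew(InputStream in) throws IOException;
}

class Counter {
    long value;
}

class CounterFiller implements FillingDeserializer<Counter> {
    @Override
    public void deserializeInto(DataInput in, Counter target) throws IOException {
        target.value = in.readLong();
    }
}

// Java Serialization creates the object itself, so it fits the "creating" style.
class JavaSerializationDeserializer<T extends Serializable> implements CreatingDeserializer<T> {
    private final Class<T> type;

    JavaSerializationDeserializer(Class<T> type) {
        this.type = type;
    }

    @Override
    public T deserializeNew(InputStream in) throws IOException {
        ObjectInputStream ois = new ObjectInputStream(in);
        try {
            return type.cast(ois.readObject());
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }
}
```

A client written against `FillingDeserializer` visibly owns object construction, while one written against `CreatingDeserializer` visibly does not; keeping the interfaces separate preserves that distinction instead of hiding it behind a nullable argument.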

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>         Attachments: SerializableWritable.java, serializer-v1.patch
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs.
> While it's possible to write Writable wrappers for other serialization frameworks (such as
> Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types
> directly, without explicit wrapping and unwrapping.

This message is automatically generated by JIRA.