hadoop-common-dev mailing list archives

From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1986) Add support for a general serialization mechanism for Map Reduce
Date Tue, 06 Nov 2007 05:26:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540333 ]

Vivek Ratan commented on HADOOP-1986:

>> Above we agreed that "stateful" serializers could not buffer, since we might wish
>> to put raw binary values between serialized objects (as SequenceFile does). Do you dispute

No. I agree that serializers should not buffer. But serializer instances can share output
streams or other objects, and that's what I meant by 'state'.
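To make the non-buffering point concrete, here is a minimal sketch in Java. The interface and class names are illustrative assumptions, not the committed Hadoop API: an instance is opened on a stream it may share with the framework, and every serialize() call writes straight through, so the framework can interleave raw bytes between records.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical serializer contract: open() binds the instance to a (possibly
// shared) stream, and serialize() writes each record directly, unbuffered.
interface Serializer<T> {
    void open(OutputStream out) throws IOException;
    void serialize(T t) throws IOException;
    void close() throws IOException;
}

// Toy implementation that length-prefixes UTF-8 strings
// (a single length byte, which is enough for a demo).
class StringSerializer implements Serializer<String> {
    private OutputStream out;
    public void open(OutputStream out) { this.out = out; }
    public void serialize(String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b.length);
        out.write(b);
    }
    public void close() throws IOException { out.close(); }
}

public class SharedStreamDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Serializer<String> ser = new StringSerializer();
        ser.open(sink);
        ser.serialize("key");
        sink.write(0xFF);        // raw byte the framework interleaves, e.g. a sync marker
        ser.serialize("value");
        // Because the serializer never buffers, the raw byte lands exactly
        // between the two records: [3]"key" [0xFF] [5]"value" -> 11 bytes.
        System.out.println(sink.size()); // prints 11
    }
}
```

If serialize() buffered internally, the framework's raw write would land before the flushed records, which is exactly why a SequenceFile-style container forbids it.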

It seems to me that what you're saying is that if a serialization platform X is to work
with Hadoop, X should do at least two things:
- X should allow creation of multiple instances of its serializer (so, for example, if X's
serializer instances share anything, like library handles or stream objects, X is responsible
for dealing with any issues that arise from this sharing, such as initializing or destroying
the shared objects)
- X needs to be able to *both* create objects before deserializing them (i.e., those objects
should have no-arg constructors, or should all be constructed in a common manner) *and* take
in a reference to an existing object and initialize its member variables with deserialized data. 
If X follows these constraints, then we get client code that is generic and does not 'replicate
logic', as you say. Correct? 
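The second constraint can be satisfied with a single deserialize method, sketched below under assumed names (a sketch of the idea, not the patch's API): generic client code passes either null, letting the platform construct the object, or an existing instance to be filled in.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical deserializer contract: pass null and the platform constructs
// the object itself (hence the no-arg-constructor requirement); pass an
// existing instance and its member variables are filled in.
interface Deserializer<T> {
    void open(InputStream in) throws IOException;
    T deserialize(T reuse) throws IOException; // returns reuse, or a new object if reuse == null
    void close() throws IOException;
}

// Toy mutable record with a no-arg constructor.
class IntBox {
    int value;
}

class IntBoxDeserializer implements Deserializer<IntBox> {
    private DataInputStream in;
    public void open(InputStream in) { this.in = new DataInputStream(in); }
    public IntBox deserialize(IntBox reuse) throws IOException {
        IntBox target = (reuse != null) ? reuse : new IntBox(); // both styles in one call
        target.value = in.readInt();
        return target;
    }
    public void close() throws IOException { in.close(); }
}
```

A platform that cannot construct its objects from a no-arg constructor would be unable to handle the reuse == null branch, which is the hypothetical sticking point discussed below.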

I'm all in favor of client code not replicating framework logic. It's definitely an important
requirement. But I see it as coming at a price: the two constraints above that X must follow.
Now, neither Thrift nor Record I/O should have any problems with these constraints, which is
quite important to know. But the constraints are non-trivial enough that some other platform
might not be able to satisfy them. Unfortunately, I don't have a concrete example of such a
platform. At the same time, I can realistically imagine a platform that does not force its
de/serializable objects to have no-arg constructors (because that can be a severe restriction
on the design of an object) and instead requires the caller to pass in an object reference
(much like Java Serialization, but without the platform creating the objects itself when
deserializing). But yes, these are somewhat hypothetical arguments. I also understand that we
should perhaps favor a design that supports existing serialization platforms rather than make
it too general if generality comes at a price.

At this point, I think it's a gut call. If we feel that having clients not replicate platform
logic is more important than the restrictions we're imposing on serialization platforms,
that's fine. I can certainly see the validity of that position, and can't argue strongly
against it. I lean (slightly) towards the other side, but I don't have concrete examples to
justify leaning too far. 

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>         Attachments: SerializableWritable.java, serializer-v1.patch
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs.
> While it's possible to write Writable wrappers for other serialization frameworks (such as
> Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types
> directly, without explicit wrapping and unwrapping.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
