hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1986) Add support for a general serialization mechanism for Map Reduce
Date Tue, 30 Oct 2007 18:37:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538882

Doug Cutting commented on HADOOP-1986:

> I said that wouldn't work because you will likely want a singleton deserializer object
> to handle deserializing more than one class [...]

I was with you to that point.  Why must you have a singleton deserializer instance that handles
more than one class?  If the deserializer does not need to know the class (e.g., Java serialization),
then a singleton factory can be used.  But if the deserializer does need to know the class,
either to create an instance or for deserialization itself, then a different factory instance
would need to be created per class.  These could be cached by the framework, so no per-deserialized-object
allocations need to happen.  The client (e.g., SequenceFile) can reuse deserializers, so they need
not be allocated per object either.
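A minimal Java sketch of the per-class caching described above, assuming hypothetical names ClassDeserializer and DeserializerCache (neither is an existing Hadoop API). The deserializer needs its class in order to create instances, so it looks up the no-arg constructor once; the cache builds at most one deserializer per class, so nothing is allocated per deserialized object:

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical deserializer bound to one class: it needs the class to create instances.
final class ClassDeserializer<T> {
    private final Constructor<T> ctor; // looked up once per class, not once per object

    ClassDeserializer(Class<T> type) {
        try {
            this.ctor = type.getDeclaredConstructor();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(type.getName() + " has no no-arg constructor", e);
        }
    }

    // Creates an empty instance; a real deserializer would then fill it from the input.
    T newInstance() {
        try {
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical framework-side cache: at most one deserializer per class, created on first use.
final class DeserializerCache {
    private final Map<Class<?>, ClassDeserializer<?>> cache = new ConcurrentHashMap<>();
    private final AtomicInteger creations = new AtomicInteger();

    @SuppressWarnings("unchecked")
    <T> ClassDeserializer<T> forClass(Class<T> type) {
        return (ClassDeserializer<T>) cache.computeIfAbsent(type, k -> {
            creations.incrementAndGet(); // only runs on a cache miss
            return newDeserializer(k);
        });
    }

    // Number of deserializers constructed so far (one per distinct class).
    int created() {
        return creations.get();
    }

    private static <U> ClassDeserializer<U> newDeserializer(Class<U> type) {
        return new ClassDeserializer<>(type);
    }
}
```

A class-agnostic deserializer, such as one built on Java serialization, would not need the cache at all and could be shared as a singleton.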

> But it adds to my argument that you want to have separate deserialize methods and let
> the client call the right one.

So would clients like SequenceFile and the mapreduce shuffle require different code to deserialize
different classes?  We need to have generic client code.
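One way generic client code could look, sketched in Java with a hypothetical RecordDeserializer interface (not an existing Hadoop API). The client always calls the same method and offers back its previous object; a mutable type such as the hypothetical IntRecord is filled in place, while an immutable type such as String ignores the offer and returns a new object:

```java
import java.io.DataInput;
import java.io.IOException;
import java.util.function.Consumer;

// Hypothetical single entry point used for every type. If 'reuse' is non-null the
// deserializer may fill it in and return it; otherwise it returns a new object.
interface RecordDeserializer<T> {
    T deserialize(DataInput in, T reuse) throws IOException;
}

// Hypothetical mutable record, standing in for a generated class with settable fields.
final class IntRecord {
    int value;
}

// Fills in the caller's object when one is supplied, so no allocation per record.
final class IntRecordDeserializer implements RecordDeserializer<IntRecord> {
    @Override
    public IntRecord deserialize(DataInput in, IntRecord reuse) throws IOException {
        IntRecord record = (reuse != null) ? reuse : new IntRecord();
        record.value = in.readInt();
        return record;
    }
}

// Immutable type: it cannot be filled in, so the reuse hint is ignored.
final class StringDeserializer implements RecordDeserializer<String> {
    @Override
    public String deserialize(DataInput in, String reuse) throws IOException {
        return in.readUTF();
    }
}

// Generic client code: the same loop works for any type, with no per-class branches.
final class GenericReader {
    static <T> void forEach(DataInput in, int count, RecordDeserializer<T> deserializer,
                            Consumer<? super T> sink) throws IOException {
        T current = null;
        for (int i = 0; i < count; i++) {
            current = deserializer.deserialize(in, current); // offer the previous object for reuse
            sink.accept(current);
        }
    }
}
```

A client that reuses objects this way must not keep a reference to a record past the next call, because a mutable record may be overwritten in place.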

> Again, my point is that deserializers for Thrift and Record I/O cannot create objects
> themselves and will always require the client to pass in the object [...]

Again, I don't see why Record I/O, where we control the code generation from an IDL, cannot
generate a no-arg constructor.  Similarly for Thrift.  The constructor does not have to be public: we
already bypass access protections when we create instances.
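A sketch of the point about non-public constructors, using only java.lang.reflect. The hypothetical GeneratedRecord stands in for an IDL-generated class whose no-arg constructor is private, and the hypothetical Instantiator creates it anyway by calling setAccessible(true); Hadoop's own instantiation helper is not reproduced here:

```java
import java.lang.reflect.Constructor;

// Hypothetical generated record: the no-arg constructor is private, so ordinary
// user code cannot call it, but a framework using reflection still can.
final class GeneratedRecord {
    String name;

    private GeneratedRecord() {}

    GeneratedRecord(String name) {
        this.name = name;
    }
}

final class Instantiator {
    // Creates an instance through the declared no-arg constructor,
    // suppressing Java language access checks with setAccessible(true).
    static <T> T newInstance(Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }
}
```

On JDK 9 and later, setAccessible(true) on a private member succeeds for classes in the unnamed module or in packages opened to the caller, which covers ordinary classpath code like this.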

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>         Attachments: SerializableWritable.java, serializer-v1.patch
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs.
> While it's possible to write Writable wrappers for other serialization frameworks (such as
> Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types
> directly, without explicit wrapping and unwrapping.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
