hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1986) Add support for a general serialization mechanism for Map Reduce
Date Fri, 19 Oct 2007 19:55:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536329 ]

Doug Cutting commented on HADOOP-1986:

> Except there might not be enough type information to construct an object.

The factory can keep that around.  So, if the deserializer depends on the type of the instance
passed in, then the deserializer your factory builds should include the class and create an
instance of it when the instance is null.  Java Serialization would not need to do this, but
Thrift would.  I'm trying to avoid client code that differs depending on the serializer.
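A minimal sketch of that idea (the names here are illustrative assumptions, not the patch's actual API): the factory bakes the class into the deserializer it builds, so the deserializer can create an instance itself when the caller passes null, and client code looks the same whether the underlying framework needs type information (Thrift) or not (Java Serialization).

```java
import java.io.*;

public class DeserializerSketch {
    // A trivial mutable value type standing in for a Thrift-generated class.
    public static class IntBox {
        public int value;
        public IntBox() {}                      // no-arg constructor for reflection
    }

    public static class ReflectDeserializer<T> {
        private final Class<T> clazz;           // the factory keeps the type around
        private DataInputStream in;

        public ReflectDeserializer(Class<T> clazz) { this.clazz = clazz; }

        public void open(InputStream in) { this.in = new DataInputStream(in); }

        // If 'reuse' is null, build a fresh instance from the stored class,
        // so the caller never has to supply one.
        public T deserialize(T reuse) throws IOException {
            T t = reuse;
            if (t == null) {
                try {
                    t = clazz.getDeclaredConstructor().newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new IOException("cannot instantiate " + clazz, e);
                }
            }
            ((IntBox) t).value = in.readInt();  // stand-in for framework-specific decoding
            return t;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeInt(42);

        ReflectDeserializer<IntBox> d = new ReflectDeserializer<>(IntBox.class);
        d.open(new ByteArrayInputStream(bytes.toByteArray()));
        IntBox box = d.deserialize(null);       // no instance passed in
        System.out.println(box.value);          // prints 42
    }
}
```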

> So unless there's another way of getting round this then I think we're stuck with stateful

Back to my previous comment: with a stateful serializer, is it permitted to intersperse other
i/o on the stream that's passed to the serializer, or must the serializer's input or output
be reset each time this is done?  If it must be reset, then I see little point to this optimization,
as writing raw data between serialized data is common (e.g., SequenceFile writes record lengths,
RPC writes request numbers, etc.).
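The interleaving pattern in question looks roughly like this (a sketch under assumed names, not the patch's API): a raw record length is written before each serialized record, as SequenceFile does. If the serializer's stream had to be reset around every such raw write, keeping serializer state across records would buy nothing.

```java
import java.io.*;

public class InterleavedWriter {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);

        String[] records = {"alpha", "beta"};
        for (String r : records) {
            byte[] payload = r.getBytes("UTF-8");  // stand-in for serialize()
            data.writeInt(payload.length);         // raw i/o between records
            data.write(payload);                   // serialized record bytes
        }

        // Read it back the same way: raw length, then payload.
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(out.toByteArray()));
        for (int i = 0; i < records.length; i++) {
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            System.out.println(new String(payload, "UTF-8"));
        }
    }
}
```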

A related issue is synchronization.  If one saves a file position, calls serialize(),
can one seek back to that position and call deserialize()?  Java Serialization does not generally
permit this.  Should we add a sync() method that must be called whenever you wish to have
a point in the stream that you can seek to?  Or would you use open/setOutput for this too?
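For concreteness, the seek-back pattern being asked about is the following (hypothetical demo; whether a given serializer supports it is exactly the open question). With a simple self-delimiting encoding it works; Java Serialization's stream headers and back-references generally break it.

```java
import java.io.*;

public class SeekBackDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("seekback", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.writeInt(1);                     // some earlier data on the stream
            long pos = raf.getFilePointer();     // save the file position
            raf.writeInt(99);                    // "serialize" a record (stand-in)

            raf.seek(pos);                       // seek back to the saved position
            System.out.println(raf.readInt());   // "deserialize": prints 99
        }
    }
}
```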

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>         Attachments: SerializableWritable.java, serializer-v1.patch
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs.
> While it's possible to write Writable wrappers for other serialization frameworks (such as
> Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types
> directly, without explicit wrapping and unwrapping.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
