hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1986) Add support for a general serialization mechanism for Map Reduce
Date Wed, 10 Oct 2007 20:55:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533874 ]

Doug Cutting commented on HADOOP-1986:

> No one was suggesting a serializer per a concrete class [ ... ]

Actually, I thought we might, and would like to preserve that option.  I worry about performance
of introspection.  And, for very simple objects, the overhead of having WritableSerializer#serialize(o,out)
call o.write(out) rather than just being o.write(out) may even be significant.  Or it may
not be.  In any case, if record code is generated from a DDL, then we can implement this either
way, with per-class serializers or per-baseclass serializers.  If we discard the DDL and code-generation,
then we're stuck with introspection, no?
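
To make the trade-off concrete, here is a minimal sketch that contrasts generated serialization code with an introspecting serializer. Point, IntrospectingSerializer and IntrospectionDemo are hypothetical names invented for illustration; none of them is part of Hadoop or of the patch under discussion, and the reflective version handles only int fields.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical record, written the way a DDL compiler might generate it.
final class Point {
    int x;
    int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    // Generated serialization code: each field is written directly.
    void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }
}

// Hypothetical serializer that needs no generated code, at the cost of reflection.
final class IntrospectingSerializer {
    static void serialize(Object o, DataOutput out) throws IOException {
        Field[] fields = o.getClass().getDeclaredFields();
        // Reflection does not guarantee field order, so sort by name
        // to get a stable wire layout.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            f.setAccessible(true);
            try {
                if (f.getType() == int.class) {
                    out.writeInt(f.getInt(o));
                } else {
                    throw new IOException("unsupported field type: " + f.getType());
                }
            } catch (IllegalAccessException e) {
                throw new IOException(e);
            }
        }
    }
}

final class IntrospectionDemo {
    static byte[] generated(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            p.write(new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] introspected(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            IntrospectingSerializer.serialize(p, new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Both paths produce the same bytes for this record, but the reflective one does a field lookup, a sort and an access check on every call unless that work is cached. Whether the difference matters in practice would have to be measured.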

I wonder if we might permit both by having the configuration name not serializers but serializer
factories.  So one could specify the availability of a WritableSerializerFactory that would
be constructed, cached and used to construct serializers for Writables.  That could then potentially
return a different serializer for each kind of Writable, or the same serializer for all Writables.
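
A rough sketch of the factory idea, under stated assumptions: the Serializer and SerializerFactory interfaces, their method names, and IntRecord are hypothetical and only illustrate the shape of the proposal. Writable is redeclared as a local stand-in for Hadoop's interface so that the example compiles on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in for Hadoop's Writable, reduced to the method used here.
interface Writable {
    void write(DataOutput out) throws IOException;
}

// Hypothetical serializer interface.
interface Serializer<T> {
    void serialize(T value, DataOutput out) throws IOException;
}

// Hypothetical factory interface: the configuration would name factories,
// and the framework would ask them for serializers.
interface SerializerFactory {
    boolean accepts(Class<?> type);
    <T> Serializer<T> getSerializer(Class<T> type);
}

// Hypothetical Writable used for the demonstration.
final class IntRecord implements Writable {
    final int value;

    IntRecord(int value) { this.value = value; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }
}

final class WritableSerializerFactory implements SerializerFactory {
    // One shared serializer for all Writables: it simply delegates to write().
    private static final Serializer<Writable> SHARED = (value, out) -> value.write(out);

    // Optional per-class serializers, returned in preference to the shared one.
    private final Map<Class<?>, Serializer<?>> perClass = new ConcurrentHashMap<>();

    <T extends Writable> void register(Class<T> type, Serializer<T> serializer) {
        perClass.put(type, serializer);
    }

    @Override
    public boolean accepts(Class<?> type) {
        return Writable.class.isAssignableFrom(type);
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> Serializer<T> getSerializer(Class<T> type) {
        if (!accepts(type)) {
            throw new IllegalArgumentException("not a Writable: " + type.getName());
        }
        Serializer<?> found = perClass.get(type);
        if (found == null) {
            found = SHARED;
        }
        return (Serializer<T>) found;
    }
}

final class FactoryDemo {
    static byte[] toBytes(SerializerFactory factory, IntRecord record) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            factory.getSerializer(IntRecord.class).serialize(record, new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With nothing registered, every Writable goes through the shared delegating serializer. Registering a serializer for IntRecord switches that one class to a dedicated implementation without touching callers, which is how a factory could keep both options open.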

Finally, if we keep the DDL and generate only the class, not its serializers, then there could
theoretically be compatibility issues with other languages.  If, for example, the DDL defines
different types that map to the same type in Java (short versus character?) then using introspection
could cause problems.  This is improbable, but another thing to watch out for.
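
The following sketch illustrates that concern with an invented case rather than the short/character one above. It assumes a hypothetical DDL with two integer types, called fixed32 and varint32 here, that both become a Java int. Generated code knows which encoding each field needs; a serializer that sees only the Java type cannot tell them apart. The type names and encodings are assumptions for illustration, not taken from any actual Hadoop DDL.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DdlAmbiguityDemo {
    // Hypothetical DDL type "fixed32": always four bytes, big-endian.
    static void writeFixed32(DataOutput out, int v) throws IOException {
        out.writeInt(v);
    }

    // Hypothetical DDL type "varint32": seven bits per byte, low bits first,
    // with the high bit set on every byte except the last.
    static void writeVarint32(DataOutput out, int v) throws IOException {
        while ((v & ~0x7F) != 0) {
            out.writeByte((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.writeByte(v);
    }

    // What generated code would emit for a record declared as
    // "fixed32 a; varint32 b;" in the hypothetical DDL.
    static byte[] generatedEncoding(int a, int b) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            writeFixed32(out, a);
            writeVarint32(out, b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // What an introspecting serializer would emit: it sees two int fields
    // and has to pick one encoding for both.
    static byte[] introspectedEncoding(int a, int b) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            out.writeInt(a);
            out.writeInt(b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For small values the two encodings differ in length, so a reader in another language that follows the DDL would misparse the introspected output.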

Do I worry too much?

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>         Attachments: SerializableWritable.java
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs.
> While it's possible to write Writable wrappers for other serialization frameworks (such as
> Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types
> directly, without explicit wrapping and unwrapping.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
