hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1986) Add support for a general serialization mechanism for Map Reduce
Date Mon, 08 Oct 2007 02:21:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533022 ]

Owen O'Malley commented on HADOOP-1986:

   No one was suggesting a serializer per concrete class, except in the case of Thrift if
they don't implement a generic interface. Your proposal doesn't address how the mapping from
an Object to a Serializer is managed. I think my suggestion provides the most flexibility, since
you only need one serializer per root class and the serializers place no requirements on the
implementation classes at all. Basically, each serialization library that someone wanted to
use with Hadoop would have a single generic serializer, and a library routine would do the
lookups at the first level:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public interface Serializer<T> {
  void serialize(T t, OutputStream out) throws IOException;
  void deserialize(T t, InputStream in) throws IOException;
  // Get the base class that this serializer will work on
  Class<T> getTargetClass();
}

org.apache.hadoop.io.serializer.WritableSerializer would be coded to read and write any Writable,
while org.apache.hadoop.io.serializer.ThriftSerializer would read and write any Thrift type.
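
To make that concrete, here is a rough sketch of what such a WritableSerializer might look like. It is only an illustration, not the proposed implementation: the Writable interface below is a local stand-in with the same two methods as org.apache.hadoop.io.Writable so that the example compiles without Hadoop, and IntBox is a hypothetical value type invented for the demo.

```java
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// The interface from the comment, repeated so the sketch compiles on its own.
interface Serializer<T> {
  void serialize(T t, OutputStream out) throws IOException;
  void deserialize(T t, InputStream in) throws IOException;
  Class<T> getTargetClass();
}

// Assumption: local stand-in for org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// One generic serializer covers every class that implements Writable.
class WritableSerializer implements Serializer<Writable> {
  public void serialize(Writable w, OutputStream out) throws IOException {
    DataOutputStream data = new DataOutputStream(out);
    w.write(data);
    data.flush();
  }

  public void deserialize(Writable w, InputStream in) throws IOException {
    // Fills in the fields of an existing instance, as the interface requires.
    w.readFields(new DataInputStream(in));
  }

  public Class<Writable> getTargetClass() {
    return Writable.class;
  }
}

// Hypothetical Writable used only to exercise the serializer.
class IntBox implements Writable {
  int value;

  IntBox() {}

  IntBox(int value) {
    this.value = value;
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(value);
  }

  public void readFields(DataInput in) throws IOException {
    value = in.readInt();
  }
}
```

Serializing an IntBox into a ByteArrayOutputStream and deserializing the bytes into a fresh IntBox should round-trip the value. The serializer never needs to know about IntBox itself, only about the root type.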

I'd probably make a utility class:

class org.apache.hadoop.io.serializer.SerializerFactory extends Configured {
  <T> Serializer<T> getSerializer(Class<? extends T> cls);
}

and presumably the SerializerFactory would include a cache from the class to serializer class
(hopefully with weak references to allow garbage collection). This would allow you to remove
all references to Writable in SequenceFile and the map/reduce classes. Any object could be
written into sequence files or passed around in map/reduce jobs. It would be cool and should
result in only a modest amount of confusion to the users. 
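
As a rough sketch of the lookup and cache described above (one possible shape, not a settled design): the factory below walks a list of registered serializers, picks the first whose target class is assignable from the requested class, and remembers the answer in a WeakHashMap. Several pieces are assumptions made for illustration: the register method is hypothetical (a real version would presumably read the serializer list from the Configuration), "extends Configured" is left out so the example compiles without Hadoop, and Message, Ping and MessageSerializer are made-up types.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// The interface from the comment, repeated so the sketch compiles on its own.
interface Serializer<T> {
  void serialize(T t, OutputStream out) throws IOException;
  void deserialize(T t, InputStream in) throws IOException;
  Class<T> getTargetClass();
}

class SerializerFactory {
  // One serializer per root class, in registration order.
  private final List<Serializer<?>> registered = new ArrayList<>();
  // WeakHashMap holds its keys weakly, so a cached Class can still be collected.
  private final Map<Class<?>, Serializer<?>> cache = new WeakHashMap<>();

  // Hypothetical registration hook; not part of the original proposal.
  synchronized void register(Serializer<?> serializer) {
    registered.add(serializer);
    cache.clear();
  }

  @SuppressWarnings("unchecked")
  synchronized <T> Serializer<T> getSerializer(Class<? extends T> cls) {
    Serializer<?> found = cache.get(cls);
    if (found == null) {
      for (Serializer<?> candidate : registered) {
        // First-level lookup: does this serializer's root class cover cls?
        if (candidate.getTargetClass().isAssignableFrom(cls)) {
          found = candidate;
          break;
        }
      }
      if (found == null) {
        throw new IllegalArgumentException("No serializer for " + cls.getName());
      }
      cache.put(cls, found);
    }
    return (Serializer<T>) found;
  }
}

// Hypothetical root type standing in for a serialization library's base class.
interface Message {
  int getId();
  void setId(int id);
}

// Hypothetical concrete class; note it has no serializer of its own.
class Ping implements Message {
  private int id;

  public int getId() {
    return id;
  }

  public void setId(int id) {
    this.id = id;
  }
}

// A single serializer for the whole Message hierarchy.
class MessageSerializer implements Serializer<Message> {
  public void serialize(Message m, OutputStream out) throws IOException {
    new DataOutputStream(out).writeInt(m.getId());
  }

  public void deserialize(Message m, InputStream in) throws IOException {
    m.setId(new DataInputStream(in).readInt());
  }

  public Class<Message> getTargetClass() {
    return Message.class;
  }
}
```

With a MessageSerializer registered, asking the factory for Ping.class returns that same serializer, and the second request is answered from the cache. A class with no registered root type makes the lookup throw IllegalArgumentException in this sketch; whether a real implementation should fail or fall back to some default is an open design choice.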

Furthermore, since it makes only relatively minor use of reflection, a C++ implementation
along similar lines should be feasible. (Although the lookup would be a lot more expensive to
evaluate there, since dynamic_cast is outrageously expensive under the C++ multiple inheritance semantics.)

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>             Fix For: 0.16.0
>         Attachments: SerializableWritable.java
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs.
> While it's possible to write Writable wrappers for other serialization frameworks (such as
> Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types
> directly, without explicit wrapping and unwrapping.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
