hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1986) Add support for a general serialization mechanism for Map Reduce
Date Wed, 10 Oct 2007 16:40:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533789
] 

Owen O'Malley commented on HADOOP-1986:
---------------------------------------

*laugh*

Option 1 is *precisely* what I was proposing, except that you keep blurring how this part
works: 
{quote}
we simply obtain the right serializer object (RecordIOSerializer or ThriftSerializer or whatever)
using a factory or through a configuration file
{quote}
My proposal spells out how that happens by saying that the configuration has a map between
root classes and the corresponding serializer class. When the factory is given an object,
it consults the map and constructs the correct serializer.  Clearly the factory will cache
the information so that it doesn't have to traverse the class hierarchy for constructing each
serializer.

It just simplifies things a bit if the serializer class specifies which class they work on
instead so that instead of configuring a value like:

{code}
org.apache.hadoop.io.Writable->org.apache.hadoop.io.WritableSeraizlier 
{code}

we can just provide a list of serializers the factory can automatically determine what they
apply to. Now that I think about it, just having

{code}
interface Serializer<T> {
   void serialize(T t, OutputStream out) throws IOException;
  void deserialize(T t, InputStream in)  
}
{code}

because we can use reflection to find the value of T for a given factory class. So, the serializers
would just look like:

{code}
class ThriftSerializer implements Serializer<ThriftRecord> {
   void serialize(ThriftRecord t, OutputStream out) throws IOException {...}
  void deserialize(ThriftRecord t, InputStream in)  throws IOException {...}
}

class WritableSerializer implements Serializer<Writable> {
   void serialize(Writable t, OutputStream out) throws IOException {...}
  void deserialize(Writable t, InputStream in)  throws IOException {...}
}
{code}

and in the config put:

{code}
<property>
  <name>hadoop.serializers</name>
  <value>org.apache.hadoop.io.WritableSerializer,com.facebook.hadoop.ThriftSerializer</value>
  <description>The list of serializers available to Hadoop</description>
</property>
{code}


> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs.
While it's possible to write Writable wrappers for other serialization frameworks (such as
Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types
directly, without explicit wrapping and unwrapping.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message