hadoop-common-issues mailing list archives

From "Konstantin Boudnik (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Mon, 22 Nov 2010 22:14:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934624#action_12934624 ]

Konstantin Boudnik commented on HADOOP-6685:
--------------------------------------------

bq. However, in the case of a PB serialization, for example, the PB library is not used in
Hadoop except in the serialization code for serializing the user's data type. So it's a user-level
concern, and should be compiled as such - putting it in core Hadoop is asking for trouble
in the future, since the Hadoop releases won't keep pace with the union of PB, Thrift, and
Avro releases. These serialization plugins should be stand alone, or at least easily re-compilable
in a way that doesn't involve recompiling all of Hadoop, such as a contrib module. The user
just treats the plugin JAR as another code dependency.

+1 on Tom's point: having a variety of serialisation frameworks in a product is a good thing,
as long as it doesn't come at the cost of the mess they might cause if their public APIs
start diverging in ways that force changes to core Hadoop just to keep user applications
working. Testing them is another matter: if Hadoop explicitly claims to support something,
somebody needs to make the effort to guarantee that it does.

Having a clean abstraction for serialisation, with the concrete frameworks plugged in at the
user's choice, sounds like a reasonable compromise.
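The arrangement discussed above (core code compiles only against a small serialization interface, while each framework's plugin ships as its own JAR that the user adds as an ordinary dependency) can be sketched roughly as follows. This is a hypothetical illustration, not Hadoop code: `SerializationPlugin`, `PluginRegistry` and `Utf8StringPlugin` are invented names, and loading plugins reflectively from a user-supplied list of class names is an assumption about how the wiring might look.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical contract that core code compiles against; it names no specific framework.
interface SerializationPlugin {
    /** Returns true if this plugin can serialize values of the given type. */
    boolean accept(Class<?> type);

    byte[] serialize(Object value);
}

// Core side (hypothetical): instantiates whatever plugin classes the user lists by name,
// so the core never needs compile-time knowledge of PB, Thrift or Avro.
final class PluginRegistry {
    private final List<SerializationPlugin> plugins = new ArrayList<>();

    PluginRegistry(List<String> classNames) {
        for (String name : classNames) {
            try {
                Class<?> cls = Class.forName(name);
                plugins.add((SerializationPlugin) cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot load serialization plugin " + name, e);
            }
        }
    }

    /** Returns the first registered plugin that accepts the type. */
    SerializationPlugin find(Class<?> type) {
        for (SerializationPlugin p : plugins) {
            if (p.accept(type)) {
                return p;
            }
        }
        throw new IllegalArgumentException("No serialization plugin accepts " + type.getName());
    }
}

// Plugin side (hypothetical): in practice this would ship in its own JAR.
class Utf8StringPlugin implements SerializationPlugin {
    @Override
    public boolean accept(Class<?> type) {
        return type == String.class;
    }

    @Override
    public byte[] serialize(Object value) {
        return ((String) value).getBytes(StandardCharsets.UTF_8);
    }
}

class PluginDemo {
    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry(Arrays.asList(Utf8StringPlugin.class.getName()));
        byte[] bytes = registry.find(String.class).serialize("hadoop");
        System.out.println(bytes.length); // prints 6
    }
}
```

Because the registry only sees class names and the interface, upgrading a plugin against a newer PB, Thrift or Avro release means rebuilding that one JAR, not core Hadoop.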

> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the serialization-specific
> configuration. Since this data is really internal to the specific serialization, I think we
> should change it to be an opaque binary blob. This will simplify the interface for defining
> specific serializations for different contexts (MAPREDUCE-1462). It will also move us toward
> having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).
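A minimal sketch of the idea in the description, assuming a hypothetical plugin interface: each serialization encodes its own settings into a `byte[]`, and the framework merely stores and hands back that blob without parsing it, rather than exposing a `Map<String,String>`. `OpaqueSerialization`, `StringSerialization`, `BlobStore` and the `"string.serialization"` key are illustrative names, not the API from the attached patches.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plugin contract: configuration travels as an opaque byte[] that only the
// serialization itself interprets.
interface OpaqueSerialization<T> {
    /** Encodes this serialization's private settings as an opaque blob. */
    byte[] getMetadata();

    /** Restores settings from a blob previously produced by getMetadata(). */
    void setMetadata(byte[] metadata);

    byte[] serialize(T value);

    T deserialize(byte[] data);
}

// Example plugin whose only setting is a charset name; its metadata format is private to it.
class StringSerialization implements OpaqueSerialization<String> {
    private String charsetName = "UTF-8";

    @Override
    public byte[] getMetadata() {
        return charsetName.getBytes(StandardCharsets.US_ASCII);
    }

    @Override
    public void setMetadata(byte[] metadata) {
        charsetName = new String(metadata, StandardCharsets.US_ASCII);
    }

    @Override
    public byte[] serialize(String value) {
        return value.getBytes(Charset.forName(charsetName));
    }

    @Override
    public String deserialize(byte[] data) {
        return new String(data, Charset.forName(charsetName));
    }
}

// Hypothetical stand-in for the framework's configuration: stores and copies blobs, never parses them.
class BlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    void put(String key, byte[] blob) {
        blobs.put(key, blob.clone());
    }

    byte[] get(String key) {
        return blobs.get(key).clone();
    }
}

class OpaqueMetadataDemo {
    public static void main(String[] args) {
        StringSerialization writer = new StringSerialization();
        writer.setMetadata("UTF-16".getBytes(StandardCharsets.US_ASCII));

        BlobStore conf = new BlobStore();
        conf.put("string.serialization", writer.getMetadata()); // hypothetical key

        StringSerialization reader = new StringSerialization();
        reader.setMetadata(conf.get("string.serialization"));
        System.out.println(reader.deserialize(writer.serialize("hello"))); // prints hello
    }
}
```

The framework side (`BlobStore` here) needs no knowledge of charsets or any other plugin setting, so a PB, Thrift or Avro plugin could put a schema or descriptor in the same slot without changing the interface.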

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

