hadoop-common-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Tue, 16 Nov 2010 10:01:17 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932396#action_12932396 ]

Tom White commented on HADOOP-6685:
-----------------------------------

Here's my feedback on the patch:

# I think the new serializations should be optional dependencies. Mandating a particular version
of Thrift, Protocol Buffers, and Avro is going to cause problems for folks down the line,
since we would be tying the version to Hadoop's release cycle, which is infrequent. Making
the serializations libraries (or contrib modules, as in MAPREDUCE-376, MAPREDUCE-377) makes
them independent of that cycle, and will make it easier to support the version of the serialization
library the user wants.
# I preferred the version where the Serialization could choose the way it serialized itself.
In the current patch, if you wrote Avro data in a SequenceFile you would have Writables for
the file container, and a PB-encoded Avro schema for the serialization. Having so many serialization
mechanisms is potentially brittle.
# I'm not sure we need the full generality of PB for serializing serializations. If the serialization
could choose its self-serialization mechanism, then TypedSerialization could just write its
type as a Text object. Doing this would remove the core dependency on PB, and allow
point 1 above (making the serializations optional dependencies).
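
To make the third point concrete, here is a rough sketch of a serialization that chooses its own self-serialization mechanism. It is not code from the patch: SelfDescribingSerialization, TypedSketch, toBytes and fromBytes are hypothetical names, and DataOutputStream.writeUTF stands in for writing a Hadoop Text object so that the example runs without Hadoop on the classpath (the byte layout therefore differs from what Text would write).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only; none of these types exist in Hadoop or in the patch.
public class SelfSerializingSketch {

    /** Hypothetical contract: each serialization persists its own metadata as opaque bytes. */
    interface SelfDescribingSerialization {
        void writeMetadata(DataOutputStream out) throws IOException;
        void readMetadata(DataInputStream in) throws IOException;
    }

    /** Stand-in for TypedSerialization: the only metadata it needs is a type name. */
    static final class TypedSketch implements SelfDescribingSerialization {
        private String typeName;

        TypedSketch(String typeName) {
            this.typeName = typeName;
        }

        String getTypeName() {
            return typeName;
        }

        @Override
        public void writeMetadata(DataOutputStream out) throws IOException {
            // Assumption: a length-prefixed string plays the role of a Text object here.
            out.writeUTF(typeName);
        }

        @Override
        public void readMetadata(DataInputStream in) throws IOException {
            typeName = in.readUTF();
        }
    }

    /** The framework only ever sees bytes; it needs no PB (or other) dependency to store them. */
    static byte[] toBytes(SelfDescribingSerialization serialization) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            serialization.writeMetadata(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores a serialization's metadata from the bytes it wrote earlier. */
    static void fromBytes(SelfDescribingSerialization serialization, byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            serialization.readMetadata(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TypedSketch original = new TypedSketch("org.example.MyRecord");
        byte[] blob = toBytes(original);

        TypedSketch restored = new TypedSketch("");
        fromBytes(restored, blob);
        System.out.println(restored.getTypeName());
    }
}
```

Under this shape, an Avro, Thrift or PB serialization living in its own library could implement the same two methods with whatever encoding it prefers, which is what would let the core drop the PB dependency.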


> Change the generic serialization framework API to use serialization-specific bytes instead
of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the
serialization-specific configuration. Since this data is really internal to the specific serialization,
I think we should change it to be an opaque binary blob. This will simplify the interface
for defining specific serializations for different contexts (MAPREDUCE-1462). It will also
move us toward having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.