hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Mon, 22 Nov 2010 22:20:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934627#action_12934627 ]

Steve Loughran commented on HADOOP-6685:
----------------------------------------

I've been doing lots of JSON work recently (net.sf.jsonobject, gson, jackson, etc.): so many
parsers, so many other dependencies. Any of those that tries to reimplement the dream of WS-*
(seamless serialisation between native objects) is repeating the same mistakes. But JSON is good
for shoving stuff around, serving up over HTTP, parsing in different languages. On the other
side, the fact that Xerces ships on all Hadoop-compatible JVMs gives XML an edge, one that
DOM takes away in its pain of use. I'm =0 on JSON internally. Less painful than XML, but the
extra dependencies and the time I waste converting between different Java models of the graph hurt.
And like XML, you end up escaping and base-64-ing stuff.

The ASF would veto any release of Hadoop that depended on an unreleased in-incubation artifact.
This would complicate any plan to branch to 0.22, or at least release it, unless the build
file was set up to exclude Thrift-specific code. But if HDFS already depends on Thrift, that's
something in the schedule plans anyway, and Hadoop core + HDFS will depend on a specific Thrift
version.

+1 to Tom's suggestion of keeping PB off in a contrib package, and the same for Thrift if HDFS
can remove its dependency on it.

=0 to binary config data vs Map<String,String>. Binary is efficient but brittle; a map is
easier to debug. The question is, what would the performance cost of staying in string maps be?
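
To make the trade-off concrete, here is a minimal sketch of the two forms side by side. It is
not the HADOOP-6685 patch or any Hadoop API: the class name, the config keys and the wire layout
(an entry count followed by writeUTF key/value pairs) are all assumptions made up for
illustration. It only uses java.io from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Hadoop.
final class ConfigEncodingSketch {

    // Pack a string map into an opaque blob: entry count, then UTF key/value pairs.
    static byte[] toBytes(Map<String, String> config) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(config.size());
            // Sort the keys so the same map always yields the same bytes.
            for (Map.Entry<String, String> e : new TreeMap<>(config).entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Unpack a blob written by toBytes(). Any change to the layout breaks this,
    // which is the "brittle" half of the trade-off.
    static Map<String, String> fromBytes(byte[] blob) {
        Map<String, String> config = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int entries = in.readInt();
            for (int i = 0; i < entries; i++) {
                String key = in.readUTF();
                config.put(key, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> config = new TreeMap<>();
        config.put("serialization.class", "example.HypotheticalSerialization"); // made-up key and value
        config.put("schema.version", "1");                                      // made-up key and value

        byte[] blob = toBytes(config);
        System.out.println("map form:   " + config);                 // readable in a log or debugger
        System.out.println("blob bytes: " + blob.length);            // opaque without the decoder
        System.out.println("round trip: " + config.equals(fromBytes(blob)));
    }
}
```

The map prints as something you can read in a log; the blob needs its decoder and an agreed
layout before it means anything. The sketch says nothing about the performance question, which
would need measuring on real job configurations.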



> Change the generic serialization framework API to use serialization-specific bytes instead
> of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch,
>                      SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the
> serialization specific configuration. Since this data is really internal to the specific serialization,
> I think we should change it to be an opaque binary blob. This will simplify the interface
> for defining specific serializations for different contexts (MAPREDUCE-1462). It will also
> move us toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

