hadoop-common-issues mailing list archives

From "Ryan Holmes (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Wed, 24 Nov 2010 23:53:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935580#action_12935580 ]

Ryan Holmes commented on HADOOP-6685:
-------------------------------------

bq. Avro is already a dependency. Thrift is already a dependency for HDFS (see HDFS-1484). I'm only adding ProtocolBuffers, which is a commonly used serialization format that many users including me find extremely useful.

This line of reasoning is overly general and could be used to support the addition of literally
any dependency (i.e. dependency x already exists, so it's OK to add y).

Hadoop should focus on providing a pluggable API for serialization rather than providing specific
internal implementations (optional implementations would be fine).  I also think Hadoop will
benefit greatly in the long term by promoting a single, default serialization and file format
for new users. I was under the impression that this was a shared goal and that the chosen
format was Avro. Adding a direct dependency on Protocol Buffers and increasing the scope of
the dependency on Thrift seem to directly contradict that goal.
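
As a rough illustration of the "pluggable API" argued for above, here is a minimal Java sketch. Every name in it (PluggableSerialization, Utf8StringSerialization, SerializationRegistry) is hypothetical and chosen for this example only; it is not Hadoop's actual org.apache.hadoop.io.serializer API. The point is just that the core depends on an interface, while concrete formats (Avro, Thrift, Protocol Buffers) could be registered as optional implementations.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plug-in contract: the core would depend only on this interface.
interface PluggableSerialization<T> {
    String name();
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
}

// A trivial implementation for strings, standing in for a default format.
final class Utf8StringSerialization implements PluggableSerialization<String> {
    public String name() { return "utf8-string"; }
    public byte[] serialize(String value) { return value.getBytes(StandardCharsets.UTF_8); }
    public String deserialize(byte[] bytes) { return new String(bytes, StandardCharsets.UTF_8); }
}

// Implementations are registered by name, so optional formats add no
// compile-time dependency to the code that looks them up.
final class SerializationRegistry {
    private final Map<String, PluggableSerialization<?>> byName = new LinkedHashMap<>();

    void register(PluggableSerialization<?> serialization) {
        byName.put(serialization.name(), serialization);
    }

    @SuppressWarnings("unchecked")
    <T> PluggableSerialization<T> lookup(String name) {
        PluggableSerialization<?> found = byName.get(name);
        if (found == null) {
            throw new IllegalArgumentException("No serialization registered as " + name);
        }
        // Unchecked cast: the caller is assumed to know the type behind the name.
        return (PluggableSerialization<T>) found;
    }
}
```

With a design along these lines, a Protocol Buffers or Thrift implementation could live in an optional module and be registered at startup without the core library linking against either.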

bq. In MAPREDUCE-980, you took out the custom JSON parser and replaced it with calls into Avro. Using ProtoBuf is efficient and meant that I wrote 2 lines of code. If I used JSON, I would need to write a parser and printer.

Can't you use Jackson, which is already a dependency?


> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the
> serialization-specific configuration. Since this data is really internal to the specific
> serialization, I think we should change it to be an opaque binary blob. This will simplify
> the interface for defining specific serializations for different contexts (MAPREDUCE-1462).
> It will also move us toward having serialized objects for Mappers, Reducers, etc.
> (MAPREDUCE-1183).
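
To make the issue description above more concrete, here is a small Java sketch contrasting the two configuration styles. DelimitedTextSettings and its fields are hypothetical, invented for this example; they do not come from the attached patches. toMap() mimics the current Map<String,String> approach, where the framework sees every key, while toBytes()/fromBytes() mimic the proposed approach, where the serialization encodes its own settings and the framework only stores and forwards an opaque blob.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical settings for an imaginary delimited-text serialization.
final class DelimitedTextSettings {
    final char delimiter;
    final boolean quoted;

    DelimitedTextSettings(char delimiter, boolean quoted) {
        this.delimiter = delimiter;
        this.quoted = quoted;
    }

    // Current style: settings are flattened into string key/value pairs.
    Map<String, String> toMap() {
        Map<String, String> map = new HashMap<>();
        map.put("delimiter", String.valueOf(delimiter));
        map.put("quoted", Boolean.toString(quoted));
        return map;
    }

    // Proposed style: the serialization picks its own binary encoding.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeChar(delimiter);   // 2 bytes
            out.writeBoolean(quoted);   // 1 byte
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Only the serialization itself needs to understand the blob.
    static DelimitedTextSettings fromBytes(byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            char delimiter = in.readChar();
            boolean quoted = in.readBoolean();
            return new DelimitedTextSettings(delimiter, quoted);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The encoding here uses java.io data streams purely to keep the sketch dependency-free; which format encodes the blob (Avro, Protocol Buffers, JSON, or something else) is exactly the question debated in this thread.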

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

