hadoop-common-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Mon, 29 Nov 2010 22:59:19 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964994#action_12964994 ]

Doug Cutting commented on HADOOP-6685:
--------------------------------------

Chris, thanks for clarifying.  Yes, I'd prefer to focus exclusively on (1) in this issue.

I don't yet see an advantage for opaque binary.  A common, transparent configuration data
format will simplify the creation of configuration editing tools.

I do see some advantages to using an object serialization system for configuration data: it
would provide stronger typing and simplify implementation.  But I don't yet see it as a clear
win.  A best practice for object bindings is generally to wrap a generated object in a wrapper
that can host convenience methods and support back-compatible APIs.  This is not so different
in developer cost from the static methods we use today.

I don't yet see the urgency of this change.  Serialization metadata is not complex, nor does
it constitute much user code.  Neither of the issues that this feature is designed to support
(MAPREDUCE-1462 and MAPREDUCE-1183) requires binary data, since both have been implemented
without it.  We already have support for passing Properties-like data throughout the system.
Adding a second channel for configuration data will complicate things.

As we consider replacing serialization metadata we should probably look for a solution that's
appropriate for replacing all configuration data.  The primary feature that seems currently
missing is configuration nesting.  Stronger typing would also be useful.   I can imagine a
JSON-based configuration system.  It would have advantages over XML, in that simple types
(string, number, boolean) are distinguished.  JSON can be bound to objects, providing stronger
typing yet, and this can be done through reflection which might avoid the need for a wrapper.
It would support the implementation of configuration-editing tools.
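
As a rough illustration of the reflection-based binding mentioned above, here is a minimal sketch that uses only the JDK. A Map<String, Object> stands in for an already-parsed JSON object, since the JDK has no built-in JSON parser. The ReflectBind and SerializationConf names and the field names are hypothetical and are not part of Hadoop or of any patch on this issue.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical sketch: binds already-parsed, typed values to an object by reflection.
final class ReflectBind {
    private ReflectBind() {}

    // Hypothetical typed view of some serialization settings (illustrative fields only).
    static final class SerializationConf {
        public String name;
        public boolean compress;
        public int bufferSize;
    }

    /**
     * Creates an instance of type and sets each public field named by a key in values.
     * A value of the wrong type is rejected by Field.set with IllegalArgumentException;
     * an unknown key or an inaccessible member surfaces as IllegalStateException.
     */
    static <T> T bind(Map<String, Object> values, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> e : values.entrySet()) {
                Field field = type.getField(e.getKey()); // public fields only
                field.set(target, e.getValue());         // unwraps Integer/Boolean for primitives
            }
            return target;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("cannot bind " + type.getName(), ex);
        }
    }
}
```

Binding a map with name, compress and bufferSize entries yields a populated SerializationConf without a hand-written wrapper, while a string supplied for bufferSize is rejected at bind time rather than at first use.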

In summary, it would be useful to be able to nest configuration data.  Nesting it in properties
is possible and does not disrupt existing APIs, but it's not very elegant.  Stronger typing
of configuration parameters would be nice.  It would be nice to keep metadata transparent.
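
To make "nesting it in properties" concrete, one possible approach is to flatten a nested block under a dotted key prefix and recover it by stripping that prefix. The sketch below assumes this convention; the NestedProps helper and the key names in the usage note are hypothetical, not existing Hadoop APIs.

```java
import java.util.Properties;

// Hypothetical helper: nests one Properties object inside another via key prefixes.
final class NestedProps {
    private NestedProps() {}

    /** Copies every entry of child into parent under the key prefix + "." + name. */
    static void nest(Properties parent, String prefix, Properties child) {
        for (String name : child.stringPropertyNames()) {
            parent.setProperty(prefix + "." + name, child.getProperty(name));
        }
    }

    /** Returns the entries stored under prefix + ".", with the prefix stripped. */
    static Properties subset(Properties props, String prefix) {
        String dotted = prefix + ".";
        Properties out = new Properties();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(dotted)) {
                out.setProperty(name.substring(dotted.length()), props.getProperty(name));
            }
        }
        return out;
    }
}
```

For example, calling nest(job, "map.output.key.serialization", meta) stores a "schema" entry as "map.output.key.serialization.schema", and subset(job, "map.output.key.serialization") returns it under "schema" again. Existing string-keyed APIs keep working and the data stays readable, though every value is still an untyped string, which is the inelegance noted above.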

> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the
> serialization-specific configuration. Since this data is really internal to the specific
> serialization, I think we should change it to be an opaque binary blob. This will simplify
> the interface for defining specific serializations for different contexts (MAPREDUCE-1462).
> It will also move us toward having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

