hadoop-common-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Tue, 30 Nov 2010 06:43:23 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965119#action_12965119 ]

Owen O'Malley commented on HADOOP-6685:
---------------------------------------

{quote}
I don't yet see an advantage for opaque binary. 
{quote}

The two different solutions that have been proposed for serialization metadata are:
* string to string map (HADOOP-6165, HADOOP-6120, HADOOP-6323, HADOOP-6443, MAPREDUCE-1126)
** type unsafe - users may put the wrong type of value into a slot
** unchecked keys - users may misspell a key and get the default value by mistake
** complete visibility - implementation details are fully visible to the user and therefore impossible to change
* opaque blob (HADOOP-6685)
** may be encoded as binary or text
** given a versioned format (ProtoBuf, JSON, Thrift, XML), is completely extensible
** since the interface is via an API
*** it is type-safe
*** all of the setters and getters are checked for validity by the compiler
*** can specify the visibility to the user
*** it can be easily documented via javadoc

In both cases, the metadata is specific to the serialization and can't be interpreted without
reference to the corresponding serialization.
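
The contrast between the two approaches can be sketched roughly as follows. This is only an illustration: ExampleMetadata, MetadataDemo, the key "example.buffer.size" and the toy text encoding are all hypothetical and are not taken from the HADOOP-6685 patch. A real serialization would presumably use a versioned format such as ProtoBuf or Thrift for the blob.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical metadata class owned by one serialization (not from the patch).
// Users only see the typed constructor and getters; the encoding stays private.
final class ExampleMetadata {
    private final String schema;
    private final int bufferSize;

    ExampleMetadata(String schema, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.schema = schema;
        this.bufferSize = bufferSize;
    }

    String getSchema() { return schema; }

    int getBufferSize() { return bufferSize; }

    // Encode as an opaque blob. The framework would only store and forward
    // these bytes. Toy format (assumption): "<bufferSize>\n<schema>" in UTF-8.
    byte[] toBytes() {
        return (bufferSize + "\n" + schema).getBytes(StandardCharsets.UTF_8);
    }

    // Decode a blob produced by toBytes().
    static ExampleMetadata fromBytes(byte[] blob) {
        String text = new String(blob, StandardCharsets.UTF_8);
        int split = text.indexOf('\n');
        int size = Integer.parseInt(text.substring(0, split));
        return new ExampleMetadata(text.substring(split + 1), size);
    }
}

final class MetadataDemo {
    // String to string map: the key is unchecked and the value is untyped.
    static int bufferSizeFromMap(Map<String, String> conf) {
        return Integer.parseInt(conf.getOrDefault("example.buffer.size", "4096"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.bufer.size", "65536"); // misspelled key, compiles fine
        System.out.println(bufferSizeFromMap(conf)); // falls back to the default

        // Typed API: a misspelled setter or a wrong argument type would not compile.
        byte[] blob = new ExampleMetadata("string", 65536).toBytes();
        System.out.println(ExampleMetadata.fromBytes(blob).getBufferSize());
    }
}
```

With the map, the misspelled key goes unnoticed and the default is used. With the typed class, the same kind of mistake is caught by the compiler, and the byte layout can change without touching user code.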

{quote}
A common, transparent configuration data format will simplify the creation of configuration
editing tools.
{quote}

Since the metadata is specific to each serialization, there are no common interfaces to support
those editing tools. So the string to string maps give the appearance of a common format,
but without the semantics it isn't possible to write tools to edit it. In both implementations,
it is easy to write dumpers.

{quote}
As we consider replacing serialization metadata we should probably look for a solution that's
appropriate for replacing all configuration data
{quote}

When MAPREDUCE-1183 is done, there will be very little need for the string to string map of Configuration.
We will have it for a long time to support old applications, but users won't need it.

It would be nice to move the servers away from the current XML encoded string to string configurations,
but that is *way* outside the scope of this jira.


> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the
> serialization specific configuration. Since this data is really internal to the specific serialization,
> I think we should change it to be an opaque binary blob. This will simplify the interface
> for defining specific serializations for different contexts (MAPREDUCE-1462). It will also
> move us toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).


