hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Tue, 07 Dec 2010 21:59:10 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969031#action_12969031 ]

Allen Wittenauer commented on HADOOP-6685:
------------------------------------------

> works against integrating external configuration systems with existing components

I'm thinking of the point when we are past the limitations of the existing components.  What
if we don't pass files around for configuration information at all?  Does it still make sense
to ensure that everything can be represented as a UTF-16 string then?  I don't think it does.

> Do we have much binary configuration data?

Given that it is currently impossible, the answer is obviously no.

But this seems like a major flaw of the existing system.  Who are we to dictate what the user
can/can't put in what is essentially a private part of the configuration name space?  Hadoop
as a framework shouldn't care what the representation of that value is if it doesn't have
to read it.  If I want to build a mass document-signing system and provide the binary
representation of the CA cert as a configuration option to my serializer, why shouldn't I
be able to do that?  If I want to work in UTF-32 and pass information as a config option to
my serializer, why shouldn't I be able to do that?
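To make the CA-cert example concrete, here is a minimal Java sketch. It assumes the raw bytes would be squeezed into a String-valued map by decoding them as UTF-8. That is an illustrative choice, not something Hadoop prescribes. The class name is hypothetical, and the byte values only mimic the start of a DER-encoded certificate.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: why raw binary config (e.g. DER bytes of a CA cert)
// does not survive being stored as a String value. Byte values are illustrative.
class BinaryInStringDemo {

  // Push raw bytes through a String, as a String-valued config map would
  // force us to do, then try to recover the original bytes.
  static byte[] roundTripThroughString(byte[] raw) {
    // Malformed UTF-8 input is silently replaced with U+FFFD during decoding.
    String asText = new String(raw, StandardCharsets.UTF_8);
    return asText.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // 0x30 0x82 is how many DER certificates begin; a lone 0x82 and 0xFF are not valid UTF-8.
    byte[] der = {(byte) 0x30, (byte) 0x82, (byte) 0x01, (byte) 0xFF};
    byte[] back = roundTripThroughString(der);
    System.out.println("lossless? " + Arrays.equals(der, back)); // prints "lossless? false"
  }
}
```

A byte-preserving charset such as ISO-8859-1 avoids this particular loss. However, the resulting string still carries control characters that a text-based configuration format may not accept, so in practice the data ends up needing an encoding layer anyway.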

Now one could argue that I could base64-encode my data or do the wacky !!binary thing that
YAML does (JSON doesn't support binary, so to me, that instantly eliminates it.  Even crusty
X.500 supports binary!  ... and XML... well, you all know how I feel about it. *smile*).
But why should I take a performance hit to support my use case?
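The following minimal Java sketch gives a rough sense of what the base64 route costs. It uses a plain Map<String,String> as a stand-in for the existing string-valued configuration, along with java.util.Base64 (Java 8+). The key name and the helper methods are hypothetical.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the base64 workaround: binary config is encoded to
// fit a Map<String,String>, and every reader must decode it again.
class Base64ConfigWorkaround {
  static final String KEY = "example.serializer.ca.cert"; // illustrative key name

  static void putBinary(Map<String, String> conf, String key, byte[] value) {
    conf.put(key, Base64.getEncoder().encodeToString(value));
  }

  // Returns null if the key is absent; otherwise decodes back to raw bytes.
  static byte[] getBinary(Map<String, String> conf, String key) {
    String encoded = conf.get(key);
    return encoded == null ? null : Base64.getDecoder().decode(encoded);
  }

  // Padded base64 emits 4 characters for every started 3-byte group.
  static int encodedLength(int rawBytes) {
    return 4 * ((rawBytes + 2) / 3);
  }

  public static void main(String[] args) {
    byte[] cert = new byte[1000]; // stand-in for 1000 bytes of binary config
    Map<String, String> conf = new HashMap<>();
    putBinary(conf, KEY, cert);
    System.out.println(conf.get(KEY).length());      // 1336 characters stored
    System.out.println(getBinary(conf, KEY).length); // 1000 bytes recovered
  }
}
```

The stored value is therefore about a third larger than the data it carries. Every consumer also pays for a decode pass before it can use the data, which is the overhead an opaque byte representation would avoid.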

I don't see the value in supporting the existing system when it has what I would say is a
major flaw.

> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: serial.patch, serial4.patch, serial6.patch, serial7.patch, serial9.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the
> serialization-specific configuration. Since this data is really internal to the specific
> serialization, I think we should change it to be an opaque binary blob. This will simplify
> the interface for defining specific serializations for different contexts (MAPREDUCE-1462).
> It will also move us toward having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).
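The following minimal Java sketch shows one possible shape for the opaque-blob idea. It assumes a simple interface in which each serialization exports and restores its own configuration as a byte[], and the framework stores that array without interpreting it. OpaqueSerialization, getConfigBytes, setConfigBytes, FrameworkStore and the context string are all hypothetical names. None of them are taken from the attached patches or from Hadoop's actual API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each serialization owns its configuration as an opaque
// byte[], and the framework stores and returns it without parsing it.
// None of these names are Hadoop's real API.
class OpaqueConfigSketch {

  interface OpaqueSerialization {
    // Serialization-specific configuration, in whatever encoding it chooses.
    byte[] getConfigBytes();

    // Restore configuration previously produced by getConfigBytes().
    void setConfigBytes(byte[] config);
  }

  // A serialization whose only configuration is raw binary (e.g. a CA cert).
  static class SigningSerialization implements OpaqueSerialization {
    private byte[] caCert = new byte[0];

    void setCaCert(byte[] der) { caCert = der.clone(); }
    byte[] getCaCert() { return caCert.clone(); }

    @Override public byte[] getConfigBytes() { return caCert.clone(); }
    @Override public void setConfigBytes(byte[] config) { caCert = config.clone(); }
  }

  // Framework side: it only moves opaque bytes around, keyed by a context name.
  static class FrameworkStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    void save(String context, OpaqueSerialization s) {
      blobs.put(context, s.getConfigBytes());
    }

    // Returns false if nothing was saved for this context.
    boolean load(String context, OpaqueSerialization s) {
      byte[] blob = blobs.get(context);
      if (blob == null) return false;
      s.setConfigBytes(blob);
      return true;
    }
  }

  public static void main(String[] args) {
    SigningSerialization writer = new SigningSerialization();
    writer.setCaCert(new byte[] {(byte) 0x30, (byte) 0x82, (byte) 0xFF});

    FrameworkStore store = new FrameworkStore();
    store.save("map.output.key", writer); // context name is illustrative

    SigningSerialization reader = new SigningSerialization();
    store.load("map.output.key", reader);
    System.out.println(Arrays.equals(writer.getCaCert(), reader.getCaCert())); // true
  }
}
```

In this sketch the framework never looks inside the array. The serialization alone decides whether the blob holds a DER certificate, UTF-32 text, or anything else.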

-- 
This message is automatically generated by JIRA.

