hadoop-common-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Fri, 03 Dec 2010 19:56:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966647#action_12966647 ]

Owen O'Malley commented on HADOOP-6685:
---------------------------------------

{quote}
If we used JSON as the standard string representation, then nesting it within other JSON would
require no escaping.
{quote}

We don't have JSON file formats except for JobHistory. Locking the API to require JSON is
a huge and unreasonable cost for very little gain.

{quote}
What important capability do we lose by using a uniform textual format for configuration data?
{quote}

We don't have one right now. We have XML and JSON. Neither is user-friendly. I think it would
be extremely premature to lock JSON in at this level.

If you'll unveto my patch, I'll add a fromString method to the interface and implement it. Since
the goal of the text is to be user-friendly, I'll probably use YAML instead of JSON. If you
think having plugins that use JSON is critical, you can write plugins that do so.

YAML is a more human-friendly form of JSON and the Writable metadata would look like:

{code}
{class: org.apache.hadoop.io.IntWritable}
{code}

and the Avro serialization metadata would look like:

{code}
{kind: SPECIFIC, schema: '{"type":"record","name":"AvroKey","namespace":"org.apache.hadoop.io","fields":[{"name":"value","type":"int"}]}'}
{code}
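
As a rough illustration of why the nested Avro schema above needs no backslash escaping, here is a minimal sketch that renders string metadata as a YAML flow mapping. The class YamlFlowMetadata and its methods quote and toFlowMapping are hypothetical names made up for this example; they are not part of Hadoop or of the attached patches, and the sketch assumes all keys and values are plain strings. It relies only on the YAML rule that a single-quoted scalar escapes nothing except the single quote itself, which is doubled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hadoop or the patch.
public class YamlFlowMetadata {

    // Leave simple identifiers as plain scalars; single-quote everything else.
    // Inside a YAML single-quoted scalar the only escape is '' for a literal ',
    // so embedded JSON with double quotes passes through unchanged.
    // Limitation: a plain value such as true or 42 would be read back as a
    // non-string by a YAML parser; this sketch does not guard against that.
    static String quote(String value) {
        if (value.matches("[A-Za-z0-9_.$]+")) {
            return value;
        }
        return "'" + value.replace("'", "''") + "'";
    }

    // Render the entries as a one-line YAML flow mapping: {k1: v1, k2: v2}
    static String toFlowMapping(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            sb.append(quote(e.getKey())).append(": ").append(quote(e.getValue()));
            first = false;
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> writable = new LinkedHashMap<>();
        writable.put("class", "org.apache.hadoop.io.IntWritable");
        System.out.println(toFlowMapping(writable));

        Map<String, String> avro = new LinkedHashMap<>();
        avro.put("kind", "SPECIFIC");
        avro.put("schema", "{\"type\":\"record\",\"name\":\"AvroKey\","
            + "\"namespace\":\"org.apache.hadoop.io\","
            + "\"fields\":[{\"name\":\"value\",\"type\":\"int\"}]}");
        System.out.println(toFlowMapping(avro));
    }
}
```

Running main prints the same two lines as the Writable and Avro snippets above. A real implementation would more likely delegate to a YAML library than hand-roll the quoting.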



> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the serialization specific configuration. Since this data is really internal to the specific serialization, I think we should change it to be an opaque binary blob. This will simplify the interface for defining specific serializations for different contexts (MAPREDUCE-1462). It will also move us toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

