hadoop-common-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Thu, 11 Nov 2010 22:48:22 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-6685:
----------------------------------

    Attachment: serial.patch

Ok, here is a preliminary patch. 

It includes support for Avro, Thrift, ProtocolBuffers, Writables, Java serialization, and
an adaptor for the old-style serializations. One of the features of the Avro serialization
is that the kind ("reflection", "specific", "generic") is a parameter that can be changed
between writing and reading the file.

All of the types can be put into SequenceFiles, MapFiles, BloomFilterMapFiles, SetFile, and
ArrayFile.

In a separate issue, I'll upload the OFile wrapper that goes on top of TFile to allow all
of the types into TFiles as well.

It creates a new package o.a.h.io.serial that defines the new interfaces. The new serializations
save their metadata in a framework-specific format. To make the format extensible, I've used
protocol buffers to encode this information. This will allow us to make arbitrary compatible
extensions later.
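
As a rough illustration of why a tagged, length-delimited encoding allows compatible extensions, here is a minimal Java sketch. It is not taken from serial.patch and does not use protocol buffers; it swaps in a hand-rolled tag/length/value layout built on java.io only. The class name, the tag numbers, and the "kind=specific" payload are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class MetadataSketch {
    // Hypothetical field tags, invented for this sketch.
    static final int TAG_NAME = 1;    // name of the serialization
    static final int TAG_STATE = 2;   // opaque bytes owned by that serialization
    static final int TAG_FUTURE = 99; // stands in for a field added by a later version

    /** The fields an "old" reader understands. */
    static final class Metadata {
        String name;
        byte[] state;
    }

    static void writeField(DataOutputStream out, int tag, byte[] value) throws IOException {
        out.writeInt(tag);
        out.writeInt(value.length);
        out.write(value);
    }

    /** Encodes a record as a newer writer might, including a field old readers do not know. */
    static byte[] writeWithExtraField(String name, byte[] state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeField(out, TAG_NAME, name.getBytes(StandardCharsets.UTF_8));
        writeField(out, TAG_FUTURE, new byte[] {1, 2, 3});
        writeField(out, TAG_STATE, state);
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the known fields and skips any tag it does not recognize. */
    static Metadata read(byte[] record) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        Metadata result = new Metadata();
        while (in.available() > 0) {
            int tag = in.readInt();
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            if (tag == TAG_NAME) {
                result.name = new String(value, StandardCharsets.UTF_8);
            } else if (tag == TAG_STATE) {
                result.state = value;
            }
            // Unknown tags were fully consumed by readFully, so they are simply ignored.
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] state = "kind=specific".getBytes(StandardCharsets.UTF_8);
        Metadata m = read(writeWithExtraField("avro", state));
        System.out.println(m.name + " / " + new String(m.state, StandardCharsets.UTF_8));
    }
}
```

Because every field carries its own tag and length, a reader built before TAG_FUTURE existed can still parse the record. Protocol buffers provide the same property without the hand-written framing.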

> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: serial.patch
>
>
> Currently, the generic serialization framework uses Map<String,String> for the
> serialization-specific configuration. Since this data is really internal to the specific
> serialization, I think we should change it to be an opaque binary blob. This will simplify
> the interface for defining specific serializations for different contexts (MAPREDUCE-1462).
> It will also move us toward having serialized objects for Mappers, Reducers, etc.
> (MAPREDUCE-1183).
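
The idea in the description can be sketched in Java as follows: the framework carries each serialization's configuration as bytes it never interprets, instead of a Map<String,String> whose keys it would have to share. This is only an illustration under stated assumptions. The interface SelfDescribing, its methods saveState and restoreState, and the helper storeInFileHeader are hypothetical names and are not the API defined by the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OpaqueConfigSketch {
    /** Hypothetical contract: each serialization encodes and decodes its own state. */
    interface SelfDescribing {
        byte[] saveState();
        void restoreState(byte[] state);
    }

    /** Toy serialization whose only state is a schema string. */
    static final class SchemaSerialization implements SelfDescribing {
        private String schema = "";

        SchemaSerialization withSchema(String schema) {
            this.schema = schema;
            return this;
        }

        String schema() {
            return schema;
        }

        @Override
        public byte[] saveState() {
            return schema.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public void restoreState(byte[] state) {
            schema = new String(state, StandardCharsets.UTF_8);
        }
    }

    /** Stand-in for the framework: it copies the bytes and never looks inside them. */
    static byte[] storeInFileHeader(SelfDescribing serialization) {
        byte[] state = serialization.saveState();
        return Arrays.copyOf(state, state.length);
    }

    public static void main(String[] args) {
        SchemaSerialization writer = new SchemaSerialization().withSchema("{\"type\":\"string\"}");
        byte[] header = storeInFileHeader(writer);

        // A reader rebuilds an equivalent serialization from the stored bytes alone.
        SchemaSerialization reader = new SchemaSerialization();
        reader.restoreState(header);
        System.out.println(reader.schema());
    }
}
```

With this shape, a serialization can change how it encodes its state without touching the framework, since the framework only moves the bytes around.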

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

