hadoop-common-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Sat, 13 Nov 2010 00:04:22 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931588#action_12931588 ]

Owen O'Malley commented on HADOOP-6685:

Owen, thanks for the slides

You're welcome. Everyone had seen them before, but I wanted to make sure they were easily
available for this conversation.

I don't see a direct relation between this issue and the issue of simplifying the implementation
of efficient map-side joins (MAPREDUCE-1183, more or less). Am I missing the connection, or
is this a distinct issue?

It is related because we want to support context-specific serializations. That support is
much easier if the metadata for each serialization is in a separate structure and not dumped
into the Configuration. This is the same problem that comes from MAPREDUCE-1183 for InputFormats,
Mappers, etc. They are similar issues and it would be nice to have a consistent solution.
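
To make the "separate structure" point concrete, here is a minimal sketch of keeping one serialization choice per context instead of flattening everything into shared configuration keys. All of the names below (SerializationContext, SerializationChoice, ContextSerializations) are hypothetical and chosen only for illustration; they are not taken from the patch or from the existing Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical contexts in which a job might need a serialization.
enum SerializationContext { MAP_OUTPUT_KEY, MAP_OUTPUT_VALUE, JOB_OUTPUT_VALUE }

// Hypothetical holder: the serialization to use plus its private metadata.
final class SerializationChoice {
    private final String serializationName;
    private final byte[] metadata;

    SerializationChoice(String serializationName, byte[] metadata) {
        this.serializationName = serializationName;
        this.metadata = metadata.clone(); // defensive copy
    }

    String serializationName() { return serializationName; }

    byte[] metadata() { return metadata.clone(); }
}

// One entry per context, kept apart from any job-wide key/value configuration.
final class ContextSerializations {
    private final Map<SerializationContext, SerializationChoice> choices =
        new EnumMap<>(SerializationContext.class);

    void set(SerializationContext context, SerializationChoice choice) {
        choices.put(context, choice);
    }

    SerializationChoice get(SerializationContext context) {
        SerializationChoice choice = choices.get(context);
        if (choice == null) {
            throw new IllegalStateException("no serialization set for " + context);
        }
        return choice;
    }
}

class ContextSerializationsDemo {
    public static void main(String[] args) {
        ContextSerializations contexts = new ContextSerializations();
        // The serialization names are plain labels in this sketch.
        contexts.set(SerializationContext.MAP_OUTPUT_KEY,
            new SerializationChoice("writable", new byte[0]));
        contexts.set(SerializationContext.MAP_OUTPUT_VALUE,
            new SerializationChoice("schema-based",
                "example schema".getBytes(StandardCharsets.UTF_8)));

        for (SerializationContext context : new SerializationContext[] {
                SerializationContext.MAP_OUTPUT_KEY,
                SerializationContext.MAP_OUTPUT_VALUE}) {
            System.out.println(context + " uses "
                + contexts.get(context).serializationName());
        }
    }
}
```

Because each context owns its own entry, two contexts can use different serializations, or the same serialization with different metadata, without inventing a distinct set of configuration keys for every context.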

File formats are forever.

I'm adding no new file formats. I'm just making the ones that we've had for years have more

We badly need to add support for a higher-level object serialization system than Writable.

I obviously agree enough that I'm working on supporting it. Giving users a choice of
serialization is much richer than forcing them into a single one. Each serialization makes
different design decisions; by making the choice pluggable, the *user* can decide. I understand
that you want Avro everywhere. Other users have other priorities.

But I'm not convinced it's wise to add such support to the existing Java-only container file

I'm supporting the containers we have. I'd love for someone to implement SequenceFiles or
TFiles in C. That is an orthogonal issue. Any file format that only supports one serialization
doesn't meet my needs.

This change should have no impact on any current applications. Very few of them depend on
the serialization library directly. My hope is that by extending the library and supporting
a wider range of serializations, users will be able to code their applications using the types
that *they* find convenient.

> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: serial.patch, SerializationAtSummit.pdf
> Currently, the generic serialization framework uses Map<String,String> for the
> serialization-specific configuration. Since this data is really internal to the specific serialization,
> I think we should change it to be an opaque binary blob. This will simplify the interface
> for defining specific serializations for different contexts (MAPREDUCE-1462). It will also
> move us toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).
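
As a rough illustration of the "opaque binary blob" idea in the description above, the sketch below has a serialization write and read its own metadata as bytes, so the framework never needs to interpret it. The interface and class names (OpaqueSerialization, SchemaSerialization) and the metadata fields are assumptions made up for this example; they are not the API proposed in serial.patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical interface: the framework only ever sees a name and a byte[].
interface OpaqueSerialization {
    String name();
    byte[] metadataToBytes();
    void metadataFromBytes(byte[] blob);
}

// Hypothetical serialization whose metadata is a version number and a schema string.
class SchemaSerialization implements OpaqueSerialization {
    private int version;
    private String schema;

    SchemaSerialization() {
        this(0, "");
    }

    SchemaSerialization(int version, String schema) {
        this.version = version;
        this.schema = schema;
    }

    @Override
    public String name() { return "schema-example"; }

    @Override
    public byte[] metadataToBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(version);   // layout is private to this serialization
            out.writeUTF(schema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    @Override
    public void metadataFromBytes(byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            version = in.readInt();  // must mirror the write order above
            schema = in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int version() { return version; }

    String schema() { return schema; }
}

class OpaqueMetadataDemo {
    public static void main(String[] args) {
        SchemaSerialization writer = new SchemaSerialization(2, "record{id:int}");
        byte[] blob = writer.metadataToBytes(); // what the framework would store

        SchemaSerialization reader = new SchemaSerialization();
        reader.metadataFromBytes(blob);         // only the serialization decodes it
        System.out.println(reader.version() + " " + reader.schema());
    }
}
```

Compared with a Map<String,String>, the serialization here is free to choose any encoding for its metadata, and the caller only has to carry the bytes alongside the serialization name.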

