avro-dev mailing list archives

From "Paul Mazak (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-1699) AutoMap field values between Avro objects with different schemas
Date Mon, 13 Jul 2015 17:48:04 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Mazak updated AVRO-1699:
    Affects Version/s: 1.7.6
               Status: Patch Available  (was: Open)

Attaching the AutoMap utility we wrote and have been using.

> AutoMap field values between Avro objects with different schemas
> ----------------------------------------------------------------
>                 Key: AVRO-1699
>                 URL: https://issues.apache.org/jira/browse/AVRO-1699
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.6
>            Reporter: Paul Mazak
> There are a few use cases for this:
> *Various Avro input data to one common output*
> You want to pick up Avro files written with different schemas and normalize them into one. You might wish to transform to the superset of the input schemas.
> *Aggregating Raw Data*
> You want to rewrite data grouped by some fields and aggregated. The output schema in this case would be a subset of the input schema, with at least the group-by fields present in both.
> *Alternate Views*
> You have Avro data that you want to trim in different ways to create subsets that would be useful for views in Hive or exports to SQL tables.
> *Schema Migration*
> You've added fields to a schema and are storing data in both the old and new schemas. You can't process the old-schema data together with the new-schema data (using Pig or Java MapReduce). AutoMapping would up-convert the old data by setting the newly added fields to null, so that all data ends up in the new schema. This was [asked about|http://stackoverflow.com/questions/27131942/is-it-possible-to-retrieve-schema-from-avro-data-and-use-them-in-mapreduce] on Stack Overflow.
> _Considerations:_
>  * Loop over the source schema fields available to automap and return any that could not be mapped.
>  * Allow mappings between compatible types, for example integers to longs, floats to strings, etc.
>  * Field names are matched case-sensitively.
>  * Make use of aliases in the schema when considering fields to automap.
>  * Deep copy nested structures like arrays and maps.
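
The considerations above can be sketched in plain Java. This is only an illustration under stated assumptions: it is not the attached AutoMap utility and it does not use the Avro API. Records are modeled as `Map<String, Object>`, and `AutoMapSketch`, `Field`, `Result`, `autoMap`, `findField`, `convert` and `deepCopy` are hypothetical names chosen for this example. Only two conversions are shown (integer to long, any number to string); a real utility would presumably support more.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: records are plain maps, not Avro GenericRecords.
final class AutoMapSketch {

    /** Stand-in for a target schema field: name, expected Java type, optional aliases. */
    static final class Field {
        final String name;
        final Class<?> type;
        final List<String> aliases;

        Field(String name, Class<?> type, String... aliases) {
            this.name = name;
            this.type = type;
            this.aliases = List.of(aliases);
        }
    }

    /** The populated target record plus the source field names that could not be mapped. */
    static final class Result {
        final Map<String, Object> target;
        final List<String> unmapped;

        Result(Map<String, Object> target, List<String> unmapped) {
            this.target = target;
            this.unmapped = unmapped;
        }
    }

    static Result autoMap(Map<String, Object> source, List<Field> targetFields) {
        Map<String, Object> target = new LinkedHashMap<>();
        // Fields that exist only in the target schema stay null (the schema-migration case).
        for (Field f : targetFields) {
            target.put(f.name, null);
        }
        List<String> unmapped = new ArrayList<>();
        for (Map.Entry<String, Object> e : source.entrySet()) {
            Field match = findField(targetFields, e.getKey());
            Object converted = (match == null) ? null : convert(e.getValue(), match.type);
            if (match == null || (converted == null && e.getValue() != null)) {
                unmapped.add(e.getKey()); // no matching field, or incompatible type
            } else {
                target.put(match.name, converted);
            }
        }
        return new Result(target, unmapped);
    }

    // Case-sensitive match on the field name first, then on the target field's aliases.
    static Field findField(List<Field> fields, String sourceName) {
        for (Field f : fields) {
            if (f.name.equals(sourceName)) return f;
        }
        for (Field f : fields) {
            if (f.aliases.contains(sourceName)) return f;
        }
        return null;
    }

    // Returns the converted value, or null when the types are not compatible.
    static Object convert(Object value, Class<?> type) {
        if (value == null) return null;
        if (type.isInstance(value)) return deepCopy(value);
        if (type == Long.class && value instanceof Integer) return ((Integer) value).longValue();
        if (type == String.class && value instanceof Number) return value.toString();
        return null;
    }

    // Recursively copies lists and maps; other values are assumed to be immutable.
    static Object deepCopy(Object value) {
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) copy.add(deepCopy(item));
            return copy;
        }
        if (value instanceof Map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                copy.put(e.getKey(), deepCopy(e.getValue()));
            }
            return copy;
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("id", 7);        // Integer; the target field expects Long
        source.put("score", 1.5f);  // Float; reaches "rating" through an alias
        source.put("Name", "x");    // differs from "name" only by case
        List<Field> fields = List.of(
                new Field("id", Long.class),
                new Field("rating", String.class, "score"),
                new Field("name", String.class));
        Result r = autoMap(source, fields);
        System.out.println(r.target + " unmapped=" + r.unmapped);
    }
}
```

In this sketch the source field `Name` is reported as unmapped because matching is case-sensitive, and the target field `name` is left null. Aliases are looked up on the target side, on the assumption that this mirrors how Avro aliases name alternative names a field may carry in the data being read.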
