avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-600) add support for type and field name aliases
Date Thu, 22 Jul 2010 20:56:50 GMT

    [ https://issues.apache.org/jira/browse/AVRO-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891325#action_12891325 ]

Doug Cutting commented on AVRO-600:
-----------------------------------

An example:

Data written with:

{code}
{"type": "record", "name": "org.x.Foo", "fields": [
    {"name": "a", "type": "int"},
    {"name": "b", "type": "int"}
  ]
}
{code}

Could be read with:
{code}
{"type": "record", "name": "org.y.Bar", "fields": [
    {"name": "c", "type": "int", "aliases": ["a"]},
    {"name": "d", "type": "int", "default": 0}
  ],
  "aliases": ["org.x.Foo"]
}
{code}

It would be an error for a type alias to name an already-defined type or for a field alias
to name an already-defined field.
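
A minimal Python sketch of one reading of this rule, checking a single record schema held as a plain dict. The function name check_aliases and its defined_types parameter (the names of types already defined elsewhere in the reader's schema) are hypothetical, not part of any Avro API:

```python
def check_aliases(record_schema, defined_types=()):
    """Raise ValueError if an alias collides with an already-defined name.

    Hypothetical helper: `defined_types` is assumed to hold the full names
    of the types already defined in the reader's schema.
    """
    defined = set(defined_types) | {record_schema["name"]}
    for alias in record_schema.get("aliases", []):
        if alias in defined:
            raise ValueError("type alias %r names an already-defined type" % alias)

    field_names = {f["name"] for f in record_schema["fields"]}
    for field in record_schema["fields"]:
        for alias in field.get("aliases", []):
            if alias in field_names:
                raise ValueError("field alias %r names an already-defined field" % alias)
```

Under this reading the reader's schema above passes, while giving field "c" the alias "d" would be rejected.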

The semantics would be equivalent to rewriting the writer's schema, replacing matching aliased
types and fields with their names in the reader's schema.  In the above example, the writer's
schema would be rewritten as:


{code}
{"type": "record", "name": "org.y.Bar", "fields": [
    {"name": "c", "type": "int"},
    {"name": "b", "type": "int"}
  ]
}
{code}

When instances are read, values for "a" would be read into the "c" field, values for "b" would
be dropped, and "d" would have the default value of zero.
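
The rewriting and the resulting read can be sketched in Python roughly as follows. This is only an illustration for a single flat record, with schemas and instances held as plain dicts; rewrite_writer_schema and resolve_record are hypothetical names, not Avro's implementation:

```python
import copy

WRITER = {"type": "record", "name": "org.x.Foo", "fields": [
    {"name": "a", "type": "int"},
    {"name": "b", "type": "int"},
]}

READER = {"type": "record", "name": "org.y.Bar", "fields": [
    {"name": "c", "type": "int", "aliases": ["a"]},
    {"name": "d", "type": "int", "default": 0},
], "aliases": ["org.x.Foo"]}


def rewrite_writer_schema(writer, reader):
    """Return a copy of the writer's schema with aliased names replaced
    by the corresponding names from the reader's schema."""
    rewritten = copy.deepcopy(writer)
    if writer["name"] in reader.get("aliases", []):
        rewritten["name"] = reader["name"]
    # Map each field alias to the reader's name for that field.
    renames = {alias: field["name"]
               for field in reader["fields"]
               for alias in field.get("aliases", [])}
    for field in rewritten["fields"]:
        field["name"] = renames.get(field["name"], field["name"])
    return rewritten


def resolve_record(datum, writer, reader):
    """Project an instance written with `writer` onto `reader`."""
    rewritten = rewrite_writer_schema(writer, reader)
    if rewritten["name"] != reader["name"]:
        raise ValueError("record names do not match")
    # Re-key the written values by their rewritten field names.
    written = {new["name"]: datum[old["name"]]
               for old, new in zip(writer["fields"], rewritten["fields"])}
    result = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in written:
            result[name] = written[name]       # e.g. "a" is read into "c"
        elif "default" in field:
            result[name] = field["default"]    # e.g. "d" takes its default
        else:
            raise ValueError("no value or default for field %r" % name)
    # Writer fields the reader does not mention (e.g. "b") are dropped.
    return result
```

For example, resolve_record({"a": 1, "b": 2}, WRITER, READER) yields {"c": 1, "d": 0}.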

> add support for type and field name aliases
> -------------------------------------------
>
>                 Key: AVRO-600
>                 URL: https://issues.apache.org/jira/browse/AVRO-600
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>
> It would be good if Avro would permit one to still read data if a type or field name
> has been changed.  I propose we add a notion of name _aliases_.  Aliases could be listed for
> every named type and for record fields.  The writer's schema would be permitted to contain
> any of the aliases.
> In general, this permits one to construct schemas that can read different types into
> a single type.  One could use this not just to handle renamings, but also to join different
> datasets.  For example, if two datasets each contain differently named records with a date
> and an IP address field, this could be used to project them both to a single record
> with just those fields.


