avro-dev mailing list archives

From Doug Cutting <cutt...@gmail.com>
Subject Re: Parsing canonical forms with schemas having default values.
Date Wed, 07 Jun 2017 18:44:39 GMT
When reading data, two schemas are used: a schema with the same
fingerprint as the one used to write the data, typically the actual
schema used to write, and the schema you'd like to project to.  Default
values are only used from the latter schema.
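
A minimal sketch of that rule, assuming records have already been decoded
into plain Python dicts. It is a toy model of record resolution, not the
Avro library's implementation, and the name resolve_record and the User
schemas are hypothetical:

```python
def resolve_record(writer_schema, reader_schema, datum):
    """Project a record decoded with the writer's schema onto the reader's schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # The writer supplied this field, so use the written value.
            result[name] = datum[name]
        elif "default" in field:
            # Missing from the written data: take the default from the
            # reader's schema. Writer-side defaults are never consulted.
            result[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result


writer = {"type": "record", "name": "User",
          "fields": [{"name": "id", "type": "long"}]}
reader = {"type": "record", "name": "User",
          "fields": [{"name": "id", "type": "long"},
                     {"name": "email", "type": "string", "default": ""}]}

resolved = resolve_record(writer, reader, {"id": 7})
print(resolved)  # {'id': 7, 'email': ''}
```

Fields that appear only in the writer's schema are simply dropped by the
projection, since the loop walks the reader's fields.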

Matching fingerprints indicate binary compatibility.  Schema
resolution allows evolution to a schema with a different binary
format, i.e., with additional fields that specify a default value.

Schema compatibility through resolution cannot be represented in a
single number like a fingerprint.
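
To make the distinction concrete, here is a simplified sketch of the
attribute stripping and a fingerprint computed over the result. It
assumes the kept-attribute list quoted below and SHA-256 as the hash, and
it skips the other canonicalization steps (fullname resolution, primitive
simplification, string and integer normalization). The function names and
schemas are hypothetical:

```python
import hashlib
import json

# Attributes kept by the canonical form, in the order they are emitted.
KEPT = ["name", "type", "fields", "symbols", "items", "values", "size"]


def canonical_form(schema):
    """Strip non-parsing attributes (doc, default, aliases, ...) and serialize."""
    def strip(node):
        if isinstance(node, dict):
            return {key: strip(node[key]) for key in KEPT if key in node}
        if isinstance(node, list):
            return [strip(item) for item in node]
        return node
    # Compact separators so no whitespace differences survive.
    return json.dumps(strip(schema), separators=(",", ":"))


def fingerprint(schema):
    """Hash of the simplified canonical form (SHA-256 chosen for illustration)."""
    return hashlib.sha256(canonical_form(schema).encode("utf-8")).hexdigest()


without_default = {"type": "record", "name": "User",
                   "fields": [{"name": "id", "type": "long"}]}
with_default = {"type": "record", "name": "User", "doc": "a user",
                "fields": [{"name": "id", "type": "long", "default": 0}]}
extra_field = {"type": "record", "name": "User",
               "fields": [{"name": "id", "type": "long"},
                          {"name": "email", "type": "string", "default": ""}]}

# Same binary format, so the fingerprints match despite the default.
print(fingerprint(without_default) == fingerprint(with_default))  # True
# Different binary format, so the fingerprints differ, even though
# resolution could still read old data into this schema via the default.
print(fingerprint(without_default) == fingerprint(extra_field))   # False
```

The last comparison is the point: the two schemas are compatible through
resolution, yet no equality test on fingerprints can show that.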


On Tue, Jun 6, 2017 at 11:41 AM, Satish Duggana
<satish.duggana@gmail.com> wrote:
> https://avro.apache.org/docs/1.8.1/spec.html#Parsing+Canonical+Form+for+Schemas
>> Parsing Canonical Form is a transformation of a writer's schema that lets
>> us define what it means for two schemas to be "the same" for the purpose of
>> reading data written against the schema. It is called Parsing Canonical Form
>> because the transformations strip away parts of the schema, like "doc"
>> attributes, that are irrelevant to readers trying to parse incoming data. It
>> is called Canonical Form because the transformations normalize the JSON text
>> (such as the order of attributes) in a way that eliminates unimportant
>> differences between schemas. If the Parsing Canonical Forms of two different
>> schemas are textually equal, then those schemas are "the same" as far as any
>> reader is concerned, i.e., there is no serialized data that would allow a
>> reader to distinguish data generated by a writer using one of the original
>> schemas from data generated by a writer using the other original schema.
>> (We sketch a proof of this property in a companion document.)
> Currently, it keeps only the attributes type, name, fields, symbols, items,
> values, and size, and strips all others, including the default attribute.
> Shouldn't the default attribute also be kept? A schema with a default value
> and the same schema without it are not canonically the same with respect to
> schema evolution.
> Thanks,
> Satish.
