avro-dev mailing list archives

From "Scott Carey (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas
Date Wed, 08 Feb 2012 02:37:00 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203181#comment-13203181 ]

Scott Carey commented on AVRO-1006:

More notes:
* Schema equivalence has a few variations
** Serialization equivalence -- attribute metadata is irrelevant: {"type":"int", "java-class":"java.lang.Short"}
is equal to {"type":"int"}.  Defaults and doc are also irrelevant in this case.
** Serialization and metadata equivalence, where the above two are not equivalent.
** Reversible transformation equivalence, e.g. ["int", "string"] equals ["string", "int"],
or records with pure field reordering.
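
To make the first variation concrete, here is a rough Python sketch of a serialization-equivalence check. It is only an illustration, not anything that exists in Avro: the names strip_metadata, serialization_equivalent and SERIALIZATION_KEYS are made up for this example, the set of kept attributes is my assumption about what affects the binary encoding, and namespace resolution for named types is ignored.

```python
# Hypothetical sketch: compare schemas after dropping attributes that do not
# affect the binary encoding (doc, default, aliases, custom properties, ...).
# The key set below is an assumption made for this example.
SERIALIZATION_KEYS = {"type", "name", "namespace", "fields",
                      "symbols", "items", "values", "size"}


def strip_metadata(schema):
    """Return a copy of a parsed JSON schema with metadata removed."""
    if isinstance(schema, str):        # primitive or named-type reference
        return schema
    if isinstance(schema, list):       # union: branch order is preserved
        return [strip_metadata(branch) for branch in schema]
    if isinstance(schema, dict):
        kept = {k: v for k, v in schema.items() if k in SERIALIZATION_KEYS}
        # {"type": "int"} with nothing else left collapses to "int"
        if set(kept) == {"type"} and isinstance(kept["type"], str):
            return kept["type"]
        out = {}
        for key, value in kept.items():
            if key in ("type", "items", "values"):
                out[key] = strip_metadata(value)
            elif key == "fields":
                out[key] = [strip_metadata(field) for field in value]
            else:
                out[key] = value
        return out
    raise TypeError("unexpected schema node: %r" % (schema,))


def serialization_equivalent(a, b):
    """True if the two parsed schemas match once metadata is stripped."""
    return strip_metadata(a) == strip_metadata(b)


annotated = {"type": "int", "java-class": "java.lang.Short"}
print(serialization_equivalent(annotated, "int"))                       # True
print(serialization_equivalent(["int", "string"], ["string", "int"]))   # False
```

The second call returns False because this check keeps union branch order. Treating reordered unions as equal belongs to the third variation and would need an extra normalization step.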

* Other schema relationships that are related to equivalence but cannot satisfy both associativity
and transitivity
** Alias equivalence is not transitive, but is associative.
** Schema resolution and transformation are often neither transitive nor associative.

All three equivalence variations above may be useful for different purposes, especially the
first two.  Serialization equivalence is important for long-term storage.  Full equivalence
with metadata is often needed for internal state.  We may also want to let users specify which
optional components are included (attributes, defaults, doc).  Doug's point about JSON being
an unordered format is important, and it limits using the JSON string as the fingerprint input.
Perhaps we can complete the Avro Schema for schemas (AVRO-251), which could define field order
and equivalence unambiguously, and which all implementations should be able to support.  The output
bytes from the Avro binary serialization of the schema could then be fed to a hash algorithm.
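
As a rough illustration of why the raw JSON string is a poor hash input, the Python sketch below hashes two texts of the same schema that differ only in key order. Everything in it is an assumption for the example: fingerprint64 is a made-up name, JSON re-serialized with sorted keys stands in for the unambiguous byte form discussed above (it is not the Avro binary serialization of the schema), and SHA-256 truncated to 64 bits stands in for whatever hash algorithm is eventually chosen.

```python
import hashlib
import json


def fingerprint64(schema_json: str) -> int:
    """Hypothetical 64-bit fingerprint over a key-order-independent form."""
    parsed = json.loads(schema_json)
    # Stand-in canonical form: sorted keys, no insignificant whitespace.
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # Keep the first 8 bytes to get a 64-bit value.
    return int.from_bytes(digest[:8], "big")


# The same schema written with different key order.
a = '{"type":"record","name":"R","fields":[{"name":"x","type":"int"}]}'
b = '{"name":"R","fields":[{"type":"int","name":"x"}],"type":"record"}'

# Hashing the raw text depends on key order...
raw_a = hashlib.sha256(a.encode("utf-8")).hexdigest()
raw_b = hashlib.sha256(b.encode("utf-8")).hexdigest()
print(raw_a == raw_b)

# ...while hashing the canonical form does not.
print(fingerprint64(a) == fingerprint64(b))
```

Sorting keys only removes the ordering ambiguity of JSON objects. It does not decide which attributes count toward equivalence, so it would still have to be combined with one of the equivalence variations listed earlier.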

> Fingerprints for Avro Schemas
> -----------------------------
>                 Key: AVRO-1006
>                 URL: https://issues.apache.org/jira/browse/AVRO-1006
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>              Labels: features
>         Attachments: schema-fingerprinting.html, schema-fingerprinting.html, schema-fingerprinting.html
> Add a function that returns a standardized, 64-bit fingerprint for schemas.  Fingerprints
> are designed such that the chance of a collision is very low.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

