avro-dev mailing list archives

From "graham sanderson (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas
Date Sun, 26 Feb 2012 04:48:49 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216629#comment-13216629 ]

graham sanderson commented on AVRO-1006:

"A clarification, which addresses issues raised by Doug and Scott. The need I'm solving for
is to capture that part of a writer's schema which a reader needs to read data. This is a
relatively straight-forward notion of "equivalence," and a very useful one. And the good news
is that this notion of equivalence allows us to ignore many aspects of schemas (e.g., attributes,
aliases, default values)."

Perhaps this should be made clearer when naming the class/method. I came across this feature
because I wanted to hash/fingerprint Avro schemas for messaging, and was checking whether a
utility to do so already existed. In my case, I might use custom properties on schema fields
to indicate that those fields are transmitted using a certain named dictionary, so those
properties affect the ability to interpret the message. I'd therefore rather stick with
something I can reliably use on the producer end to encode the entire state of the schema,
rather than a particular well-defined subset of it.
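
A minimal sketch of the "entire state of the schema" approach, assuming the input is the full JSON text produced by schema.toString() in the Java implementation. Because it hashes the complete text, a custom field property (here a hypothetical "dictionary" property) changes the fingerprint. The class name FullSchemaFingerprint and the choice of SHA-256 truncated to 64 bits are illustrative assumptions, not the fingerprint algorithm proposed in AVRO-1006.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: fingerprints the complete schema JSON text, custom properties included. */
public final class FullSchemaFingerprint {
    private FullSchemaFingerprint() {}

    /**
     * Hashes the UTF-8 bytes of the full schema JSON (e.g. the output of schema.toString())
     * with SHA-256 and keeps the first 8 bytes, read big-endian, as a 64-bit fingerprint.
     */
    public static long fingerprint64(String schemaJson) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(schemaJson.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest, 0, 8).getLong();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Two schemas that differ only in a custom field property ("dictionary" is hypothetical).
        String plain = "{\"type\":\"record\",\"name\":\"Msg\",\"fields\":"
                + "[{\"name\":\"country\",\"type\":\"int\"}]}";
        String withDict = "{\"type\":\"record\",\"name\":\"Msg\",\"fields\":"
                + "[{\"name\":\"country\",\"type\":\"int\",\"dictionary\":\"countries\"}]}";
        System.out.printf("%016x%n", fingerprint64(plain));
        System.out.printf("%016x%n", fingerprint64(withDict)); // expected to differ from the line above
    }
}
```

The trade-off is the one described in the comment: any textual difference, including property order or whitespace, yields a different fingerprint, so this only works if the producer's toString() output is stable.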

Note that, thanks to someone having made Props a LinkedHashMap since the version of the code
base I'm using (which keeps property iteration order deterministic), and to the particular
Jackson implementation, schema.toString() in the Java implementation looks like it will be
fine for my purposes. If another language implementation happens to produce a different hash
value, I'm fine with that, as long as the value is relatively stable; for example:
SchemaInstance1 -toJson-> string x
string x -fromJson-> SchemaInstance2 -toJson-> string y

Requiring string x and string y to be equal seems a reasonable enough guarantee for me.
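
The round-trip guarantee above can be expressed as a small generic check. This is a sketch under stated assumptions: RoundTripCheck and isRoundTripStable are hypothetical names, and the toJson/fromJson functions are passed in so the check itself does not depend on any particular Avro version. The main method uses a toy stand-in for a schema type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper for the toJson -> fromJson -> toJson stability check. */
public final class RoundTripCheck {
    private RoundTripCheck() {}

    public static <S> boolean isRoundTripStable(S schema1,
                                                Function<S, String> toJson,
                                                Function<String, S> fromJson) {
        String x = toJson.apply(schema1);  // SchemaInstance1 -toJson-> string x
        S schema2 = fromJson.apply(x);     // string x -fromJson-> SchemaInstance2
        String y = toJson.apply(schema2);  // SchemaInstance2 -toJson-> string y
        return x.equals(y);                // the guarantee: x and y are identical strings
    }

    public static void main(String[] args) {
        // Toy stand-in for a schema type: a list of field names serialized as "a,b".
        List<String> fields = List.of("a", "b");
        boolean stable = isRoundTripStable(fields,
                l -> String.join(",", l),
                s -> Arrays.asList(s.split(",")));
        System.out.println(stable); // true
    }
}
```

With the Java implementation, the same check might be invoked as `RoundTripCheck.isRoundTripStable(schema, Schema::toString, s -> new Schema.Parser().parse(s))`; a fresh Schema.Parser is used for each parse because a parser remembers the named types it has already defined.
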
> Fingerprints for Avro Schemas
> -----------------------------
>                 Key: AVRO-1006
>                 URL: https://issues.apache.org/jira/browse/AVRO-1006
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>              Labels: features
>         Attachments: AVRO-1006-prelim.patch, AVRO-1006.patch, AVRO-1006.patch, schema-fingerprinting.html,
>                      schema-fingerprinting.html, schema-fingerprinting.html
> Add a function that returns a standardized 64-bit fingerprint for schemas. Fingerprints
> are designed so that the chance of collisions is very, very low.
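
For the 64-bit fingerprint described in the issue, the Avro specification (as later published) recommends fingerprinting the UTF-8 bytes of a schema's Parsing Canonical Form and describes a 64-bit Rabin fingerprint, CRC-64-AVRO, for that purpose. The sketch below follows the table-driven reference code given in that specification; the class name Crc64Avro is hypothetical. Since it fingerprints whatever bytes it is given, it could equally be applied to the full schema.toString() text discussed in the comment.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical class: 64-bit Rabin fingerprint (CRC-64-AVRO) as described in the Avro specification. */
public final class Crc64Avro {
    /** Fingerprint of the empty byte sequence; also the constant used to build the table. */
    public static final long EMPTY = 0xc15d213aa4d7a795L;

    private static final long[] FP_TABLE = new long[256];

    static {
        // Precompute the effect of each possible low-order byte on the running fingerprint.
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                // Shift right one bit; XOR in EMPTY only when the bit shifted out was 1.
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    private Crc64Avro() {}

    /** Processes the input one byte at a time using the precomputed table. */
    public static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    public static void main(String[] args) {
        byte[] schemaBytes = "\"int\"".getBytes(StandardCharsets.UTF_8);
        System.out.printf("%016x%n", fingerprint64(schemaBytes));
    }
}
```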
