avro-user mailing list archives

From Svante Karlsson <svante.karls...@csi.se>
Subject Re: Using Avro for encoding messages
Date Thu, 09 Jul 2015 12:20:14 GMT
>> What causes the schema normalization to be incomplete?
A bad implementation. I use the C++ Avro library, whose schema normalization
is not complete, and the project is not very active.

> And is that a problem? As long as the reader can get the schema, it
> shouldn't matter that there are duplicates – as long as the differences
> between the duplicates do not affect decoding.
Not really a problem; we tend to use machine-generated schemas, and they are
always identical.

If I remember correctly, there are holes in the simplification of types.
Namespaces should be collapsed, {"type" : "string"} should be reduced to
"string", etc.
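As an illustration of the kind of simplification meant here, the following is a minimal Python sketch (not the C++ Avro code) that reduces {"type" : "string"} to "string" on an already-parsed JSON schema. The function name collapse_primitives is hypothetical, and the sketch assumes that only wrappers with no attributes other than "type" are collapsed. Namespace collapsing is not covered.

```python
# Names of the Avro primitive types.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def collapse_primitives(schema):
    """Rewrite {"type": "<primitive>"} as the bare name, recursing into nested types.

    Hypothetical helper for illustration; it does not handle namespaces.
    """
    if isinstance(schema, list):
        # A union: simplify each branch.
        return [collapse_primitives(branch) for branch in schema]
    if not isinstance(schema, dict):
        # Already a bare name such as "string", or a reference to a named type.
        return schema
    out = dict(schema)
    if "type" in out:
        out["type"] = collapse_primitives(out["type"])
    # Assumption: collapse only when "type" is the sole attribute.
    if set(out) == {"type"} and isinstance(out["type"], str) and out["type"] in PRIMITIVES:
        return out["type"]
    if "items" in out:      # array element type
        out["items"] = collapse_primitives(out["items"])
    if "values" in out:     # map value type
        out["values"] = collapse_primitives(out["values"])
    if "fields" in out:     # record fields: simplify each field's type
        out["fields"] = [dict(f, type=collapse_primitives(f["type"])) for f in out["fields"]]
    return out
```

With this sketch, a record whose field type is written as {"type": "string"} and one that writes it as "string" come out as the same structure, which is what a comparison or hash would need.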

The current implementation can't reliably decide whether two types are
identical. If the problem is corrected later, a registered schema would
actually change its hash, since it could then be simplified. Whether this is
a problem depends on your application.
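A small Python sketch of why the hash would change. It assumes the id is an MD5 digest of the schema text, which is only a guess consistent with a 128-bit id. The function name fingerprint is hypothetical.

```python
import hashlib
import json


def fingerprint(schema_text: str) -> bytes:
    """128-bit id of a schema text (assumption: MD5 over the UTF-8 bytes)."""
    return hashlib.md5(schema_text.encode("utf-8")).digest()


# The same type written two ways.
verbose = json.dumps({"type": "string"}, separators=(",", ":"))  # '{"type":"string"}'
simple = json.dumps("string")                                    # '"string"'

old_id = fingerprint(verbose)  # id registered while the simplification is missing
new_id = fingerprint(simple)   # id computed once the simplification is in place

# The two ids differ, so readers holding old_id would no longer match the schema.
assert old_id != new_id
```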

We currently encode this as you suggest:
<schema_type (byte)><schema_id (32/128 bits)><avro (binary)>
The binary fields should probably also have a defined endianness.
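A minimal Python sketch of such a framing, with the byte order pinned down. The tag values SCHEMA_ID_32 and SCHEMA_ID_128, the function names frame and unframe, and the choice of big-endian are all assumptions made for illustration, not our actual format.

```python
import struct

# Hypothetical tag values for the schema_type byte.
SCHEMA_ID_32 = 0    # schema_id is a 32-bit unsigned integer
SCHEMA_ID_128 = 1   # schema_id is a 128-bit hash, passed as 16 raw bytes


def frame(schema_type, schema_id, avro_payload: bytes) -> bytes:
    """Prefix an Avro binary payload with <schema_type><schema_id>."""
    if schema_type == SCHEMA_ID_32:
        # ">" selects big-endian with no padding (assumed byte order).
        header = struct.pack(">BI", schema_type, schema_id)
    elif schema_type == SCHEMA_ID_128:
        if len(schema_id) != 16:
            raise ValueError("128-bit schema id must be exactly 16 bytes")
        header = struct.pack(">B16s", schema_type, schema_id)
    else:
        raise ValueError("unknown schema_type")
    return header + avro_payload


def unframe(message: bytes):
    """Split a framed message into (schema_type, schema_id, avro_payload)."""
    schema_type = message[0]
    if schema_type == SCHEMA_ID_32:
        (schema_id,) = struct.unpack_from(">I", message, 1)
        return schema_type, schema_id, message[5:]
    if schema_type == SCHEMA_ID_128:
        return schema_type, message[1:17], message[17:]
    raise ValueError("unknown schema_type")
```

For example, frame(SCHEMA_ID_32, 7, payload) puts five header bytes in front of the payload, and unframe returns the id so the reader can look up the schema before decoding.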

I agree that a de facto way of encoding this would be nice. Currently I
would say that the Confluent / LinkedIn way is the norm.
