avro-user mailing list archives

From David Jeske <dav...@gmail.com>
Subject questions about sort-orders
Date Thu, 02 Dec 2010 15:30:00 GMT
I like the inclusion of sort-order in avro, to enable different machines to
sort and exchange. I have a few suggestions to clarify the documentation.
Please correct any assumptions I've made that are incorrect...

It seems that sort order is not stable across schema versions. I think I
understand why this makes sense within the schema philosophy, but I think
the documentation could clarify a couple of the subtleties. For
example, it says "*data items may only be compared if they have identical
schemas*". If I supply a source schema which Avro can map into my target
schema, I would expect it to be able to load the items and compare them in my
target schema. Is this correct? If so, the documentation could say so.

Also, the comment "*this permits data written by one system to be
efficiently sorted by another system*" could call out that data items sorted
under one schema may not be in the proper order if, during read, they are
mapped to a new version of the schema. In fact, it might be useful for Avro to
be able to tell me, when it does the source->target schema mapping, whether
both schemas sort in the same order (if it doesn't already).
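
To make that concern concrete, here is a minimal sketch in plain Python. It does not use the Avro library; compare_records, sort_records, and both field lists are hypothetical names invented for this illustration. It only mimics the rule in the spec that records are compared field by field, in declared order, honoring each field's "order" attribute.

```python
from functools import cmp_to_key

# Hypothetical helper, not an Avro API: compare two records (dicts) field by
# field, in the order the fields are declared, honoring each field's "order".
def compare_records(a, b, fields):
    for name, order in fields:
        if order == "ignore":
            continue  # ignored fields never affect the ordering
        if a[name] != b[name]:
            result = -1 if a[name] < b[name] else 1
            return -result if order == "descending" else result
    return 0

def sort_records(records, fields):
    return sorted(records, key=cmp_to_key(lambda a, b: compare_records(a, b, fields)))

# Two made-up schema versions describing the same data.
writer_fields = [("name", "ascending"), ("age", "ascending")]
reader_fields = [("age", "descending"), ("name", "ascending")]

records = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 25},
    {"name": "carol", "age": 35},
]

by_writer = sort_records(records, writer_fields)
by_reader = sort_records(records, reader_fields)

print([r["name"] for r in by_writer])  # alice, bob, carol
print([r["name"] for r in by_reader])  # carol, alice, bob
```

Data that was written in sorted order under the first field list is no longer sorted once it is interpreted under the second, which is the case I think the documentation should warn about.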

Lastly, it says "*Note also that Avro binary-encoded data can be efficiently
ordered without deserializing it to objects.*" What does this mean exactly?
This might be misinterpreted as saying that one can lexicographically sort the
binary encoding, without asking Avro to deserialize it, and get a
proper order. However, this is clearly not true given the number
formats. Perhaps it would be clearer to say "Avro can efficiently make
sort-comparisons on binary-encoded data without allocating deserialization

Did I properly understand those sort-related subtleties?
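
To illustrate the point about number formats, here is a small sketch in plain Python (no Avro library). The function name zigzag_varint is my own; it follows the spec's description of int and long as variable-length zig-zag encoded values, and assumes inputs fit in 64 bits.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer with zig-zag coding, then as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while True:
        byte = z & 0x7F       # low 7 bits go out first
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

values = [-2, -1, 0, 1, 100, 128]

numeric_order = sorted(values)
byte_order = sorted(values, key=zigzag_varint)  # plain lexicographic byte sort

print(numeric_order)  # [-2, -1, 0, 1, 100, 128]
print(byte_order)     # [0, -1, 1, -2, 128, 100]
```

Sorting the raw bytes interleaves negative and positive values and puts 128 before 100, so a comparison has to decode each number as it walks the bytes, even if it never builds full objects.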
