avro-user mailing list archives

From Jeremy Kahn <troc...@trochee.net>
Subject Re: Record sort order is "lexicographically by field" -- what does that mean?
Date Thu, 28 Mar 2013 17:57:54 GMT
Thanks for the information, Harsh. Further comments inline below:

On Thu, Mar 28, 2013 at 4:01 AM, Harsh J <harsh@cloudera.com> wrote:

> On Thu, Mar 28, 2013 at 5:15 AM, Jeremy Kahn <jeremy@trochee.net> wrote:
> > I can read "ordered lexicographically by field" in two ways:
> >
> > 1. the names of the fields are sorted lexicographically, and the field
> > that goes lexicographically first (not marked as "order": "ignore")
> > dominates.
> >
> > 2. the records are sorted by the sort order of each field, with the first
> > fields (not marked "order": "ignore") taking sort priority.
>
> The second one is correct. The order of the fields in the defined schema
> is not changed; it is only walked through.
> [...] that's true from my use of it in Hadoop MR as well.

Okay, this is very helpful to know: it's working the way I had hoped.
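
To check my understanding of reading (2), here is a minimal sketch in plain Python. It does not use the Avro library; it only models the rule as described above. The field names (species, keeper_note, count) are invented for illustration and are not taken from any real ZooInventory schema, and "order": "descending" is deliberately left out to keep it short.

```python
# Hypothetical ZooInventory fields, listed in schema declaration order.
# The names are assumptions made up for this example.
SCHEMA_FIELDS = [
    {"name": "species", "type": "string"},
    {"name": "keeper_note", "type": "string", "order": "ignore"},
    {"name": "count", "type": "int"},
]


def sort_key(record, fields):
    """Walk the fields in declaration order, skipping "order": "ignore"."""
    return tuple(
        record[f["name"]]
        for f in fields
        if f.get("order", "ascending") != "ignore"
    )


records = [
    {"species": "zebra", "keeper_note": "a", "count": 2},
    {"species": "aardvark", "keeper_note": "z", "count": 9},
    {"species": "aardvark", "keeper_note": "m", "count": 3},
]

# species dominates because it is declared first; count breaks ties;
# keeper_note never participates in the comparison.
sorted_records = sorted(records, key=lambda r: sort_key(r, SCHEMA_FIELDS))
```

Under reading (1), count would dominate instead, because "count" sorts before "species" by name.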

> > Behavior (2) -- relative to behavior (1) -- offers the ability to adjust
> > the order of the schema to express a different sort order, but might
> > present problems for schema negotiation.
>
> What kind of problems are you describing here? Sorry if I'm not
> getting it by the words "schema negotiation" alone.

Suppose I sort a sequence of ZooInventory objects by the sort order implied
by this schema, and I send them to you in sorted order over a protocol with
an IDL type specification of array<ZooInventory>. You *read* the sequence
with a different ZooInventory schema that has the same fields but declares
them in a different order. The objects in the array will not (necessarily)
appear to be sorted *to you*.

This isn't necessarily a problem -- it might actually be a feature. It is
worth noting that two schemas may be compatible under schema negotiation
and still imply different sort orders for reader and writer.
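
A small sketch of the scenario, again in plain Python rather than the Avro library. The two field orderings and the field names (species, count) are hypothetical; the only thing carried over from the discussion is the rule that records compare field by field in declaration order.

```python
def record_key(record, field_order):
    """Comparison key for a record, following the given declaration order."""
    return tuple(record[name] for name in field_order)


def is_sorted(records, field_order):
    """True if the sequence is non-decreasing under the given field order."""
    keys = [record_key(r, field_order) for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Same fields, different declaration order (both orderings are assumptions).
WRITER_ORDER = ["species", "count"]
READER_ORDER = ["count", "species"]

records = [
    {"species": "zebra", "count": 5},
    {"species": "aardvark", "count": 9},
    {"species": "lemur", "count": 3},
]

# The writer sorts by its own schema before sending.
sent = sorted(records, key=lambda r: record_key(r, WRITER_ORDER))

writer_view = is_sorted(sent, WRITER_ORDER)  # sorted from the writer's side
reader_view = is_sorted(sent, READER_ORDER)  # not sorted from the reader's side
```

The records themselves are identical on both ends; only the comparison changes, which is why a reader cannot assume sortedness without knowing the writer's field order.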

