avro-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: clarification on schema resolution
Date Mon, 16 Nov 2009 19:24:49 GMT
Scott Banachowski wrote:
> Regarding union resolution, the spec says to take the first match when
> comparing them.  What if the first match is "promotable"?  I would expect
> preference for the first "exact" match, if both an exact and promotable
> match existed.

Good question.  I agree, and that's what the Java implementation does, 
in GenericDatumReader#resolveExpected().  It first looks for a branch 
which "exactly" matches the type written, where "exactly" means, for 
named types, has the same name, and for other types has the same base 
type, e.g., array, map, int.  If no exact match is found, it then looks 
for a match via promotion.
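
A rough sketch of that two-pass lookup may make the order concrete. This is not the GenericDatumReader code: the function names, the str-or-dict schema representation, and the error type are all hypothetical, and the promotion table only covers the numeric promotions listed in the spec.

```python
# Numeric promotions from the spec: a value written as the key type
# may be read as any of the listed types.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}

NAMED_TYPES = {"record", "enum", "fixed"}


def base_type(schema):
    """Return the base type of a schema given as a str or a dict (assumed shape)."""
    return schema if isinstance(schema, str) else schema["type"]


def exactly_matches(writer, reader):
    """Same name for named types, same base type for all other types."""
    if base_type(writer) != base_type(reader):
        return False
    if base_type(writer) in NAMED_TYPES:
        return writer["name"] == reader["name"]
    return True


def resolve_union_branch(writer, reader_union):
    """Pick the reader branch for the written schema: exact match first, then promotion."""
    # Pass 1: the first branch that matches "exactly".
    for branch in reader_union:
        if exactly_matches(writer, branch):
            return branch
    # Pass 2: the first branch the written type can be promoted to.
    for branch in reader_union:
        if base_type(branch) in PROMOTIONS.get(base_type(writer), set()):
            return branch
    # A union mismatch is a type difference, so it is reported as an error.
    raise ValueError(f"no branch of {reader_union!r} matches {writer!r}")
```

With a reader union of ["long", "int"] and a written int, this picks "int" even though "long" comes first and is reachable by promotion.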

Should we fix the spec to better document this?

> Second clarification is about usage of unset vs. signaling error:  in the
> rules if we cannot match enums, we declare it unset, but if we cannot match
> unions, we signal error.  It seems for one type we take a best effort
> approach and in the other we give up.  I'm just trying to understand what is
> different about the two cases.

I think it's reasonable to differ in how we handle unmatched field names 
and mismatched types.  If the type for a field changes in a way that 
cannot be promoted, that's an incompatible schema.  But if a field is 
renamed, then we ignore the value under the old name and might leave the 
new name unset.  Union mismatches are type differences, not field name 
differences, so errors seem appropriate.  Does that make sense?
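
To illustrate the distinction, here is a minimal sketch of record resolution under that policy. Everything in it is hypothetical (the name-to-type dicts standing in for schemas, the UNSET marker, the SchemaMismatch exception), and it only handles primitive field types.

```python
# Numeric promotions from the spec (written type -> readable types).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}

# Hypothetical marker for a reader field that received no value.
UNSET = object()


class SchemaMismatch(Exception):
    """Hypothetical error for a field whose type changed incompatibly."""


def resolve_record(writer_fields, reader_fields, datum):
    """Resolve a written record against the reader's fields.

    writer_fields and reader_fields map field name -> primitive type name;
    datum maps field name -> written value.
    """
    resolved = {}
    for name, reader_type in reader_fields.items():
        if name not in writer_fields:
            # Field name mismatch (e.g. a rename): leave the new name unset.
            resolved[name] = UNSET
            continue
        writer_type = writer_fields[name]
        promotable = reader_type in PROMOTIONS.get(writer_type, set())
        if writer_type != reader_type and not promotable:
            # Type mismatch that cannot be promoted: incompatible schema.
            raise SchemaMismatch(
                f"field {name!r}: cannot read {writer_type} as {reader_type}"
            )
        resolved[name] = datum[name]
    # Values written under names the reader does not know are ignored.
    return resolved
```

A renamed field therefore ends up unset under its new name, while a field whose type changed from string to int raises an error.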

There's an open issue about whether leaving fields unset is reasonable.


What do folks think is best here?  Should we require all values that 
don't have a default value to be present, i.e., record fields with a 
default value are optional and all others are mandatory?
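
As a sketch of what that stricter rule could look like (a proposal under discussion here, not current behavior), assuming reader fields are given as dicts with "name" and an optional "default" key, as in a record schema's "fields" list. The function name and error type are hypothetical.

```python
def fill_missing_fields(reader_fields, resolved):
    """Apply defaults to missing fields; fail on missing fields without one.

    reader_fields: list of dicts with "name" and, optionally, "default".
    resolved: dict of field name -> value for the fields that were written.
    """
    result = dict(resolved)
    for field in reader_fields:
        name = field["name"]
        if name in result:
            continue
        if "default" in field:
            # Optional field: fall back to the reader's default value.
            result[name] = field["default"]
        else:
            # Mandatory field: nothing was written and no default exists.
            raise ValueError(f"field {name!r} has no default and was not written")
    return result
```

Checking for the "default" key, rather than for a non-null value, keeps a default of null distinct from having no default at all.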

> And assuming we're doing this matching at
> runtime, for example, reading an rpc message, does signaling error mean we
> drop the message, or still pass parts of the message that could be resolved?

If we adopt the "unset" concept, then implementations could pass the 
parts of the message that can be resolved.  Note that we should never 
simply drop the message; we should return an error response instead, 
though that's probably what you meant.

