avro-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: (Default) values for logical types in human-readable form
Date Tue, 17 Oct 2017 15:57:35 GMT
I think that the parsing canonical form of a schema
<https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas>
doesn't include the default. I think that makes sense because the canonical
form is what's needed to read encoded data. Anyone with more context: is
that correct?
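
For reference, the [STRIP] step in the linked section of the spec keeps only the type, name, fields, symbols, items, values and size attributes, so "default" is dropped along with doc and aliases. Below is a rough sketch of just that one step. The helper name is made up for illustration, and it deliberately skips the other canonicalization steps (fullname resolution, attribute ordering, whitespace removal), so it is not a full canonical-form implementation.

```python
# Attributes that survive the [STRIP] step of Parsing Canonical Form.
KEPT_ATTRIBUTES = {"type", "name", "fields", "symbols", "items", "values", "size"}


def strip_noncanonical(schema):
    """Hypothetical helper: recursively drop attributes removed by [STRIP].

    Works on a schema that has already been parsed from JSON into
    dicts, lists and scalars. Only the stripping step is modelled.
    """
    if isinstance(schema, dict):
        return {
            key: strip_noncanonical(value)
            for key, value in schema.items()
            if key in KEPT_ATTRIBUTES
        }
    if isinstance(schema, list):
        return [strip_noncanonical(item) for item in schema]
    return schema  # primitive type names, symbols, sizes


def record_with_default(default):
    # Illustrative record with a single decimal field backed by bytes.
    return {
        "type": "record",
        "name": "Price",
        "fields": [
            {
                "name": "amount",
                "type": {"type": "bytes", "logicalType": "decimal",
                         "precision": 9, "scale": 2},
                "default": default,
            }
        ],
    }


a = strip_noncanonical(record_with_default("\u0000"))
b = strip_noncanonical(record_with_default("0.00"))
print(a == b)  # the two schemas differ only in attributes that get stripped
```

Note that logicalType, precision and scale are stripped by the same rule, so they do not affect the canonical form either.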

In my opinion, that makes how we handle defaults a bit more flexible,
because schemas that differ only in their defaults are "the same". I'd
support adding a new default field that handles values more naturally. We've
always had a problem with binary values as well, and I'd like to see us use
base64-encoded values instead of the current strategy.
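
To make the difference concrete, here is a small sketch of how the strings from Zoltan's message below decode under the current rules, assuming a decimal backed by bytes with scale 2. Per the spec, each code point 0-255 in a bytes default maps to one byte, and a decimal stores the two's-complement big-endian unscaled value. The function names are made up for illustration, and the base64 form is only what such a default could look like; no Avro release defines it.

```python
import base64
from decimal import Decimal


def bytes_default_to_decimal(default: str, scale: int) -> Decimal:
    """Decode a JSON bytes default the way the current spec describes.

    Code points 0-255 map one-to-one to bytes (ISO-8859-1), and the bytes
    are the big-endian two's-complement unscaled value of the decimal.
    """
    raw = default.encode("iso-8859-1")
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


def bytes_default_to_base64(default: str) -> str:
    """Hypothetical: the same bytes written as a base64 string instead."""
    return base64.b64encode(default.encode("iso-8859-1")).decode("ascii")


# The correct way to write a default of 0.00 today: a single zero byte.
print(bytes_default_to_decimal("\u0000", scale=2))   # 0.00

# The intuitive but wrong way: the characters '0', '.', '0', '0'
# are read as the four bytes 0x30 0x2E 0x30 0x30.
print(bytes_default_to_decimal("0.00", scale=2))     # 8083333.60

# The misleading example from the JSON encoding: bytes 0x01 0x38.
print(bytes_default_to_decimal("\u00018", scale=2))  # 3.12

# What the single zero byte would look like if base64 were used.
print(bytes_default_to_base64("\u0000"))             # AA==
```

Base64 is no more readable for decimals than the escapes are, but a human-readable value such as "0.00" is not valid base64, so a parser could reject it outright instead of silently taking it as bytes.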

rb

On Tue, Oct 17, 2017 at 8:16 AM, Zoltan Ivanfi <zi@cloudera.com> wrote:

> Hi,
>
> I would like to start a discussion about making default values and values
> in general human-readable for logical types.
>
> Currently, default values for logical types have to be specified in a JSON
> string as the binary representation of the backing primitive type (e.g.,
> "\u0000"). Some users intuitively try to specify a human-readable logical
> value in this string instead (e.g., "0.00"). This is of course a valid byte
> sequence and as such is accepted, but it results in unexpected behaviour (a
> different default value than intended). Apart from being error-prone,
> specifying default values this way is also tedious. To keep this e-mail
> brief, I won't list specific examples here; please see AVRO-2087
> <https://issues.apache.org/jira/browse/AVRO-2087> for details instead.
>
> The problem of non-human-readable values applies to JSON encoding of actual
> data as well. One reason for using JSON is that it is human readable and
> therefore easy to debug. Seeing "\u00018" in a JSON file is not too
> intuitive and this specific example is actually quite misleading as well
> (it can be easily misread as "\u0018").
>
> Introducing a new default value field (called human-readable-default or
> logical-default for example) would allow easier specification of default
> values. (It doesn't solve the problem of accidentally misusing the existing
> field though.) It is, however, not backwards compatible. An older Avro
> library would ignore the new field and use a different default value.
>
> Introducing human-readable values in the JSON encoding is even more clearly
> a breaking change. (For JSON we could instead add the human-readable value
> as a separate extra field that gets ignored when reading. The problem is
> that users may be tempted to change that value and then be surprised when
> nothing changes. It's a pity that JSON does not allow comments.)
>
> In your opinions, what would be the best way to deal with this problem?
>
> Thanks,
>
> Zoltan
>



-- 
Ryan Blue
Software Engineer
Netflix
