avro-dev mailing list archives

From Bridger Howell <bhow...@sofi.org>
Subject Re: (Default) values for logical types in human-readable form
Date Thu, 19 Oct 2017 05:16:08 GMT
> I don't think we can change the behavior of the "default" key. Otherwise,
> older readers would use the wrong value.

This is true, but the "human-readable default" feature is inherently
incompatible with older readers. My hope was that giving an invalid type
for the default would cause an error when older readers try to parse it,
but that's not the case and you're right. There would still always be an
issue with specially crafted record types.

> I suggest that we add an optional key, like "default-as-string", that is
> used to fill in a missing "default" key if there is a reasonable conversion.

So then if an older reader reads a schema field with "default-as-string"
used instead of "default", it will decide that field has no default? I
don't really like that, but it's better than using the wrong value (e.g.
"default" + "default-parser") or erroring on most data reads (changing the
"default" field to an object). I don't think we can make old readers fail
properly, since they would have to already have the future knowledge that
there is supposed to be a default value. Someone correct me if I'm wrong on
this. (Generically it should be possible if we included schema spec
versions in schemas.)
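
To make that trade-off concrete, here is a minimal Python sketch of how an older reader would see such a field. It is not the Avro library: `legacy_field_default` and the `NO_DEFAULT` sentinel are hypothetical names, and it assumes an older reader simply ignores field attributes it does not know about.

```python
import json

# Sentinel meaning "this field has no default" (hypothetical, not an Avro API).
NO_DEFAULT = object()


def legacy_field_default(field):
    """Mimic a pre-change reader: only the "default" key is understood."""
    return field.get("default", NO_DEFAULT)


# A field that only carries the proposed human-readable key.
new_style = json.loads(
    '{"name": "day", "type": {"type": "int", "logicalType": "date"},'
    ' "default-as-string": "2017-10-19"}'
)

# The same field written the way readers understand today
# (17458 is 2017-10-19 as days since 1970-01-01).
old_style = json.loads(
    '{"name": "day", "type": {"type": "int", "logicalType": "date"},'
    ' "default": 17458}'
)

print(legacy_field_default(new_style) is NO_DEFAULT)  # the old reader sees no default
print(legacy_field_default(old_style))
```

Under that assumption the old reader neither fails nor picks a wrong value; it just behaves as if the field had no default, which is the outcome described above.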

What would be your criteria for there being a reasonable conversion? Field
type and logical type?

> On write, the write schema would convert to the normal "default" field
> for backward-compatibility.

Good idea - this should be generically possible no matter how
human-readable defaults are implemented in the spec.

> On read, you can supply only the string default to use that instead of
> the binary one. I think we could take care of this entirely in the schema
> parser.

On the same page here.
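
A rough sketch of what "entirely in the schema parser" could look like, in Python. Everything here is an assumption for illustration: the key name "default-as-string" is only the one floated in this thread, `convert_string_default` and `fill_defaults` are hypothetical helpers rather than Avro APIs, and the two conversions shown (ISO-8601 for the date logical type, base64 for plain bytes) are just examples of a "reasonable conversion" keyed on field type and logical type.

```python
import base64
import datetime
import json

EPOCH = datetime.date(1970, 1, 1)


def convert_string_default(field_type, text):
    """Convert a human-readable default into the JSON form "default" uses today.

    Returns None when no conversion is known (assumption: the parser would
    then leave the field without a default).
    """
    if isinstance(field_type, dict):
        base, logical = field_type.get("type"), field_type.get("logicalType")
    else:
        base, logical = field_type, None
    if base == "int" and logical == "date":
        # The date logical type is stored as days since the Unix epoch.
        return (datetime.date.fromisoformat(text) - EPOCH).days
    if base == "bytes" and logical is None:
        # Bytes defaults are JSON strings whose code points 0-255 are the bytes.
        return base64.b64decode(text).decode("latin-1")
    return None


def fill_defaults(schema):
    """Return a copy of a record schema with "default" filled in where possible."""
    out = json.loads(json.dumps(schema))  # cheap deep copy of JSON-like data
    for field in out["fields"]:
        if "default" not in field and "default-as-string" in field:
            value = convert_string_default(field["type"], field["default-as-string"])
            if value is not None:
                field["default"] = value
    return out


schema = {
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "body", "type": "bytes", "default-as-string": "aGVsbG8gd29ybGQ="},
        {
            "name": "day",
            "type": {"type": "int", "logicalType": "date"},
            "default-as-string": "2017-10-19",
        },
    ],
}

filled = fill_defaults(schema)
print(json.dumps(filled["fields"], indent=2))
```

The same normalization step would serve both directions: applied when writing, it yields a schema carrying the ordinary "default" for older readers; applied when parsing, it lets a schema author supply only the string form.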

- Bridger Howell

On Wed, Oct 18, 2017 at 9:56 AM, Ryan Blue <rblue@netflix.com.invalid>
wrote:

> I don't think we can change the behavior of the "default" key. Otherwise,
> older readers would use the wrong value.
>
> I suggest that we add an optional key, like "default-as-string", that is
> used to fill in a missing "default" key if there is a reasonable
> conversion. On write, the write schema would convert to the normal
> "default" field for backward-compatibility. On read, you can supply only
> the string default to use that instead of the binary one. I think we could
> take care of this entirely in the schema parser.
>
> rb
>
> On Tue, Oct 17, 2017 at 11:53 PM, Bridger Howell <bhowell@sofi.org> wrote:
>
> > I really like the idea of having support for human-readable default
> > values.
> >
> > I think I prefer to keep the way defaults are interpreted separate from
> > logical types, since logical types are basically optional. I would
> > be surprised if my language of choice could understand an ISO-8601
> > formatted local-date for a field default based on logical type, but I
> > still had to interface with a numeric value in my code.
> >
> > If this doesn't conflict too much with the default value for record
> > fields (?), I would suggest having an object syntax with a "parser" or
> > "type" field in addition to the default property.
> >
> > A sample record:
> > {
> >   "type": "record",
> >   "name": "Foo",
> >   "fields": [
> >     {
> >       "name": "body",
> >       "type": "bytes",
> >       "default": {
> >         "value": "aGVsbG8gd29ybGQ=",
> >         "parser": "base64",
> >         "doc": "'hello world' as a base64-encoded string"
> >       }
> >     }
> >   ]
> > }
> >
> > If changing the "default" property like that has too many issues, I
> > suppose a parallel "default-parser" property would do the trick too.
> >
> > I think this type of approach keeps us neatly separated from logical
> > types, so that having a parser for a default value doesn't require a
> > logical type, and maybe makes it clearer which procedure is being
> > performed on the JSON data to convert it to the base field type.
> >
> > -Bridger Howell
> >
> > On Tue, Oct 17, 2017 at 9:57 AM, Ryan Blue <rblue@netflix.com.invalid>
> > wrote:
> >
> > > I think that the parsing canonical form of a schema
> > > <https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas>
> > > doesn't include the default. I think that makes sense because the
> > > canonical form is what's needed to read encoded data. Anyone with more
> > > context: is that correct?
> > >
> > > In my opinion, that makes how we handle defaults a bit more flexible
> > > because schemas with different defaults are "the same". I'd support
> > > adding a new default field that handles values more naturally. We've
> > > always had a problem with binary as well and I'd like to see us use
> > > base64 encoded values instead of the current strategy.
> > >
> > > rb
> > >
> > > On Tue, Oct 17, 2017 at 8:16 AM, Zoltan Ivanfi <zi@cloudera.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to start a discussion about making default values and
> > > > values in general human-readable for logical types.
> > > >
> > > > Currently default values for logical types have to be specified in a
> > > > JSON string as the binary representation of the backing primary type
> > > > (e.g., "\u0000"). Some users intuitively try to specify a
> > > > human-readable logical value in this string instead (e.g., "0.00").
> > > > This is of course a valid byte sequence and as such is accepted, but
> > > > it results in unexpected behaviour (a different default value than
> > > > intended). Apart from being error prone, specifying default values
> > > > this way is also tedious. To keep this e-mail brief, I won't list
> > > > specific examples here, please see AVRO-2087
> > > > <https://issues.apache.org/jira/browse/AVRO-2087> for details instead.
> > > >
> > > > The problem of non-human-readable values applies to JSON encoding of
> > > > actual data as well. One reason for using JSON is that it is human
> > > > readable and therefore easy to debug. Seeing "\u00018" in a JSON file
> > > > is not too intuitive and this specific example is actually quite
> > > > misleading as well (it can be easily misread as "\u0018").
> > > >
> > > > Introducing a new default value field (called human-readable-default
> > > > or logical-default for example) would allow easier specification of
> > > > default values. (It doesn't solve the problem of accidentally misusing
> > > > the existing field though.) It is, however, not backwards compatible.
> > > > An older Avro library would ignore the new field and use a different
> > > > default value.
> > > >
> > > > Introducing human-readable values in the JSON encoding is even more
> > > > clearly a breaking change. (Although for JSON we could add the
> > > > human-readable value as a separate extra field that gets ignored when
> > > > reading. Problem is, users may be tempted to change the value and be
> > > > surprised. It's a pity that JSON does not allow comments.)
> > > >
> > > > In your opinions, what would be the best way to deal with this
> > > > problem?
> > > >
> > > > Thanks,
> > > >
> > > > Zoltan
> > > >
> > >
> > >
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Netflix
> >
> > --
> >
> >
> > The information contained in this email message is PRIVATE and intended
> > only for the personal and confidential use of the recipient named above.
> > If the reader of this message is not the intended recipient or an agent
> > responsible for delivering it to the intended recipient, you are hereby
> > notified that you have received this message in error and that any
> > review, dissemination, distribution or copying of this message is
> > strictly prohibited.  If you have received this communication in error,
> > please notify us immediately by email, and delete the original message.
> >
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>



-- 

Bridger Howell

Software Engineer

1200 N. Montana Ave

Helena, MT 59601

M: 406.422.9225
