kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Almog Gavra <al...@confluent.io>
Subject Re: [DISCUSS] KIP-481: SerDe Improvements for Connect Decimal type in JSON
Date Mon, 29 Jul 2019 16:29:59 GMT
I'm mostly happy with your current suggestion (two configs, one for
serialization and one for deserialization) and your implementation
suggestion. One thing to note:

> We should _always_ be able to deserialize a standard JSON
> number as a decimal

I was doing some research into decimals and JSON, and I can imagine a
compelling reason to require string representations to avoid losing
precision and to be certain that whomever is sending the data isn't losing
precision (e.g. https://stackoverflow.com/a/38357877/2258040).

I'm okay with always allowing numerics, but thought it's worth raising the
thought.

On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <andy@confluent.io> wrote:

> The way I see it, we need to control two seperate things:
>
> 1. How do we _deserialize_ a decimal type if we encounter a text node in
> the JSON?    (We should _always_ be able to deserialize a standard JSON
> number as a decimal).
> 2. How do we chose how we want decimals to be _serialized_.
>
> This looks to fits well with your second suggestion of slightly different
> configs names for serialization vs deserialization.
> a, For deserialization we only care about how to handle text nodes: `
> deserialization.decimal.*text*.format`, which should only have two valid
> values BINARY | TEXT.
> b. For serialization we need all three: `serialization.decimal.format`,
> which should support all three options: BINARY | TEXT | NUMERIC.
>
> Implementation wise, I think these should be two separate enums, rather
> than one shared enum and throwing an error if the deserializer is set to
> NUMERIC.  Mainly as this means the enums reflect the options available,
> rather than this being hidden in config checking code.  But that's a minor
> implementation detail.
>
> Personally, I'd be tempted to have the BINARY value named something like
> `LEGACY` or `LEGACY_BINARY` as a way of encouraging users to move away from
> it.
>
> It's a real shame that both of these settings require a default of BINARY
> for backwards compatibility, but I agree that discussions / plans around
> switching the defaults should not block this KIP.
>
> Andy
>
>
> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <almog@confluent.io> wrote:
>
> > Thanks for the replies Andy and Andrew (2x Andy?)!
> >
> > > Is the text decimal a base16 encoded number, or is it base16 encoded
> > binary
> > > form of the number?
> >
> > The conversion happens as decimal.unscaledValue().toByteArray() and then
> > the byte array is converted to a hex string, so it's definitely the
> binary
> > form of the number converted to base16. Whether or not that's the same as
> > the base16 encoded number is a good question (toByteArray returns a byte
> > array containing a signed, big-endian, two's complement representation of
> > the big integer).
> >
> > > One suggestion I have is to change the proposed new config to only
> affect
> > > decimals stored as text, i.e. to switch between the current base16 and
> > the
> > > more common base10.   Then add another config to the serializer only
> that
> > > controls if decimals should be serialized as text or numeric.
> >
> > I think we need to be able to handle all mappings from serialization
> format
> > to deserialization format (e.g. read in BINARY and output TEXT), which I
> > think would be impossible with the alternative suggestion. I agree that
> > automatically deserializing numerics is valuable. I see two other ways to
> > get this, both keeping the serialization.format config the same:
> >
> > - have json.decimal.deserialization.format accept all three formats. if
> set
> > to BINARY/TEXT, numerics would be automatically supported. If set to
> > NUMERIC, then any string coming in would result in deserialization error
> > (defaults to BINARY for backwards compatibility)
> > - change json.decimal.deserialization.format to
> > json.decimal.deserialization.string.format which accepts only BINARY/TEXT
> > (defaults to BINARY for backwards compatibility)
> >
> > > would be a breaking change in that things that previously failed would
> > > suddenly start deserializing.  This is a price I'm willing to pay.
> >
> > I agree. I'm willing to pay this price too.
> >
> > > IMHO, we should then plan to switch the default of decimal
> serialization
> > to
> > > numeric, and text serialization to base 10 in the next major release.
> >
> > I think that can be a separate discussion, I don't want to block this KIP
> > on it.
> >
> > Thoughts?
> >
> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <otto@wikimedia.org> wrote:
> >
> > > This is a bit orthogonal, but in JsonSchemaConverter I use JSONSchemas
> to
> > > indicate whether a JSON number should be deserialized as an integer or
> a
> > > decimal
> > > <
> > >
> >
> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261
> > > >.
> > > Not everyone is going to have JSONSchemas available when converting,
> but
> > if
> > > you do, it is an easy way to support JSON numbers as decimals.
> > >
> > > Carry on! :)
> > >
> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <andy@confluent.io> wrote:
> > >
> > > > Hi Almog,
> > > >
> > > > Like the KIP - I think being able to support decimals in JSON in the
> > same
> > > > way most other systems do is a great improvement.
> > > >
> > > > It's not 100% clear to me from the KIP what the current format is.
> Is
> > > the
> > > > text decimal a base16 encoded number, or is it base16 encoded binary
> > form
> > > > of the number? (I've not tried to get my head around if these two are
> > > even
> > > > different!)
> > > >
> > > > One suggestion I have is to change the proposed new config to only
> > affect
> > > > decimals stored as text, i.e. to switch between the current base16
> and
> > > the
> > > > more common base10.   Then add another config to the serialzier only
> > that
> > > > controls if decimals should be serialized as text or numeric.  The
> > > benefit
> > > > of this approach is it allows us to enhance the deserializer to
> > > > automatically handle numeric decimals even without any config having
> to
> > > be
> > > > set, i.e. default config in the deserializer would be able to handle
> > > > numeric decimals.  Of course, this is a two edged sword: this would
> > make
> > > > the deserializer work out of the box with numeric decimals, (yay!),
> but
> > > > would be a breaking change in that things that previously failed
> would
> > > > suddenly start deserializing.  This is a price I'm willing to pay.
> > > >
> > > > IMHO, we should then plan to switch the default of decimal
> > serialization
> > > to
> > > > numeric, and text serialization to base 10 in the next major release.
> > > > (With upgrade notes to match). Though I know this is more
> contentious,
> > I
> > > > think it moves us forward in a much more standard way that the
> current
> > > > encoding of decimals.
> > > >
> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <almog@confluent.io>
> wrote:
> > > >
> > > > > Hi Everyone!
> > > > >
> > > > > Kicking off discussion for a new KIP:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
> > > > >
> > > > > For those who are interested, I have a prototype implementation
> that
> > > > helped
> > > > > guide my design: https://github.com/agavra/kafka/pull/1
> > > > >
> > > > > Cheers,
> > > > > Almog
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message