kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andy Coates <a...@confluent.io>
Subject Re: [DISCUSS] KIP-481: SerDe Improvements for Connect Decimal type in JSON
Date Mon, 29 Jul 2019 11:57:03 GMT
The way I see it, we need to control two seperate things:

1. How do we _deserialize_ a decimal type if we encounter a text node in
the JSON?    (We should _always_ be able to deserialize a standard JSON
number as a decimal).
2. How do we chose how we want decimals to be _serialized_.

This looks to fits well with your second suggestion of slightly different
configs names for serialization vs deserialization.
a, For deserialization we only care about how to handle text nodes: `
deserialization.decimal.*text*.format`, which should only have two valid
values BINARY | TEXT.
b. For serialization we need all three: `serialization.decimal.format`,
which should support all three options: BINARY | TEXT | NUMERIC.

Implementation wise, I think these should be two separate enums, rather
than one shared enum and throwing an error if the deserializer is set to
NUMERIC.  Mainly as this means the enums reflect the options available,
rather than this being hidden in config checking code.  But that's a minor
implementation detail.

Personally, I'd be tempted to have the BINARY value named something like
`LEGACY` or `LEGACY_BINARY` as a way of encouraging users to move away from
it.

It's a real shame that both of these settings require a default of BINARY
for backwards compatibility, but I agree that discussions / plans around
switching the defaults should not block this KIP.

Andy


On Thu, 25 Jul 2019 at 18:26, Almog Gavra <almog@confluent.io> wrote:

> Thanks for the replies Andy and Andrew (2x Andy?)!
>
> > Is the text decimal a base16 encoded number, or is it base16 encoded
> binary
> > form of the number?
>
> The conversion happens as decimal.unscaledValue().toByteArray() and then
> the byte array is converted to a hex string, so it's definitely the binary
> form of the number converted to base16. Whether or not that's the same as
> the base16 encoded number is a good question (toByteArray returns a byte
> array containing a signed, big-endian, two's complement representation of
> the big integer).
>
> > One suggestion I have is to change the proposed new config to only affect
> > decimals stored as text, i.e. to switch between the current base16 and
> the
> > more common base10.   Then add another config to the serializer only that
> > controls if decimals should be serialized as text or numeric.
>
> I think we need to be able to handle all mappings from serialization format
> to deserialization format (e.g. read in BINARY and output TEXT), which I
> think would be impossible with the alternative suggestion. I agree that
> automatically deserializing numerics is valuable. I see two other ways to
> get this, both keeping the serialization.format config the same:
>
> - have json.decimal.deserialization.format accept all three formats. if set
> to BINARY/TEXT, numerics would be automatically supported. If set to
> NUMERIC, then any string coming in would result in deserialization error
> (defaults to BINARY for backwards compatibility)
> - change json.decimal.deserialization.format to
> json.decimal.deserialization.string.format which accepts only BINARY/TEXT
> (defaults to BINARY for backwards compatibility)
>
> > would be a breaking change in that things that previously failed would
> > suddenly start deserializing.  This is a price I'm willing to pay.
>
> I agree. I'm willing to pay this price too.
>
> > IMHO, we should then plan to switch the default of decimal serialization
> to
> > numeric, and text serialization to base 10 in the next major release.
>
> I think that can be a separate discussion, I don't want to block this KIP
> on it.
>
> Thoughts?
>
> On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <otto@wikimedia.org> wrote:
>
> > This is a bit orthogonal, but in JsonSchemaConverter I use JSONSchemas to
> > indicate whether a JSON number should be deserialized as an integer or a
> > decimal
> > <
> >
> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261
> > >.
> > Not everyone is going to have JSONSchemas available when converting, but
> if
> > you do, it is an easy way to support JSON numbers as decimals.
> >
> > Carry on! :)
> >
> > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <andy@confluent.io> wrote:
> >
> > > Hi Almog,
> > >
> > > Like the KIP - I think being able to support decimals in JSON in the
> same
> > > way most other systems do is a great improvement.
> > >
> > > It's not 100% clear to me from the KIP what the current format is.  Is
> > the
> > > text decimal a base16 encoded number, or is it base16 encoded binary
> form
> > > of the number? (I've not tried to get my head around if these two are
> > even
> > > different!)
> > >
> > > One suggestion I have is to change the proposed new config to only
> affect
> > > decimals stored as text, i.e. to switch between the current base16 and
> > the
> > > more common base10.   Then add another config to the serialzier only
> that
> > > controls if decimals should be serialized as text or numeric.  The
> > benefit
> > > of this approach is it allows us to enhance the deserializer to
> > > automatically handle numeric decimals even without any config having to
> > be
> > > set, i.e. default config in the deserializer would be able to handle
> > > numeric decimals.  Of course, this is a two edged sword: this would
> make
> > > the deserializer work out of the box with numeric decimals, (yay!), but
> > > would be a breaking change in that things that previously failed would
> > > suddenly start deserializing.  This is a price I'm willing to pay.
> > >
> > > IMHO, we should then plan to switch the default of decimal
> serialization
> > to
> > > numeric, and text serialization to base 10 in the next major release.
> > > (With upgrade notes to match). Though I know this is more contentious,
> I
> > > think it moves us forward in a much more standard way that the current
> > > encoding of decimals.
> > >
> > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <almog@confluent.io> wrote:
> > >
> > > > Hi Everyone!
> > > >
> > > > Kicking off discussion for a new KIP:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
> > > >
> > > > For those who are interested, I have a prototype implementation that
> > > helped
> > > > guide my design: https://github.com/agavra/kafka/pull/1
> > > >
> > > > Cheers,
> > > > Almog
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message