kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Almog Gavra <al...@confluent.io>
Subject Re: [DISCUSS] KIP-481: SerDe Improvements for Connect Decimal type in JSON
Date Thu, 25 Jul 2019 17:26:03 GMT
Thanks for the replies Andy and Andrew (2x Andy?)!

> Is the text decimal a base16 encoded number, or is it base16 encoded
binary
> form of the number?

The conversion happens as decimal.unscaledValue().toByteArray() and then
the byte array is converted to a hex string, so it's definitely the binary
form of the number converted to base16. Whether or not that's the same as
the base16 encoded number is a good question (toByteArray returns a byte
array containing a signed, big-endian, two's complement representation of
the big integer).

> One suggestion I have is to change the proposed new config to only affect
> decimals stored as text, i.e. to switch between the current base16 and the
> more common base10.   Then add another config to the serializer only that
> controls if decimals should be serialized as text or numeric.

I think we need to be able to handle all mappings from serialization format
to deserialization format (e.g. read in BINARY and output TEXT), which I
think would be impossible with the alternative suggestion. I agree that
automatically deserializing numerics is valuable. I see two other ways to
get this, both keeping the serialization.format config the same:

- have json.decimal.deserialization.format accept all three formats. if set
to BINARY/TEXT, numerics would be automatically supported. If set to
NUMERIC, then any string coming in would result in deserialization error
(defaults to BINARY for backwards compatibility)
- change json.decimal.deserialization.format to
json.decimal.deserialization.string.format which accepts only BINARY/TEXT
(defaults to BINARY for backwards compatibility)

> would be a breaking change in that things that previously failed would
> suddenly start deserializing.  This is a price I'm willing to pay.

I agree. I'm willing to pay this price too.

> IMHO, we should then plan to switch the default of decimal serialization
to
> numeric, and text serialization to base 10 in the next major release.

I think that can be a separate discussion, I don't want to block this KIP
on it.

Thoughts?

On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <otto@wikimedia.org> wrote:

> This is a bit orthogonal, but in JsonSchemaConverter I use JSONSchemas to
> indicate whether a JSON number should be deserialized as an integer or a
> decimal
> <
> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261
> >.
> Not everyone is going to have JSONSchemas available when converting, but if
> you do, it is an easy way to support JSON numbers as decimals.
>
> Carry on! :)
>
> On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <andy@confluent.io> wrote:
>
> > Hi Almog,
> >
> > Like the KIP - I think being able to support decimals in JSON in the same
> > way most other systems do is a great improvement.
> >
> > It's not 100% clear to me from the KIP what the current format is.  Is
> the
> > text decimal a base16 encoded number, or is it base16 encoded binary form
> > of the number? (I've not tried to get my head around if these two are
> even
> > different!)
> >
> > One suggestion I have is to change the proposed new config to only affect
> > decimals stored as text, i.e. to switch between the current base16 and
> the
> > more common base10.   Then add another config to the serialzier only that
> > controls if decimals should be serialized as text or numeric.  The
> benefit
> > of this approach is it allows us to enhance the deserializer to
> > automatically handle numeric decimals even without any config having to
> be
> > set, i.e. default config in the deserializer would be able to handle
> > numeric decimals.  Of course, this is a two edged sword: this would make
> > the deserializer work out of the box with numeric decimals, (yay!), but
> > would be a breaking change in that things that previously failed would
> > suddenly start deserializing.  This is a price I'm willing to pay.
> >
> > IMHO, we should then plan to switch the default of decimal serialization
> to
> > numeric, and text serialization to base 10 in the next major release.
> > (With upgrade notes to match). Though I know this is more contentious, I
> > think it moves us forward in a much more standard way that the current
> > encoding of decimals.
> >
> > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <almog@confluent.io> wrote:
> >
> > > Hi Everyone!
> > >
> > > Kicking off discussion for a new KIP:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
> > >
> > > For those who are interested, I have a prototype implementation that
> > helped
> > > guide my design: https://github.com/agavra/kafka/pull/1
> > >
> > > Cheers,
> > > Almog
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message