From: Almog Gavra
Date: Thu, 25 Jul 2019 10:26:03 -0700
Subject: Re: [DISCUSS] KIP-481: SerDe Improvements for Connect Decimal type in JSON
To: dev@kafka.apache.org

Thanks for the replies Andy and Andrew (2x Andy?)!

> Is the text decimal a base16 encoded number, or is it base16 encoded binary
> form of the number?

The conversion happens as decimal.unscaledValue().toByteArray() and then the
byte array is converted to a hex string, so it's definitely the binary form of
the number converted to base16. Whether or not that's the same as the base16
encoded number is a good question (toByteArray returns a byte array containing
a signed, big-endian, two's complement representation of the big integer).

> One suggestion I have is to change the proposed new config to only affect
> decimals stored as text, i.e. to switch between the current base16 and the
> more common base10. Then add another config to the serializer only that
> controls if decimals should be serialized as text or numeric.

I think we need to be able to handle all mappings from serialization format to
deserialization format (e.g. read in BINARY and output TEXT), which I think
would be impossible with the alternative suggestion. I agree that
automatically deserializing numerics is valuable. I see two other ways to get
this, both keeping the serialization.format config the same:

- have json.decimal.deserialization.format accept all three formats. if set to
  BINARY/TEXT, numerics would be automatically supported. If set to NUMERIC,
  then any string coming in would result in deserialization error (defaults to
  BINARY for backwards compatibility)
- change json.decimal.deserialization.format to
  json.decimal.deserialization.string.format which accepts only BINARY/TEXT
  (defaults to BINARY for backwards compatibility)

> would be a breaking change in that things that previously failed would
> suddenly start deserializing. This is a price I'm willing to pay.

I agree. I'm willing to pay this price too.

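
To make the "binary form vs. base16 number" question above concrete, here is a
minimal Java sketch of the conversion as described in this message
(unscaledValue().toByteArray() followed by hex encoding) next to the plain
base16 rendering of the same integer. The class and method names are
hypothetical and this is not the converter's actual code; it only illustrates
where the two forms can diverge.

```java
import java.math.BigDecimal;

public class DecimalHexSketch {

    // Hex of the signed, big-endian, two's complement bytes of the unscaled
    // value, following the description in this thread (an illustration only).
    public static String binaryFormHex(BigDecimal value) {
        byte[] bytes = value.unscaledValue().toByteArray();
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // The unscaled value written directly as a base16 number (sign prefix,
    // no padding).
    public static String base16Number(BigDecimal value) {
        return value.unscaledValue().toString(16);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.27", "2.55", "-0.01"}) {
            BigDecimal d = new BigDecimal(s);
            System.out.println(s + " binary=" + binaryFormHex(d)
                    + " number=" + base16Number(d));
        }
        // 1.27  binary=7f   number=7f
        // 2.55  binary=00ff number=ff
        // -0.01 binary=ff   number=-1
    }
}
```

The two forms agree for 1.27 but differ once the sign bit comes into play:
255 needs a leading 00 byte in two's complement, and -1 is the single byte ff.
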
> IMHO, we should then plan to switch the default of decimal serialization to
> numeric, and text serialization to base 10 in the next major release.

I think that can be a separate discussion; I don't want to block this KIP on
it.

Thoughts?

On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto wrote:

> This is a bit orthogonal, but in JsonSchemaConverter I use JSONSchemas to
> indicate whether a JSON number should be deserialized as an integer or a
> decimal
> <https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261>.
> Not everyone is going to have JSONSchemas available when converting, but if
> you do, it is an easy way to support JSON numbers as decimals.
>
> Carry on! :)
>
> On Thu, Jul 25, 2019 at 9:12 AM Andy Coates wrote:
>
> > Hi Almog,
> >
> > Like the KIP - I think being able to support decimals in JSON in the same
> > way most other systems do is a great improvement.
> >
> > It's not 100% clear to me from the KIP what the current format is. Is the
> > text decimal a base16 encoded number, or is it base16 encoded binary form
> > of the number? (I've not tried to get my head around if these two are even
> > different!)
> >
> > One suggestion I have is to change the proposed new config to only affect
> > decimals stored as text, i.e. to switch between the current base16 and the
> > more common base10. Then add another config to the serializer only that
> > controls if decimals should be serialized as text or numeric. The benefit
> > of this approach is it allows us to enhance the deserializer to
> > automatically handle numeric decimals even without any config having to be
> > set, i.e. default config in the deserializer would be able to handle
> > numeric decimals. Of course, this is a two-edged sword: this would make
> > the deserializer work out of the box with numeric decimals (yay!), but
> > would be a breaking change in that things that previously failed would
> > suddenly start deserializing. This is a price I'm willing to pay.
> >
> > IMHO, we should then plan to switch the default of decimal serialization
> > to numeric, and text serialization to base 10 in the next major release.
> > (With upgrade notes to match). Though I know this is more contentious, I
> > think it moves us forward in a much more standard way than the current
> > encoding of decimals.
> >
> > On Tue, 25 Jun 2019 at 01:03, Almog Gavra wrote:
> >
> > > Hi Everyone!
> > >
> > > Kicking off discussion for a new KIP:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
> > >
> > > For those who are interested, I have a prototype implementation that
> > > helped guide my design: https://github.com/agavra/kafka/pull/1
> > >
> > > Cheers,
> > > Almog
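
For reference, the first of the two deserialization options listed in the
reply at the top of this thread (one format setting that accepts BINARY, TEXT
or NUMERIC, with JSON numbers always accepted) might behave roughly as in the
Java sketch below. Everything here is an assumption made for illustration: the
enum, the class and method names, the use of a plain Object to stand in for a
parsed JSON value, reading TEXT as a base10 string, and reading BINARY as the
hex string described in the reply. It is not taken from the KIP or from the
prototype.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalFormatSketch {

    // Hypothetical enum mirroring the three format names used in the thread.
    public enum Format { BINARY, TEXT, NUMERIC }

    // 'raw' stands in for a parsed JSON value: a Number for a JSON number,
    // a String for a JSON string. 'scale' would come from the Connect schema
    // and is only needed for the BINARY form.
    public static BigDecimal deserialize(Object raw, Format format, int scale) {
        if (raw instanceof Number) {
            // In this option JSON numbers are accepted under every setting.
            return new BigDecimal(raw.toString());
        }
        if (!(raw instanceof String)) {
            throw new IllegalArgumentException("Unsupported JSON value: " + raw);
        }
        String text = (String) raw;
        switch (format) {
            case BINARY:
                // Assumed: hex of the two's complement bytes of the unscaled value.
                return new BigDecimal(new BigInteger(hexToBytes(text)), scale);
            case TEXT:
                // Assumed: base10 string that already carries its own scale.
                return new BigDecimal(text);
            default:
                // NUMERIC: any incoming string is a deserialization error.
                throw new IllegalArgumentException(
                        "String value not allowed when format is NUMERIC: " + text);
        }
    }

    // Decodes an even-length hex string into bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Under these assumptions, deserialize("00ff", Format.BINARY, 2),
deserialize("2.55", Format.TEXT, 2) and deserialize(2.55, Format.NUMERIC, 2)
all yield 2.55, while deserialize("2.55", Format.NUMERIC, 2) throws.
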