From dev-return-97769-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Wed Sep 5 02:36:42 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 8A4A7180629 for ; Wed, 5 Sep 2018 02:36:41 +0200 (CEST) Received: (qmail 28883 invoked by uid 500); 5 Sep 2018 00:36:40 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 28870 invoked by uid 99); 5 Sep 2018 00:36:39 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 05 Sep 2018 00:36:39 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 9909EC1CF2 for ; Wed, 5 Sep 2018 00:36:38 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.011 X-Spam-Level: X-Spam-Status: No, score=-0.011 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, T_DKIMWL_WL_MED=-0.01] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=confluent-io.20150623.gappssmtp.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id Bml4rw53ezFH for ; Wed, 5 Sep 2018 00:36:37 +0000 (UTC) Received: from mail-pg1-f182.google.com (mail-pg1-f182.google.com [209.85.215.182]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 6E7575F1BD for ; Wed, 5 Sep 2018 00:36:37 +0000 (UTC) Received: by mail-pg1-f182.google.com with SMTP id x26-v6so2472170pge.12 for ; Tue, 04 Sep 2018 17:36:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=confluent-io.20150623.gappssmtp.com; s=20150623; h=to:references:from:openpgp:autocrypt:organization:subject :message-id:date:user-agent:mime-version:in-reply-to; bh=AK1w8WOApKTddiMezvj88czj0e+m1pRbFHMGOIFlviY=; b=tNZVk+jyLbyN/2cg8yG9dmfHeCjCzFXMSdPZIYW7v54Q3XleWVruwPbjOb1MGynQ+j AqyaQVLOk7CA1//r7Pg1Yu3Hoq4QDykHfkzAsZJQP+PklJvnfznuZlRH4w3YlGoSgsz2 cunOZ8BnkuyQ/ZqecOkri29g6vRHUTX1BVFKb8uFzhoSCzExJ8JzxxP2FTu0IwBFF2ID oUooO7AyvwXmgwJeu8mtAI7M3bAyXKga7sFJY1J7T34WbRDRWepL5/axFjRBuPkRwuwC U0/7wSvEgSZCwHAMeq1HhI4KbKSicadHjPm57/Z0KEEY920IiCtTtcfY1swjCky49/vM y8cQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:references:from:openpgp:autocrypt :organization:subject:message-id:date:user-agent:mime-version :in-reply-to; bh=AK1w8WOApKTddiMezvj88czj0e+m1pRbFHMGOIFlviY=; b=Xay3fK/R4MLzBwIl9mVahxIEBenz0MmOwrI1bvz6imlkMVmaNmkYVWW3QrILnld8n5 xTCM+yWy8pIwTCFWPeBpAW14ah8s3ZmzGiIK68X7GOqBknsygaF1QiwLdkZ7HPHAEQdu sFoPca4S8p2NrlcsqJWBxyTs8ze+3wK9NPtiiuEoWx+QUqgzKBZQr+VpWzet/ulMb1P2 gk1EocW7Y/ExY9f20LgHF1/Th1+cUYzXPXce8IP+PpMwC38owkzndjvODfXUm4BZlHqC BHBihJNJbkFEaa1Oa9HJIZaHLWuSsrDcozveE93UwLyibk4/KFKlKJYwn1FN4QonNI+o 3fgg== X-Gm-Message-State: APzg51ATs1qaQmcH6gAzrkmubLhb2mwnKNHeTbROQnrZOx4W4EWxAUqe pgvZN0gRng2ZED5rGyviliQLpdELdys= X-Google-Smtp-Source: ANB0Vdb5DV37riWbGsaHcIraUdzZB7V3thKUJv+hJutnhNt8kknIY4voXNtMaBt0ibzvB6PRXK4oxw== X-Received: by 2002:a62:464f:: with SMTP id t76-v6mr37375090pfa.118.1536107795892; Tue, 04 Sep 2018 17:36:35 -0700 (PDT) Received: from Matthias-Sax-Macbook-Pro.local (50-0-2-20.static.sonic.net. [50.0.2.20]) by smtp.gmail.com with ESMTPSA id w5-v6sm227840pfn.44.2018.09.04.17.36.34 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 04 Sep 2018 17:36:35 -0700 (PDT) To: dev@kafka.apache.org References: <5B750E38.80706@trivago.com> <5B7720B8.406@trivago.com> <5B7C262E.6090801@trivago.com> <0F84ADA7-8308-4CA4-B79F-B89E4303347F@gmail.com> <5B8D2B34.5070900@trivago.com> <5B8EECB8.4010503@trivago.com> From: "Matthias J. Sax" Openpgp: preference=signencrypt Autocrypt: addr=matthias@confluent.io; prefer-encrypt=mutual; keydata= xsFNBFcGWisBEAD4+gj1tJcLJRckkbZjdJpd1347/Zwndn8R6r2X+YYS5EgwzP5OQHl3Q6jl hAISoqBEfDeTJffsxd1wWL+6wKU4Y7zCkH/3aL/7znOlfaewpgJP3x/naawgvnJ0jlPlJtev MlAbG+9P6aEVxYfML59KBtRKzd6OZbSh0VzCJVCvkslv+LZqR94lhA0rArupqe7EO9DuP4/V bvnDxx1dZFtEK4n4wJYsRkF+TuxGClLcfosfM0oHTZeolus+rJTi7wxrbcTOlTmOMW0Wf9rK AobXsSz838RJenQqe3X0s5EBKCoIdI2SCQiTfcJ2JTVt6Ip1IDuEVqMQmtz7i2l3Rlml0GDa gODehmeMczVIBeO0+cppzOEynjQlWLCbJ8XEjISMI87Ied6DGbEYKLnG4ucRjM//8syKI4T+ Z90Y060jhWxrvr2pGqPPaU3qvIXW1D1mchXE1ba4HOdKb6fA7j5NU47WA2YmWRDhfM4exE0Q mD3Jfjfjyuch2rGhT3twSWHk7v5zlINwOfTIfeXvShqxNzJRFf6MudFnFbgmMTo51LmPcXHz 3tUaRNoky/HRpSxU7h145SgltrKSmYgWUnG4Y3qySyiPVKfBUBi/e5dYTk4Y0NDWGhZxOXCs ZV0NQsuoqFD82LrglwECrcdHd2QaKnIX2eKB7j63dMsexFDjewARAQABzSdNYXR0aGlhcyBK LiBTYXggPG1hdHRoaWFzQGNvbmZsdWVudC5pbz7CwXgEEwECACIFAlcGWisCGwMGCwkIBwMC BhUIAgkKCwQWAgMBAh4BAheAAAoJEDwRZiHEirRPWcoP/if/UkALwOvkct/IufpQhJ9qmg/a +QrSkDbPFrkWA7r9aMnaX8C5xuFgFuekT2aMzYYEr3eQwNHeeeEEFUhJQnMsKrjMw55w8+J5 Zhzdg8OKepFT0BOovXzITZcyqBW5JdMjfwEkdztBmPHbvWCCjglSllCvNDorxF1FCYuiL+F3 aBx0SaKaXuNhA4GO2IBY68SQ04ueeSbKbnukyWdAYXObBPBvbcoJXeesX6QvANxSjCqPHRsc czAr1mADzPN58nRXrOYgeonhPROYRlhLyEJM9CnGby/GRb9WfrKFwQVVjpNT0dD9vOvobmEp o4//m6qeXh6xC/egW9vBl63fJNOb/A6T1JdFdU8VWpUofAZtPHE61Y11AmkL7EoY5eUhlL0i jCQ4+K4wERP5NGOBEHe6wfxWoQMnPwnj59N8GylCOMhaVaIYqqRuAssMWyliX8nAj0dO3Hbw EZmBMt/xAAZpXCJU3iDS1G0+fs5LIIBWJodBvjec3DpIlib3PGG6IKPfkwpTfyoStRaPAc+X 085/KhgI8hM1nCxzI/0iQSsuyrJikYcCiB+EukaxTy1TS7O3Ul7Sg6f5f2YXB3jh9rPgZf/C y8iIJDyY+zfJLaYPO3uMEtqW69SWVeLM2viy5pj3MtEzDgC16OLINRdIsibPXRchdv7KN2FZ w4uvgTnZzsFNBFrC7GQBEADRIp0Uj4DUrFn049ki3qTuxanbr+76JUeNZ/WT0EUnBq4+v/fB KCHNQxLJJN3HdQLupcauDYIRAnJg5gB9wOcKgjMXmwZ3F8pX/j8Hmf7CEZPbElh64ZpxkNFc 83k6ZiJ2hsQk6F7K3LvqsvgkKAcfHY/Fd/Xwo4YodrcX0OWr+Szp++FPDIHWpiGNo16uobt8 2xUWN5KW0YPiG2gmvHeGGGVeXScn5I76MpJlt1lRCt/uWrYHYc8rupywIn/wCBwKKoNdXaYL TdVlwUB6wxGjNQphBXej173Pckwx+xwb+1KngWS/I/YXGan7+aBJTfkM2kxACqexsxS8NgUx K1VuELlsZRa2AxpzeSmR6Ev6YAAIwjdgiiuTKc3D9LzSb1zQZJ/xrpRy2OdJEXELCFjil0eM ANQH4jovicaWrSd0EH4Ipk6TXjHpkNnGQ2Oy7kzMcwEvaQxQ4SszOSdqDjwR3vQ5dV7d5RCP 1+9qYzFb/sKpoQi9FK/fVUIcPnIdv12B4uU97Vs+ONrOFl8ER+d36fcpogpyNRWSZIttQ96P +kKOAA2L+BP3UdLpchQu8/t+0BZyVmgySWSo8oJA2qFB6rMa8ZsFQoK8m/z7ukBzx7E9geKK 2ZOpqRawJ8nCiDb1nBmJYwgl6ltWV3utg9nLr5oY+35mhnzUszBSoiZlCQARAQABwsF8BBgB CAAmFiEEV9+AVI/wFUXaxShcPBFmIcSKtE8FAlrC7GQCGwwFCQHhM4AACgkQPBFmIcSKtE9O hA/+N5Da5bZVMyEWXuSoASgEAC8uWzT8cVy78XDoGzlPXfEmtz0YC+sWR4psWWIbDpJOcpKe D2AcNuYl24ida2HE+h09LZq8l9EzWpvI30fR9c5LSrKCQJFHyfla3JRCZPr8yT4oQeMYls/2 hP8tk+R2IeZ2aRxLLXCdyBYbmlhK+3y38XZuvzyxbwIQ0plWB26FgnmQZ1PVcU2lfgs3IYBg Vog8gIfvdqxPdxBnV05yE5WhqcdplTmNXyaV4eU1kNfpIMvuhPo1hVKEAznk6csKQcjYBte5 kBLtJKgYR3/lMU2OhF7nBya2HqSWnwCYk7L7f26t0WIVa8JztLkGcEktHFZHs9i0tqGeUSe5 ydiB2tF07K4g1XftQB4VbDSoogp8InTlf2Bfmad0MRcOnWR8BSKyYxgpMljD5wbFnN975s+F d+avW1umqE0raN/chgWQbZf5365hjcWgDzP+o3Vg1aD37CkrWJ+pMFS/FB9VTBYLAdXQvu7O oT7wtga9MXxH3S/c/C5LtFgVYSwuZhGhmB2cKbTMhwToG6rxREjvcYIOA1afynTpMjlTfIIk QPzqnpk6ZIIpcxq+T+PGoiAwV3XmlpPlJeumA1tBSj4OBqVvHYDObcnQaibBK6fxihTgLQHW 8SyMqdjjZyUaNzN3jxG7826WEzmGIs1Js4gPbSM= Organization: Confluent Inc Subject: Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted. Message-ID: <01b9014b-ed60-0f2a-b6ac-721ab0192f24@confluent.io> Date: Tue, 4 Sep 2018 17:36:34 -0700 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <5B8EECB8.4010503@trivago.com> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="8NLLPFw7Tlf1LIaxhSyIa0uGO3JpLJ5Hi" --8NLLPFw7Tlf1LIaxhSyIa0uGO3JpLJ5Hi Content-Type: multipart/mixed; boundary="CErK3q8FGF4PAjBcetJElOwXxlzpJASeZ"; protected-headers="v1" From: "Matthias J. Sax" To: dev@kafka.apache.org Message-ID: <01b9014b-ed60-0f2a-b6ac-721ab0192f24@confluent.io> Subject: Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted. References: <5B750E38.80706@trivago.com> <5B7720B8.406@trivago.com> <5B7C262E.6090801@trivago.com> <0F84ADA7-8308-4CA4-B79F-B89E4303347F@gmail.com> <5B8D2B34.5070900@trivago.com> <5B8EECB8.4010503@trivago.com> In-Reply-To: <5B8EECB8.4010503@trivago.com> --CErK3q8FGF4PAjBcetJElOwXxlzpJASeZ Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable Hi, I am just catching up on this thread. I did not read everything so far, but want to share couple of initial thoughts: Headers: I think there is a fundamental difference between header usage in this KIP and KP-258. For 258, we add headers to changelog topic that are owned by Kafka Streams and nobody else is supposed to write into them. In fact, no user header are written into the changelog topic and thus, there are not conflicts. Nevertheless, I don't see a big issue with using headers within Streams. As long as we document it, we can have some "reserved" header keys and users are not allowed to use when processing data with Kafka Streams. IMHO, this should be ok. > I think there is a safe way to avoid conflicts, since these headers are= > only needed in internal topics (I think): > For internal and changelog topics, we can namespace all headers: > * user-defined headers are namespaced as "external." + headerKey > * internal headers are namespaced as "internal." + headerKey While name spacing would be possible, it would require to deserialize user headers what implies a runtime overhead. I would suggest to no namespace for now to avoid the overhead. If this becomes a problem in the future, we can still add name spacing later on. My main concern about the design it the type of the result KTable: If I understood the proposal correctly, KTable table1 =3D ... KTable table2 =3D ... KTable joinedTable =3D table1.join(table2,...); implies that the `joinedTable` has the same key as the left input table. IMHO, this does not work because if table2 contains multiple rows that join with a record in table1 (what is the main purpose of a foreign key join), the result table would only contain a single join result, but not multiple. Example: table1 input stream: table2 input stream: , We use table2 value a foreign key to table1 key (ie, "A" joins). If the result key is the same key as key of table1, this implies that the result can either be or but not both. Because the share the same key, whatever result record we emit later, overwrite the previous result. This is the reason why Jan originally proposed to use a combination of both primary keys of the input tables as key of the output table. This makes the keys of the output table unique and we can store both in the output table: Result would be , Thoughts? -Matthias On 9/4/18 1:36 PM, Jan Filipiak wrote: > Just on remark here. > The high-watermark could be disregarded. The decision about the forward= > depends on the size of the aggregated map. > Only 1 element long maps would be unpacked and forwarded. 0 element map= s > would be published as delete. Any other count > of map entries is in "waiting for correct deletes to arrive"-state. >=20 > On 04.09.2018 21:29, Adam Bellemare wrote: >> It does look like I could replace the second repartition store and >> highwater store with a groupBy and reduce.=C2=A0 However, it looks lik= e I >> would >> still need to store the highwater value within the materialized store,= to >> compare the arrival of out-of-order records (assuming my understanding= of >> THIS is correct...). This in effect is the same as the design I have n= ow, >> just with the two tables merged together. >=20 --CErK3q8FGF4PAjBcetJElOwXxlzpJASeZ-- --8NLLPFw7Tlf1LIaxhSyIa0uGO3JpLJ5Hi Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIzBAEBCgAdFiEESn/iOv2tmCkcP0KLVp2sL37kObwFAluPJRIACgkQVp2sL37k ObyYEA/+KRQvclUx+EWsxDne2/o0VJXoe8T1BPlq8HYWDj+gGZuxY0C4BgBygQ+v TmYVFY2kF8/23ZJGbohbU5Zw0mwdcM5IeTaKMINEsBIvLJDbanWCQXCJbljPkDki zRDgltlHoAiLFEL0fYqQQrzgRpwevuTfTx1sxZVKc0IJanq+oo9LR/8lmE7wgnHe TjbaX2ySsP5WXr8HQJRFp0iVQ6ucltydRHFQngXypCBxlo22Zs+X9QKlQJaWHzt1 SBPDwF53QBUuwtDmSgOcX08T3fJNjSGjrQct/wb2VjhJ5I19rn+evv9e6qc8eaZh FUXJ9PwbushRNm3Xc5QHxru+nbooX01+vf5WuM43PAaKgHl934gLsOpkACBb4xxy lVTMLJS5S8kAKtOA8tpdkFOOxPXrGvUgiQ8r0Ho2g3awa+O4ainOhl6XW9uYo6lS QsP5I9CwugVn+WJoSyG4MHpjbP2lCQYFZXpGWB9c2e975xCvW86GIrnNetO08fK+ oX1HlXAiTXvW466GDzdJf7FoR3XdWRA3QsPwJni2NS1ZFtic7tFvhmjbNjbHTC9G 28Pppe62ReVs3pJIZXH2C1aPuPNrcKwI7Kj6GM9GALrgESsxaL1P63TowOOQwauk Z2l+lXDhC6T9XbnkGHxqJhZu1Aau6INobkZUpSrqCxWLoJh3Ox8= =U6Xk -----END PGP SIGNATURE----- --8NLLPFw7Tlf1LIaxhSyIa0uGO3JpLJ5Hi--