From:
Jeyhun Karimov
Date: Sun, 28 May 2017 17:26:23 +0000
Subject: Re: [DISCUSS]: KIP-159: Introducing Rich functions to Streams
To: dev@kafka.apache.org

After your response on KIP-149 regarding ValueTransformerSupplier,
everything you mentioned now makes complete sense. Thanks for the clarification.

Just a note: we will have additional (to KIP-149) overloaded methods: for
each of the withKey and withoutKey variants (ValueMapper and ValueMapperWithKey)
we will have overloaded methods with a RecordContext argument.
Other than this issue, I don't see any limitation.

Cheers,
Jeyhun

On Sun, May 28, 2017 at 6:34 PM Matthias J. Sax wrote:

> Thanks for your comments Jeyhun,
>
> I agree about the disadvantages. Only the punctuation part is something
> I don't buy. IMHO, RichFunctions should not allow registering and using
> punctuations. If you need punctuation, you should use #transform() or
> similar. Note that we plan to provide `RecordContext` and not
> `ProcessorContext` and thus, it's not even possible to register
> punctuations.
>
> One more thought: if you go with `init()` and `close()` we basically
> allow users to have an in-memory state for a function. Thus, we cannot
> share a single instance of RichValueMapper (etc) over multiple tasks and
> we would need a supplier pattern similar to #transform(). And this would
> "break the flow" of the API, as (Rich)ValueMapperSupplier would not
> inherit from ValueMapper and thus we would need many new overloads for
> KStream/KTable classes.
>
> The overall goal of RichFunction (from my understanding) was to provide
> record metadata information (like offset, timestamp, etc) to the user.
> And we still have #transform() that provides the init and close
> functionality.
> So if we introduce those with RichFunction we are quite
> close to what #transform provides, and thus it feels as if we duplicate
> functionality.
>
> For this reason, it seems to be better to go with the
> `#valueMapper(ValueMapper mapper, RecordContext context)` approach.
>
> WDYT?
>
>
> -Matthias
>
> On 5/27/17 11:00 AM, Jeyhun Karimov wrote:
> > Hi,
> >
> > Thanks for your comments. I will refer to the overall approach as rich
> > functions until we find a better name.
> >
> > I think there are some pros and cons of the approach you described.
> >
> > The pros are that it is simple, has clear boundaries, and avoids
> > misunderstanding of the term "function".
> > So you propose something like:
> > KStream.valueMapper (ValueMapper vm, RecordContext rc)
> > or
> > having rich functions with only a single init(RecordContext rc) method.
> >
> > The cons are:
> > - This will bring another set of overloads (if we use RecordContext as a
> > separate parameter). We should consider that the rich functions will be for
> > all main interfaces.
> > - I don't think that we need lambdas in rich functions. It is by
> > definition "rich" so, no single method in interface -> as a result no
> > lambdas.
> > - I disagree that rich functions should only contain an init() method. This
> > depends on each interface. For example, for specific interfaces we can add
> > methods (like punctuate()) to their rich functions.
> >
> >
> > Cheers,
> > Jeyhun
> >
> >
> > On Thu, May 25, 2017 at 1:02 AM Matthias J. Sax wrote:
> >
> >> I confess, the term is borrowed from Flink :)
> >>
> >> Personally, I never thought about it, but I tend to agree with Michal. I
> >> also want to clarify that the main purpose is the ability to access
> >> record metadata. Thus, it might even be sufficient to only have "init".
> >>
> >> An alternative would of course be to pass in the RecordContext as
> >> a method parameter. This would allow us to drop "init()".
> >> This might even
> >> allow us to use lambdas, and we could keep the name RichFunction as we
> >> preserve the nature of being a function.
> >>
> >>
> >> -Matthias
> >>
> >> On 5/24/17 12:13 PM, Jeyhun Karimov wrote:
> >>> Hi Michal,
> >>>
> >>> Thanks for your comments. I see your point and I agree with it. However,
> >>> I don't have a better idea for naming. I checked the MR source code. It
> >>> uses JobConfigurable and Closeable, two different interfaces. Maybe
> >>> we can rename RichFunction as Configurable?
> >>>
> >>>
> >>> Cheers,
> >>> Jeyhun
> >>>
> >>> On Tue, May 23, 2017 at 2:58 PM Michal Borowiecki wrote:
> >>>
> >>>     Hi Jeyhun,
> >>>
> >>>     I understand your argument about "Rich" in RichFunctions. Perhaps
> >>>     I'm just being too puritan here, but let me ask this anyway:
> >>>
> >>>     What is it that makes something a function? To me a function is
> >>>     something that takes zero or more arguments and possibly returns a
> >>>     value and while it may have side-effects (as opposed to "pure
> >>>     functions" which can't), it doesn't have any life-cycle of its own.
> >>>     This is what, in my mind, distinguishes the concept of a "function"
> >>>     from that of more vaguely defined concepts.
> >>>
> >>>     So if we add a life-cycle to a function, in that understanding, it
> >>>     doesn't become a rich function but instead stops being a function
> >>>     altogether.
> >>>
> >>>     You could say it's "just semantics" but to me precise use of
> >>>     language in the given context is an important foundation for good
> >>>     engineering. And in the context of programming "function" has a
> >>>     precise meaning. Of course we can say that in the context of Kafka
> >>>     Streams "function" has a different, looser meaning but I'd argue
> >>>     that won't do anyone any good.
> >>>
> >>>     On the other hand other frameworks such as Flink use this
> >>>     terminology, so it could be that consistency is the reason.
> >>>     I'm guessing that's why the name was proposed in the first place. My
> >>>     point is simply that it's a poor choice of wording and Kafka Streams
> >>>     doesn't have to follow that to the letter.
> >>>
> >>>     Cheers,
> >>>
> >>>     Michal
> >>>
> >>>
> >>> On 23/05/17 13:26, Jeyhun Karimov wrote:
> >>>> Hi Michal,
> >>>>
> >>>> Thanks for your comments.
> >>>>
> >>>>
> >>>>     To me at least it feels strange that something is called a
> >>>>     function yet doesn't follow the functional interface
> >>>>     definition of having just one abstract method. I suppose init
> >>>>     and close could be made default methods with empty bodies once
> >>>>     Java 7 support is dropped to mitigate that concern. Still, I
> >>>>     feel some resistance to consider something that requires
> >>>>     initialisation and closing (which implies holding state) as
> >>>>     being a function. Sounds more like the Processor/Transformer
> >>>>     kind of thing semantically, rather than a function.
> >>>>
> >>>>
> >>>> - If we named the interface just Function, your assumptions
> >>>> would hold. However, the keyword Rich by definition implies that we
> >>>> have a function (as you described, with one abstract method, etc.)
> >>>> but it is rich. So, there are multiple methods in it.
> >>>> Ideally it should be:
> >>>>
> >>>> public interface RichFunction extends Function { // this is the Function that you described
> >>>>     void close();
> >>>>     void init(Some params);
> >>>>     ...
> >>>> }
> >>>>
> >>>>
> >>>>     The KIP says there are multiple use-cases for this but doesn't
> >>>>     enumerate any - I think some examples would be useful,
> >>>>     otherwise that section sounds a little bit vague.
> >>>>
> >>>>
> >>>> I thought it was obvious by definition but I will update it. Thanks.
> >>>>
> >>>>
> >>>>     IMHO, access to the RecordContext is where the added
> >>>>     value lies but maybe I'm just lacking in imagination, so I'm
> >>>>     asking all this to better understand the rationale for init()
> >>>>     and close().
> >>>>
> >>>>
> >>>> Maybe I should add some examples. Thanks.
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Jeyhun
> >>>>
> >>>> On Mon, May 22, 2017 at 11:02 AM, Michal Borowiecki wrote:
> >>>>
> >>>>     Hi Jeyhun,
> >>>>
> >>>>     I'd like to understand better the premise of RichFunctions and
> >>>>     why `init(Some params)` and `close()` are said to be needed.
> >>>>
> >>>>     To me at least it feels strange that something is called a
> >>>>     function yet doesn't follow the functional interface
> >>>>     definition of having just one abstract method. I suppose init
> >>>>     and close could be made default methods with empty bodies once
> >>>>     Java 7 support is dropped to mitigate that concern. Still, I
> >>>>     feel some resistance to consider something that requires
> >>>>     initialisation and closing (which implies holding state) as
> >>>>     being a function. Sounds more like the Processor/Transformer
> >>>>     kind of thing semantically, rather than a function.
> >>>>
> >>>>     The KIP says there are multiple use-cases for this but doesn't
> >>>>     enumerate any - I think some examples would be useful,
> >>>>     otherwise that section sounds a little bit vague.
> >>>>
> >>>>     IMHO, access to the RecordContext is where the added
> >>>>     value lies but maybe I'm just lacking in imagination, so I'm
> >>>>     asking all this to better understand the rationale for init()
> >>>>     and close().
> >>>>
> >>>>     Thanks,
> >>>>     Michał
> >>>>
> >>>>     On 20/05/17 17:05, Jeyhun Karimov wrote:
> >>>>> Dear community,
> >>>>>
> >>>>> As we discussed in the KIP-149 [DISCUSS] thread [1], I would like to initiate
> >>>>> a KIP for rich functions (interfaces) [2].
> >>>>> I would like to get your comments.
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>> http://search-hadoop.com/m/Kafka/uyzND1PMjdk2CslH12?subj=Re+DISCUSS+KIP+149+Enabling+key+access+in+ValueTransformer+ValueMapper+and+ValueJoiner
> >>>>> [2]
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-159%3A+Introducing+Rich+functions+to+Streams
> >>>>>
> >>>>>
> >>>>> Cheers,
> >>>>> Jeyhun
> >>>> --
> >>>> Michal Borowiecki
> >>>> Senior Software Engineer L4
> >>>> T: +44 208 742 1600
> >>>> +44 203 249 8448
> >>>>
> >>>> E: michal.borowiecki@openbet.com
> >>>>
> >>>> W: www.openbet.com
> >>>>
> >>>>
> >>>> OpenBet Ltd
> >>>> Chiswick Park Building 9
> >>>> 566 Chiswick High Rd
> >>>> London
> >>>> W4 5XT
> >>>> UK
> >>>>
> >>>>
> >>>> This message is confidential and intended only for the
> >>>> addressee. If you have received this message in error, please
> >>>> immediately notify the postmaster@openbet.com
> >>>> and delete it from your system
> >>>> as well as any copies. The content of e-mails as well as
> >>>> traffic data may be monitored by OpenBet for employment and
> >>>> security purposes. To protect the environment please do not
> >>>> print this e-mail unless necessary. OpenBet Ltd. Registered
> >>>> Office: Chiswick Park Building 9, 566 Chiswick High Road,
> >>>> London, W4 5XT, United Kingdom. A company registered in
> >>>> England and Wales. Registered no. 3134634. VAT no.
> >>>> GB927523612
> >>>>
> >>> --
> >>> -Cheers
> >>>
> >>> Jeyhun
> >>
> >> --
> > -Cheers
> >
> > Jeyhun
> >
>
--
-Cheers

Jeyhun
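
For readers following the thread, the two API shapes weighed above can be sketched in plain Java. This is only an illustration under stated assumptions: `RecordContext` here is a minimal stand-in exposing just an offset and a timestamp, and `ValueMapperWithContext`, `RichValueMapper`, and `OffsetTagger` are hypothetical names chosen for this sketch, not Kafka Streams interfaces and not the KIP's final design.

```java
import java.util.function.Supplier;

final class RichFunctionSketch {

    // Stand-in for the record metadata discussed in the thread
    // (hypothetical; not Kafka's own RecordContext type).
    static final class RecordContext {
        final long offset;
        final long timestamp;

        RecordContext(long offset, long timestamp) {
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    // Option A: the context is passed on every call.
    // One abstract method, so a lambda can implement it.
    @FunctionalInterface
    interface ValueMapperWithContext<V, VR> {
        VR apply(V value, RecordContext context);
    }

    // Option B: life-cycle methods as default methods with empty bodies
    // (Java 8+). The interface still has a single abstract method.
    @FunctionalInterface
    interface RichValueMapper<V, VR> {
        VR apply(V value);

        default void init(RecordContext context) { }

        default void close() { }
    }

    // A stateful Option B implementation: it keeps what init() hands over,
    // which is why one instance must not be shared across tasks.
    static final class OffsetTagger implements RichValueMapper<String, String> {
        private RecordContext context;

        @Override
        public void init(RecordContext context) {
            this.context = context;
        }

        @Override
        public String apply(String value) {
            return value + "@" + context.offset;
        }
    }

    static String demoOptionA() {
        ValueMapperWithContext<String, String> mapper = (v, ctx) -> v + "@" + ctx.offset;
        return mapper.apply("a", new RecordContext(42L, 0L));
    }

    static String demoOptionB() {
        // A supplier gives each caller (each task, in the thread's terms) a fresh instance.
        Supplier<RichValueMapper<String, String>> supplier = OffsetTagger::new;
        RichValueMapper<String, String> mapper = supplier.get();
        mapper.init(new RecordContext(7L, 0L));
        try {
            return mapper.apply("b");
        } finally {
            mapper.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(demoOptionA()); // prints a@42
        System.out.println(demoOptionB()); // prints b@7
    }
}
```

Option A keeps the mapper stateless and lambda-friendly, at the cost of an extra overload per operator. Option B stays a single-abstract-method interface thanks to the default methods, but any implementation that stores state in `init()` needs one instance per task, which is where the supplier pattern mentioned in the thread comes in.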