From dev-return-103590-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Fri Apr 26 06:52:40 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id B60CF18064C for ; Fri, 26 Apr 2019 08:52:39 +0200 (CEST) Received: (qmail 83996 invoked by uid 500); 26 Apr 2019 06:52:37 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 83980 invoked by uid 99); 26 Apr 2019 06:52:36 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 26 Apr 2019 06:52:36 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 10032180C6F for ; Fri, 26 Apr 2019 06:52:36 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.636 X-Spam-Level: * X-Spam-Status: No, score=1.636 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.164, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id jlvJZQsRQwlk for ; Fri, 26 Apr 2019 06:52:33 +0000 (UTC) Received: from mail-it1-f196.google.com (mail-it1-f196.google.com [209.85.166.196]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id E721560E5B for ; Fri, 26 Apr 2019 06:52:32 +0000 (UTC) Received: by mail-it1-f196.google.com with SMTP id k64so4161053itb.5 for ; Thu, 25 Apr 2019 23:52:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=Ya+w7OKizKXgPcF5VwLcRP/ALzlrxUZSFfsgcBYB4Fk=; b=Bgpx6yojEAXlLkwZrE4g9JYlE120ls0bVD/aPvqI29q86cKjKaKxziz9G22AuUuV8b m8YsYBPoL217l3iPQ4LOYqVyCFc9I3eqaBkkOAf6ZQTTZCDRx4pKizkCwoZ/6XYA+Ia/ MeArXQgKsxwMCwiLPIKTNlXRD3hpBWGXTnqmDt+6RF7Fs5grHeczUPLKmJsyrZ8lTtFL SeR0i7y125aobNkWSh8O3bIB6sgHNcdEI0UQUy2ZUvUtgUu9HSHvtfTCaaazgofbhjs3 +8ZXINQsJznbgAk8Yi/gYnHgI8vWmS3FkeXAcndU26dCkbstCBoztL4yOHngbvTs9+X0 vGyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=Ya+w7OKizKXgPcF5VwLcRP/ALzlrxUZSFfsgcBYB4Fk=; b=jnJJhY98PTvZeeJZRlhBap6vbLIgEZHGycnmw5FwJgvGDnLuvo1mRs+lTQ1V3xkMZG sOLXrxYdpALAapOEZPi20y9uQz1dffTovSHwnLecaplh49OVLND1h7ByVChDFk12H6Tw MCt8KvetKjQnQ5jEUukfrAUMNW8lL507CYmDw1xQ9H3IQ9SszpEp3xaoQdLGg4LBfyCE ktzQ41HoleFoWW6N8gr3A/6SNAbqhP7UJTDv32wi5d16Q8r9AkxtRXaOdtymeB3JvUSe 6v+GZaTt8BI4LwJzlrYuXMBdq1cpNRVG4Bl0ZAlYOZ49ZX3m1v1Hk2HQGcY3Dn/XI0sb eIxQ== X-Gm-Message-State: APjAAAUnGws0xyoCl6YlIg0uOFbxPOIgrhNodEJm6MXdOS/WkJV5Id+W rB4/YrZKwqef2XLZdETpDN0mfRqIIB3J57R+OkTYXQ== X-Google-Smtp-Source: APXvYqyUAlheqr67TxDCUzW2fIqJa3QTEu6JJbI+J5MSYHQlXq+uCizk3aoLB6ZfRGeQN5kSnMnznn2GA1DRSSoGoZs= X-Received: by 2002:a24:1c15:: with SMTP id c21mr7122521itc.24.1556261552062; Thu, 25 Apr 2019 23:52:32 -0700 (PDT) MIME-Version: 1.0 References: <9204a36d-98d2-1305-1985-bb63486e0775@confluent.io> <1040a7e5-c4f4-a11f-b49f-7ec1ea6f7f93@confluent.io> In-Reply-To: From: Patrik Kleindl Date: Fri, 26 Apr 2019 08:52:21 +0200 Message-ID: Subject: Re: [DISCUSS] KIP-451: Make TopologyTestDriver output iterable To: dev@kafka.apache.org Content-Type: multipart/alternative; boundary="000000000000fd03950587696095" --000000000000fd03950587696095 Content-Type: text/plain; charset="UTF-8" Hi It's fine by me, I'll consider doing the PR a good exercise ;-) I'll discard it for the moment, if there should be any hold-up for KIP-456 to make it into the 2.3 release then we could still revive this one because it is mostly done already. best regards and thanks to Jukka for his great work! Patrik On Fri, 26 Apr 2019 at 08:19, Matthias J. Sax wrote: > Thanks for the discussion! > > After skipping over KIP-456, I was also wondering what the overlap of > both KIPs are, and if KIP-456 might actually subsume KIP-451. > > If we all agree on this, I would recommend to discard KIP-451 in favor > of KIP-456. > > I'll also follow up on KIP-456 DISCUSS thread to get some input about > the overlap from their point of view. > > > -Matthias > > On 4/25/19 2:35 AM, Patrik Kleindl wrote: > > Hi > > > > As discussed, if the preferred option is to consume the records always I > > will change both methods in KIP-451 accordingly and also switch them to > > return a List. > > This would be a bit redundant with Jukkas proposal in KIP-456 so the > > question is if KIP-451 should be scraped in favor of KIP-456 which has > more > > powerful solution but will also need a bit more changes in tests. > > On the other hand both are useful and wouldn't conflict as far as I can > see. > > > > Any opinions? > > > > best regards > > > > Patrik > > > > On Thu, 25 Apr 2019 at 08:55, Jukka Karvanen < > jukka.karvanen@jukinimi.com> > > wrote: > > > >> Hi, > >> > >> I played around with Patrick's KAFKA-8200 branch and I tested it with > >> combined with my draft version of KIP-456. > >> > >> Some comments: > >> These two version of iterableOutput methods are working now > differently, if > >> you reuse same fetched Iterable object after piping in new inputs. > >> Version without serde will see the new input, but version with serdes > has > >> streamed the converted items already to new list and that's why > >> not seeing the new item. Maybe it is intended to to fetch new Iterable > each > >> time, but the implementation is not mandating it. > >> > >> See example: > >> > >> > https://github.com/jukkakarvanen/kafka/blob/KAFKA-8200withKIP-456/streams/test-utils/src/test/java/org/apache/kafka/streams/TopologyTestDriverIterableTest.java > >> > >> > >> I have a lot of tests where I pipe the list of input and check list of > >> output items, pipe more and check the new list. > >> Now with this Iterable from the beginning is not very usable if you test > >> like this in multiple batches. You need to reiterate same again. > >> > >> In KIP-456 readKeyValuesToList returns List same way > ConsumerRecordFactory > >> and that way this TestInputTopic is accepting List as input. > >> This collection methods in TestOutputTopic are also consuming the > messages. > >> So you can mix the reading individual rows and collections. > >> With List it is also easier to get the number of outputs compared to > >> Iterable. > >> > >> Please, check out also DISCUSSion of KIP.456. I will post there the > link to > >> the current version of implementation and you can see if it fulfill also > >> your need. > >> > >> Jukka > >> > >> > >> la 20. huhtik. 2019 klo 1.11 Patrik Kleindl (pkleindl@gmail.com) > >> kirjoitti: > >> > >>> Hi Matthias > >>> Seems I got a bit ahead of myself. > >>> With option C my aim was a simple alternative which gives back all > output > >>> records that have happened up to this point (and which have not been > >>> removed by calls to readOutput). > >>> Based on that the user can decide how to step through or compare the > >>> records. > >>> > >>> If you see it as more consistent if the new methods removed all > returned > >>> records then this can easily be done. > >>> > >>> But maybe the pick of Iterable was too narrow. > >>> It would probably be a good fit to return a List or just a Collection > >>> > >>> Picking up John's naming suggestion this would make this: > >>> > >>> public Collection> readAllOutput(final > >>> String topic) { > >>> final Collection> outputRecords = > >>> outputRecordsByTopic.get(topic); > >>> if (outputRecords == null) { > >>> return Collections.emptyList(); > >>> } > >>> outputRecordsByTopic.put(topic, new LinkedList<>()); > >>> return outputRecords; > >>> } > >>> > >>> With the semantics the same as readOutput = removing everything. > >>> > >>> Can be changed to a List if you think it matters that a user can query > >>> some index directly. > >>> > >>> What do you think? > >>> > >>> best regards > >>> > >>> Patrik > >>> > >>> > >>> > >>> On Fri, 19 Apr 2019 at 03:04, Matthias J. Sax > >>> wrote: > >>> > >>>> I am not sure if (C) is the best option to pick. > >>>> > >>>> What is the reasoning to suggest (C) over the other options? > >>>> > >>>> It seems that users cannot clear buffered output using option (C). > This > >>>> might it make difficult to write tests. > >>>> > >>>> The original Jira tickets suggest: > >>>> > >>>>> which returns either an iterator or list over the records that are > >>>> currently available in the topic > >>>> > >>>> This implies that the current buffer would be cleared when getting the > >>>> iterator. > >>>> > >>>> Also, from my understanding, the idea of iterating in general, is to > >>>> step through a finite collection of objects/elements. Hence, if > >>>> `hasNext()` returns `false` is will never return `true` later on. > >>>> > >>>> As John mentioned, Java also has support for streams, that offer > >>>> different semantics, that would align with option (C). However, I am > >> not > >>>> sure if this would be the test API to write tests? > >>>> > >>>> Thoughts? > >>>> > >>>> In any way: whatever semantics we pick, the KIP should explain them. > >>>> Atm, this part is missing in the KIP. > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> On 4/18/19 12:47 PM, Patrik Kleindl wrote: > >>>>> Hi John > >>>>> > >>>>> Thanks for your feedback > >>>>> It's C, it does not consume the messages in contrast to the > >> readOutput. > >>>>> Is it a requirement to do so? > >>>>> That's why I picked a different name so the difference is more > >>>> noticeable. > >>>>> I will add that to the JavaDoc. > >>>>> > >>>>> I see your point regarding future changes, that's why I linked > >> KIP-456 > >>>>> where such a method is proposed and would maybe allow to deprecate my > >>>>> version in favor of a bigger solution. > >>>>> > >>>>> Hope that answers your questions > >>>>> > >>>>> best regards > >>>>> Patrik > >>>>> > >>>>> > >>>>> On Thu, 18 Apr 2019 at 19:46, John Roesler > >> wrote: > >>>>> > >>>>>> Hi, Patrik, > >>>>>> > >>>>>> Thanks for this proposal! > >>>>>> > >>>>>> I have one question, which I didn't see addressed by the KIP. > >>> Currently, > >>>>>> when you call `readOutput`, it consumes the result (removes it from > >>> the > >>>>>> test driver's output). Does your proposed method: > >>>>>> A: consume the whole output stream for that topic "atomically" when > >> it > >>>>>> returns the iterable? (i.e., two calls in a row would guarantee the > >>>> second > >>>>>> call is always an empty iterable?) > >>>>>> B: consume each record when we iterate over it? (i.e., this is like > >> a > >>>>>> stream) If this is the case, is the returned object iterable once > >>>> (uncached > >>>>>> stream), or could we iterate over it repeatedly (cached stream)? > >>>>>> C: not consume at all? (i.e., this is a view on the output topic, > >> but > >>> we > >>>>>> need a separate method to consume/clear the output) > >>>>>> D: something else? > >>>>>> > >>>>>> Also, one suggestion: maybe name the method "readAllOutput" or > >>>> something. > >>>>>> Specifically naming it "iterable" makes it awkward if we do want to > >>>> tighten > >>>>>> the return type (e.g., to List) in the future. This is something we > >>> may > >>>>>> actually want to do, if there's an easy way to say, "assert that the > >>>> output > >>>>>> equals [...some literal list...]". > >>>>>> > >>>>>> Thanks again! > >>>>>> -John > >>>>>> > >>>>>> On Wed, Apr 17, 2019 at 4:01 PM Patrik Kleindl > >>>> wrote: > >>>>>> > >>>>>>> Hi all > >>>>>>> > >>>>>>> Unless someone has objections I will start a VOTE thread tomorrow. > >>>>>>> The KIP adds two methods to the TopologyTestDriver and has no > >>> conflicts > >>>>>> for > >>>>>>> existing users. > >>>>>>> PR https://github.com/apache/kafka/pull/6556 is already being > >>>> reviewed. > >>>>>>> > >>>>>>> Side-note: > >>>>>>> > >>>>>>> > >>>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver > >>>>>>> will > >>>>>>> provide a much larger solution for the TopologyTestDriver, but is > >>> just > >>>>>>> starting the discussion. > >>>>>>> > >>>>>>> best regards > >>>>>>> > >>>>>>> Patrik > >>>>>>> > >>>>>>> On Thu, 11 Apr 2019 at 22:14, Patrik Kleindl > >>>> wrote: > >>>>>>> > >>>>>>>> Hi Matthias > >>>>>>>> > >>>>>>>> Thanks for the questions. > >>>>>>>> > >>>>>>>> Regarding the return type: > >>>>>>>> Iterable offers the option of being used in a foreach loop > >> directly > >>>> and > >>>>>>> it > >>>>>>>> gives you access to the .iterator method, too. > >>>>>>>> (ref: > >>>>>>>> > >>>>>>> > >>>>>> > >>>> > >>> > >> > https://www.techiedelight.com/differences-between-iterator-and-iterable-in-java/ > >>>>>>>> ) > >>>>>>>> > >>>>>>>> To return a List object would require an additional conversion > >> and I > >>>>>>> don't see the immediate benefit. > >>>>>>>> > >>>>>>>> Regarding the ordering: > >>>>>>>> outputRecordsByTopic gives back a Queue > >>>>>>>> > >>>>>>>> private final Map>> > >>>>>>> outputRecordsByTopic = new HashMap<>(); > >>>>>>>> > >>>>>>>> which has a LinkedList behind it > >>>>>>>> > >>>>>>>> outputRecordsByTopic.computeIfAbsent(record.topic(), k -> new > >>>>>>> LinkedList<>()).add(record); > >>>>>>>> > >>>>>>>> So the order is handled by the linked list and should not be > >>> modified > >>>>>> by > >>>>>>>> my changes, > >>>>>>>> not even the .stream.map etc. (ref: > >>>>>>>> > >>>>>>> > >>>>>> > >>>> > >>> > >> > https://stackoverflow.com/questions/30258566/java-stream-map-and-collect-order-of-resulting-container > >>>>>>>> ) > >>>>>>>> > >>>>>>>> > >>>>>>>> Then again, I am open to change it if people have some strong > >>>>>> preference > >>>>>>>> > >>>>>>>> best regards > >>>>>>>> > >>>>>>>> Patrik > >>>>>>>> > >>>>>>>> > >>>>>>>> On Thu, 11 Apr 2019 at 17:45, Matthias J. Sax < > >>> matthias@confluent.io> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Thanks for the KIP! > >>>>>>>>> > >>>>>>>>> Overall, this makes sense and can simplify testing. > >>>>>>>>> > >>>>>>>>> What I am wondering is, why you suggest to return an `Iterable`? > >>>> Maybe > >>>>>>>>> returning an `Iterator` would make more sense? Or a List? Note > >> that > >>>>>> the > >>>>>>>>> order of emits matters, thus returning a generic `Collection` > >> would > >>>>>> not > >>>>>>>>> seem to be appropriate. > >>>>>>>>> > >>>>>>>>> Can you elaborate on the advantages to use `Iterable` compared to > >>> the > >>>>>>>>> other options? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> -Matthias > >>>>>>>>> > >>>>>>>>> On 4/11/19 2:09 AM, Patrik Kleindl wrote: > >>>>>>>>>> Hi everyone, > >>>>>>>>>> > >>>>>>>>>> I would like to start the discussion on this small enhancement > >> of > >>>>>>>>>> the TopologyTestDriver. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable > >>>>>>>>>> > >>>>>>>>>> Pull request is available at > >>>>>>> https://github.com/apache/kafka/pull/6556 > >>>>>>>>>> > >>>>>>>>>> Any feedback is welcome > >>>>>>>>>> > >>>>>>>>>> best regards > >>>>>>>>>> > >>>>>>>>>> Patrik > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >>> > >> > > > > --000000000000fd03950587696095--