From: Jorge Esteban Quilcate Otoya
Date: Mon, 09 Oct 2017 07:54:45 +0000
Subject: Re: [DISCUSS] KIP-171: Extend Consumer Group Reset Offset for Stream Application
To: dev@kafka.apache.org

Matthias,

Thanks for the heads up!

I think the main dependency is from `StreamsResetter` to the
`ConsumerGroupCommand` class, in order to reuse the `#reset-offsets`
functionality. I'm not sure what the best way to remove it would be.

To expose commands (e.g. `ConsumerGroupCommand`) as part of AdminClient,
they have to be re-implemented in the `client` module, right? Is this an
option? If not, I think we should keep `StreamsResetter` as part of the
`core` module until we have `ConsumerGroupCommand` in the `client` module
as well.

On Fri, Oct 6, 2017 at 0:05, Matthias J. Sax wrote:

> Jorge,
>
> KIP-198 (that got merged already) overlaps with this KIP. Can you please
> update your KIP accordingly?
>
> Also, while working on KIP-198, we identified some shortcomings in
> AdminClient that do not allow us to move StreamsResetter out of the core
> package. We want to address those shortcomings in another KIP to add the
> missing functionality to the new AdminClient.
>
> Having said this, and remembering a discussion about dependencies that
> might be introduced by this KIP, it might be good to understand those
> dependencies in detail. Maybe we can resolve those dependencies somehow
> and thus be able to move StreamsResetter out of the core package. Could
> you summarize those dependencies in the KIP or just as a reply?
>
> Thanks!
>
>
> -Matthias
>
> On 9/11/17 3:02 PM, Jorge Esteban Quilcate Otoya wrote:
> > Thanks Guozhang!
> >
> > I have updated the KIP to:
> >
> > 1. Only one scenario param is allowed. If none is given, `to-earliest`
> > will be used, behaving as the current version does.
> >
> > 2.
> >    1. An exception will be printed mentioning that there are no
> >       existing offsets registered.
> >    2. The inputTopics format could support defining partition numbers,
> >       as in the reset-offsets option for kafka-consumer-groups.
> >
> > 3. That should be handled by KIP-198.
> >
> > I will start the VOTE thread in a following email.
> >
> >
> > On Wed, Aug 30, 2017 at 2:01, Guozhang Wang wrote:
> >
> >> Hi Jorge,
> >>
> >> Thanks for the KIP. It would be great to add this feature to the reset
> >> tools. I made a pass over it and it looks good to me overall. I have a
> >> few comments:
> >>
> >> 1. For all the scenarios, do we allow users to specify more than one
> >> parameter? If not, could you make that clear in the wiki, e.g. we would
> >> return with an error message saying that only one is allowed; if yes,
> >> then what precedence order are we following?
> >>
> >> 2. Personally I feel that "--by-duration", "--to-offset" and
> >> "--shift-by" are a tad overkill, because 1) they assume there exists
> >> some committed offset for each of the topics, but that may not always
> >> be true, 2) an offset / time shifting amount on different topics may
> >> not be a good fit universally, i.e. one could imagine that we want to
> >> reset all input topics to their offsets at a given time, but resetting
> >> all topics' offsets to the same value or letting all of them shift by
> >> the same amount of offsets is usually not applicable. "--by-duration",
> >> it seems, could easily be supported by "to-date".
> >>
> >> For the general consumer group reset tool, since it could be set once
> >> per partition, these parameters may be more useful.
> >>
> >> 3.
> >> As for the implementation details, when removing the zookeeper config
> >> in `kafka-streams-application-reset`, we should consider returning a
> >> meaningful error message, otherwise it would be "unrecognized config"
> >> blah.
> >>
> >>
> >> If you feel confident about the wiki after discussing these points,
> >> please feel free to move on to start a voting thread. Note that we are
> >> about 3 weeks away from the KIP deadline and 4 weeks away from the
> >> feature deadline.
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Tue, Aug 22, 2017 at 1:45 PM, Matthias J. Sax wrote:
> >>
> >>> Thanks for the update Jorge.
> >>>
> >>> I don't have any further comments.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 8/12/17 6:43 PM, Jorge Esteban Quilcate Otoya wrote:
> >>>> I have updated the KIP:
> >>>>
> >>>> - Change execution parameters, using `--dry-run`
> >>>> - Reference KAFKA-4327
> >>>> - And advise about changes on `StreamsResetter`
> >>>>
> >>>> It also notes that the KIP will cover a change on
> >>>> `ConsumerGroupCommand` to align execution options.
> >>>>
> >>>> On Sun, Jul 16, 2017 at 5:37, Matthias J. Sax <matthias@confluent.io>
> >>>> wrote:
> >>>>
> >>>>> Thanks a lot for the update!
> >>>>>
> >>>>> I like the KIP!
> >>>>>
> >>>>> One more question about `--dry-run` vs `--execute`: While I agree
> >>>>> that we should use the same flag for both tools, I am not sure which
> >>>>> one is the better one... My personal take is that I like `--dry-run`
> >>>>> better. Not sure what others think.
> >>>>>
> >>>>> One more comment: with the removal of ZK, we can also tackle this
> >>>>> JIRA: https://issues.apache.org/jira/browse/KAFKA-4327 If we do so,
> >>>>> I think we should mention it in the KIP.
> >>>>>
> >>>>> I am also not sure about the backward compatibility issue for this
> >>>>> case. Actually, I don't expect people to call `StreamsResetter` from
> >>>>> Java code, but you can never know.
> >>>>> So if we break this, we need to make sure to cover it in the KIP and
> >>>>> later on in the release notes.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 7/14/17 7:15 AM, Jorge Esteban Quilcate Otoya wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> KIP is updated.
> >>>>>> Changes:
> >>>>>> 1. The approach discussed, keeping both tools (streams application
> >>>>>> resetter and consumer group reset offset).
> >>>>>> 2. Options have been aligned between both tools.
> >>>>>> 3. The Zookeeper option of streams-application-reset will be
> >>>>>> removed and replaced internally by the Kafka AdminClient.
> >>>>>>
> >>>>>> Looking forward to your feedback.
> >>>>>>
> >>>>>> On Thu, Jun 29, 2017 at 15:04, Matthias J. Sax
> >>>>>> <matthias@confluent.io> wrote:
> >>>>>>
> >>>>>>> Damian,
> >>>>>>>
> >>>>>>>> resets everything and clears up
> >>>>>>>> the state stores.
> >>>>>>>
> >>>>>>> That's not correct. The reset tool does not touch the local store.
> >>>>>>> For this, we have `KafkaStreams#cleanUp` -- otherwise, you would
> >>>>>>> need to run the tool on every machine on which you run an
> >>>>>>> application instance.
> >>>>>>>
> >>>>>>> With regard to state, the tool only deletes the underlying
> >>>>>>> changelog topics.
> >>>>>>>
> >>>>>>> Just to clarify: the idea of allowing a reset with a different
> >>>>>>> offset is to clear all state but just use a different start offset
> >>>>>>> (instead of zero). This is for the use case where your topic has a
> >>>>>>> larger retention time than the amount of data you want to
> >>>>>>> reprocess. But we always need to start with an empty state.
> >>>>>>> (Resetting with consistent state is something we might do at some
> >>>>>>> point, but it's much harder and not part of this KIP.)
> >>>>>>>
> >>>>>>>> @matthias, could we remove the ZK dependency from the streams
> >>>>>>>> reset tool now?
> >>>>>>>
> >>>>>>> I think so.
> >>>>>>> The new AdminClient provides the feature we need, AFAIK. I guess
> >>>>>>> we can piggyback this onto the KIP (we would need a KIP anyway
> >>>>>>> because we deprecate `--zookeeper`, which is a public API change).
> >>>>>>>
> >>>>>>>
> >>>>>>> I just want to point out that even without the ZK dependency, I
> >>>>>>> prefer to wrap the consumer offset tool and keep two tools.
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>> On 6/29/17 9:14 AM, Damian Guy wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Thanks for the KIP. What is not clear is how this is going to
> >>>>>>>> handle state stores. Right now the streams reset tool resets
> >>>>>>>> everything and clears up the state stores. What are we going to
> >>>>>>>> do if we reset to a particular offset? If we clear the state then
> >>>>>>>> we've lost any previously aggregated values (which may or may not
> >>>>>>>> be what is expected). If we don't clear them, then we will end up
> >>>>>>>> with incorrect aggregates.
> >>>>>>>>
> >>>>>>>> @matthias, could we remove the ZK dependency from the streams
> >>>>>>>> reset tool now?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Damian
> >>>>>>>>
> >>>>>>>> On Thu, 29 Jun 2017 at 15:22 Jorge Esteban Quilcate Otoya
> >>>>>>>> <quilcate.jorge@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> You're right, I was not considering the Zookeeper dependency.
> >>>>>>>>>
> >>>>>>>>> I'm starting to like the idea of wrapping `reset-offset` from
> >>>>>>>>> `streams-application-reset`.
> >>>>>>>>>
> >>>>>>>>> I will wait a bit for more feedback and work on a draft with
> >>>>>>>>> these changes.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 28, 2017 at 0:20, Matthias J. Sax
> >>>>>>>>> <matthias@confluent.io> wrote:
> >>>>>>>>>
> >>>>>>>>>> I agree that we should not duplicate functionality.
> >>>>>>>>>>
> >>>>>>>>>> However, I am worried that a non-Streams user using the offset
> >>>>>>>>>> reset tool might delete topics unintentionally (even if the
> >>>>>>>>>> chances are pretty small). Also, currently the streams reset
> >>>>>>>>>> tool requires Zookeeper while the offset reset tool does not --
> >>>>>>>>>> I don't think we should add this dependency to the offset reset
> >>>>>>>>>> tool.
> >>>>>>>>>>
> >>>>>>>>>> Thus, I think it might be better to keep both tools, but
> >>>>>>>>>> internally rewrite the streams reset entry class to reuse as
> >>>>>>>>>> much as possible from the offset reset tool. I.e., the streams
> >>>>>>>>>> tool would be a wrapper around the offset tool and add the
> >>>>>>>>>> Streams-specific functionality it needs.
> >>>>>>>>>>
> >>>>>>>>>> I also think that keeping separate tools for consumers and
> >>>>>>>>>> streams is a good thing. We might want to add new features that
> >>>>>>>>>> don't apply to plain consumers -- note, a Streams application
> >>>>>>>>>> is more than just a client.
> >>>>>>>>>>
> >>>>>>>>>> WDYT?
> >>>>>>>>>>
> >>>>>>>>>> Would be good to get some feedback from others, too.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> -Matthias
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 6/27/17 9:05 AM, Jorge Esteban Quilcate Otoya wrote:
> >>>>>>>>>>> Thanks for the feedback Matthias!
> >>>>>>>>>>>
> >>>>>>>>>>> The main idea is to use only 1 tool to reset offsets and not
> >>>>>>>>>>> replicate functionality between tools.
> >>>>>>>>>>> The Reset Offset (KIP-122) tool not only supports executing
> >>>>>>>>>>> the reset but also export, import from files, etc.
> >>>>>>>>>>> If we extend the current tool
> >>>>>>>>>>> (kafka-streams-application-reset.sh) we will have to duplicate
> >>>>>>>>>>> all this functionality also.
> >>>>>>>>>>>
> >>>>>>>>>>> Maybe another option is to move the current implementation
> >>>>>>>>>>> into `kafka-consumer-group` and add a new command
> >>>>>>>>>>> `--reset-offset-streams` with the current implementation's
> >>>>>>>>>>> functionality, and add `--reset-offset` options for input
> >>>>>>>>>>> topics. Does this make sense?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jun 26, 2017 at 23:32, Matthias J. Sax
> >>>>>>>>>>> <matthias@confluent.io> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Jorge,
> >>>>>>>>>>>>
> >>>>>>>>>>>> thanks a lot for this KIP. Allowing Streams applications to
> >>>>>>>>>>>> be reset with an arbitrary start offset is something we have
> >>>>>>>>>>>> already got multiple requests for.
> >>>>>>>>>>>>
> >>>>>>>>>>>> A couple of clarification questions:
> >>>>>>>>>>>>
> >>>>>>>>>>>>  - why do you want to deprecate the current tool instead of
> >>>>>>>>>>>> extending the current tool with the stuff the offset reset
> >>>>>>>>>>>> tool can do (i.e., use the offset reset tool internally)?
> >>>>>>>>>>>>
> >>>>>>>>>>>>  - you suggest extending the offset reset tool to replace the
> >>>>>>>>>>>> streams reset tool: how would the reset tool know if it is
> >>>>>>>>>>>> resetting a Streams application or a regular consumer group?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 6/26/17 1:28 PM, Jorge Esteban Quilcate Otoya wrote:
> >>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'd like to start the discussion to add reset offset tooling
> >>>>>>>>>>>>> for Stream applications.
> >>>>>>>>>>>>> The KIP can be found here:
> >>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-171+-+Extend+Consumer+Group+Reset+Offset+for+Stream+Application
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Jorge.
> >>
> >> --
> >> -- Guozhang
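
For reference, the option-handling rule described in this thread (at most one scenario parameter may be given, and `to-earliest` is used when none is) can be sketched as below. This is only an illustrative sketch, not the tool's actual code: the function name `resolve_scenario`, the `SCENARIOS` tuple and the use of `ValueError` are assumptions made for the example, and the scenario names are simply the ones mentioned in the thread, not the final list of options in the KIP.

```python
# Illustrative sketch only: these names are hypothetical, not taken from Kafka.
# Scenario names as mentioned in the thread (the KIP's final list may differ).
SCENARIOS = ("to-earliest", "to-offset", "shift-by", "by-duration", "to-date")
DEFAULT_SCENARIO = "to-earliest"


def resolve_scenario(options):
    """Return the single reset scenario selected in `options`.

    `options` maps option names (without leading dashes) to their values.
    At most one scenario may be present; with none, the default is used.
    """
    chosen = [name for name in SCENARIOS if name in options]
    if len(chosen) > 1:
        # More than one scenario: reject instead of guessing a precedence.
        raise ValueError("only one scenario is allowed, got: " + ", ".join(chosen))
    return chosen[0] if chosen else DEFAULT_SCENARIO
```

Rejecting conflicting scenarios outright, rather than defining a precedence order, matches the "only one is allowed" answer given above; a precedence order would have been the alternative raised in the review.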