Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 5DDDD200D68 for ; Thu, 14 Dec 2017 01:07:54 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 5C492160C24; Thu, 14 Dec 2017 00:07:54 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id A4B60160C23 for ; Thu, 14 Dec 2017 01:07:52 +0100 (CET) Received: (qmail 13804 invoked by uid 500); 14 Dec 2017 00:07:51 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 13792 invoked by uid 99); 14 Dec 2017 00:07:51 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 14 Dec 2017 00:07:51 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 9ADF3C6F60 for ; Thu, 14 Dec 2017 00:07:50 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.879 X-Spam-Level: * X-Spam-Status: No, score=1.879 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id wHTi4aRSLoZ3 for ; Thu, 14 Dec 2017 00:07:44 +0000 (UTC) Received: from mail-ot0-f175.google.com (mail-ot0-f175.google.com [74.125.82.175]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 693F75F19C for ; Thu, 14 Dec 2017 00:07:43 +0000 (UTC) Received: by mail-ot0-f175.google.com with SMTP id s4so3533419ote.4 for ; Wed, 13 Dec 2017 16:07:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=AimU5HB1vmvrLGcbE9lTAn0WKYdzRBQO9FN5TRNENfs=; b=ICG5RIEcCBRRO5PhIJ+RFbGpM2Een+Z+6sDbV3fa9QydebFhEC5nczgT3t5Fq1uheW Deb2WSxmFKwACOnRhlLElS/6jtdV2+Y+WqLvRpvjQDwO2+UzPTgbszUHWgyuBobYD/zF qaY64Q2YNtSpKYcq6rLCJ4f0jvtsFqE4lSu5DtAB7NWJMLyQCdsy0mNeOWorzr6eIDXM F+v4gssAUz/LoWDgqv752Eus8AOEuGPICaageSnWeVKqjtb8lcCadNmggii9nQ64zqkL znIXFHyEisptsxDQ7SjXgXPRC9YVPa3I72puiVLtQYNAeuSG/Zi212mTQLEcbcTbOYEG n+ng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=AimU5HB1vmvrLGcbE9lTAn0WKYdzRBQO9FN5TRNENfs=; b=VY9Zw+ycVgVhU8jy2m7L+49gbspyPyKmLpijLS+3KYk8P3/pWxdFhQoRuFF+DPPy7b phhGZy2uXpVvr6f57zfP2H6Y+4HD1dDffKSwE8fA5k49Cg80HCDHh2TqPiDD5YNUECvb T8fbMn9XqexCWdERDeO4RcsZWVRep5NT0LZqvBhedlBLzXqovzO7ywh4bI9Li50ZASk9 D+yCnJzqbzFOrnpDWF+HtldWPjnMEJDeMjnamURxB1nIueTBi2KWrKpK884ccu3j2dYZ X1w9DRMslzWESvFfwz7Y593OR4ArVDOF54f0YoBKNong78gg7+LKyOdeLWDhXjGW314O 9xag== X-Gm-Message-State: AKGB3mIlUsL7oGyYo91c7ZtqjhgseEowCkiU5+aTKA5MlgmgeQGTXwWr VwTDhkmomhTqc4gJwkYHPdRRxWlwbP1wkA4KsN835A== X-Google-Smtp-Source: ACJfBosa5Yn3bA+ciyWMiMzX/BiSpxwLMjsMBYY9mJl+2bk4h+A7Fh6yllSL0AY3TaZq6+ZtHeJeAJbeU5T1DRrqwas= X-Received: by 10.157.60.85 with SMTP id j21mr3473852ote.399.1513210062010; Wed, 13 Dec 2017 16:07:42 -0800 (PST) MIME-Version: 1.0 Received: by 10.74.25.84 with HTTP; Wed, 13 Dec 2017 16:07:41 -0800 (PST) In-Reply-To: <8683ba62-21ba-d175-2158-c5db8bbd3ca9@confluent.io> References: <3c92bb5a-a8bc-5c48-46f5-677f1f35b68b@confluent.io> <67749cf4-4afc-442e-f5d4-2fd71e569d65@confluent.io> <1e2311fb-9e86-4094-ebd7-b3db3e2e49f3@confluent.io> <8683ba62-21ba-d175-2158-c5db8bbd3ca9@confluent.io> From: Guozhang Wang Date: Wed, 13 Dec 2017 16:07:41 -0800 Message-ID: Subject: Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce To: "dev@kafka.apache.org" Content-Type: multipart/alternative; boundary="001a11c0222e377ea6056041ac38" archived-at: Thu, 14 Dec 2017 00:07:54 -0000 --001a11c0222e377ea6056041ac38 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Metrics: this is a good point. Note that currently we have two metrics for `skipped-records` on different levels: 1) on the highest level, the thread-level, we have a `skipped-records`, that records all the skipped records due to deserialization errors. 2) on the lower processor-node level, we have a `skippedDueToDeserializationError`, that records the skipped records on that specific source node due to deserialization errors. So you can see that 1) does not cover any other scenarios and can just be thought of as an aggregate of 2) across all the tasks' source nodes. However, there are other places that can cause a record to be dropped, for example: 1) https://issues.apache.org/jira/browse/KAFKA-5784: records could be dropped due to window elapsed. 2) KIP-210: records could be dropped on the producer side. 3) records could be dropped during user-customized processing on errors. I think improving the skipped records of all these scenarios itself worth having another KIP; so I'd suggest we do not drag this KIP-210 into this. Guozhang On Wed, Dec 13, 2017 at 3:45 PM, Matthias J. Sax wrote: > One more after thought: should we add a metric for this? We also have a > metric for `skippedDueToDeserializationError-rate` ? > > > -Matthias > > > > On 12/6/17 7:54 AM, Bill Bejeck wrote: > > Thanks for the clearly written KIP, no further comments from my end. > > > > -Bill > > > > On Wed, Dec 6, 2017 at 9:52 AM, Matt Farmer wrote: > > > >> There is already a vote thread for this KIP. I can bump it so that it= =E2=80=99s > >> towards the top of your inbox. > >> > >> With regard to your concerns: > >> > >> 1) We do not have the "ProductionExceptionHandler" interface defined i= n > the > >> wiki page, thought it is sort of clear that it is a one-function > interface > >> with record and exception. Could you add it? > >> > >> > >> It is defined, it=E2=80=99s just not defined using a code snippet. The= KIP > reads as > >> follows: > >> > >> =3D=3D=3D > >> > >> A public interface named ProductionExceptionHandler with a single > method, > >> handle, that has the following signature: > >> > >> - ProductionExceptionHandlerResponse handle(ProducerRecord >> byte[]> record, Exception exception) > >> > >> > >> =3D=3D=3D > >> > >> If you=E2=80=99d like me to add a code snippet illustrating this that= =E2=80=99s simple > for > >> me to do, but it seemed superfluous. > >> > >> 2) A quick question about your example code: where would be the "logge= r" > >> object be created? > >> > >> > >> SLF4J loggers are typically created as a class member in the class. Su= ch > >> as: > >> > >> private Logger logger =3D LoggerFactory.getLogger(HelloWorld.class); > >> > >> I omit that in my implementation examples for brevity. > >> > >> On December 6, 2017 at 2:14:58 AM, Guozhang Wang (wangguoz@gmail.com) > >> wrote: > >> > >> Hello Matt, > >> > >> Thanks for writing up the KIP. I made a pass over it and here is a few > >> minor comments. I think you can consider starting a voting thread for > this > >> KIP while addressing them. > >> > >> 1) We do not have the "ProductionExceptionHandler" interface defined i= n > the > >> wiki page, thought it is sort of clear that it is a one-function > interface > >> with record and exception. Could you add it? > >> > >> 2) A quick question about your example code: where would be the "logge= r" > >> object be created? Note that the implementation of this interface have > to > >> give a non-param constructor, or as a static field of the class but in > that > >> case you would not be able to log which instance is throwing this erro= r > (we > >> may have multiple producers within a single instance, even within a > >> thread). Just a reminder to consider in your implementation. > >> > >> > >> Guozhang > >> > >> On Tue, Dec 5, 2017 at 3:15 PM, Matthias J. Sax > >> wrote: > >> > >>> Thanks a lot for the update! Great write-up! Very clearly explained > what > >>> the change will look like! > >>> > >>> Looks good to me. No further comments from my side. > >>> > >>> > >>> -Matthias > >>> > >>> > >>> On 12/5/17 9:14 AM, Matt Farmer wrote: > >>>> I have updated this KIP accordingly. > >>>> > >>>> Can you please take a look and let me know if what I wrote looks > >> correct > >>> to > >>>> you? > >>>> > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > >>> 210+-+Provide+for+custom+error+handling++when+Kafka+ > >>> Streams+fails+to+produce > >>>> > >>>> Thanks! > >>>> > >>>> Matt > >>>> > >>>> > >>>> On December 4, 2017 at 9:39:13 PM, Matt Farmer (matt@frmr.me) wrote: > >>>> > >>>> Hey Matthias, thanks for getting back to me. > >>>> > >>>> That's fine. But if we add it to `test` package, we don't need to ta= lk > >>>> about it in the KIP. `test` is not public API. > >>>> > >>>> Yes, that makes sense. It was in the KIP originally because I was, a= t > >> one > >>>> point, planning on including it. We can remove it now that we=E2=80= =99ve > >> decided > >>> we > >>>> won=E2=80=99t include it in the public API. > >>>> > >>>> Understood. That makes sense. We should explain this clearly in the > KIP > >>>> and maybe log all other following exceptions at DEBUG level? > >>>> > >>>> > >>>> I thought it was clear in the KIP, but I can go back and double chec= k > >> my > >>>> wording and revise it to try and make it clearer. > >>>> > >>>> I=E2=80=99ll take a look at doing more work on the KIP and the Pull = Request > >>>> tomorrow. > >>>> > >>>> Thanks again! > >>>> > >>>> On December 4, 2017 at 5:50:33 PM, Matthias J. Sax ( > >>> matthias@confluent.io) > >>>> wrote: > >>>> > >>>> Hey, > >>>> > >>>> About your questions: > >>>> > >>>>>>> Acknowledged, so is ProducerFencedException the only kind of > >>> exception I > >>>>>>> need to change my behavior on? Or are there other types I need to > >>>> check? Is > >>>>>>> there a comprehensive list somewhere? > >>>> > >>>> I cannot think if any other atm. We should list all fatal exceptions > >> for > >>>> which we don't call the handler and explain why (exception is "globa= l" > >>>> and will affect all other records, too | ProducerFenced is > >> self-healing). > >>>> > >>>> We started to collect and categorize exception here (not completed > >> yet): > >>>> https://cwiki.apache.org/confluence/display/KAFKA/ > >>> Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions > >>>> : > >>>> > >>>> This list should be a good starting point though. > >>>> > >>>>> I include it in the test package because I have tests that assert > that > >>> if > >>>>> the record collector impl encounters an Exception and receives a > >>> CONTINUE > >>>>> that it actually does CONTINUE. > >>>> > >>>> That's fine. But if we add it to `test` package, we don't need to ta= lk > >>>> about it in the KIP. `test` is not public API. > >>>> > >>>>> I didn't want to invoke the handler in places where the CONTINUE or > >> FAIL > >>>>> result would just be ignored. Presumably, after a FAIL has been > >>> returned, > >>>>> subsequent exceptions are likely to be repeats or noise from my > >>>>> understanding of the code paths. If this is incorrect we can revisi= t. > >>>> > >>>> Understood. That makes sense. We should explain this clearly in the > KIP > >>>> and maybe log all other following exceptions at DEBUG level? > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> > >>>> On 12/1/17 11:43 AM, Matt Farmer wrote: > >>>>> Bump! It's been three days here and I haven't seen any further > >> feedback. > >>>>> Eager to get this resolved, approved, and merged. =3D) > >>>>> > >>>>> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer wrote: > >>>>> > >>>>>> Hi there, sorry for the delay in responding. Last week had a holid= ay > >>> and > >>>>>> several busy work days in it so I'm just now getting around to > >>>> responding. > >>>>>> > >>>>>>> We would only exclude > >>>>>>> exception Streams can handle itself (like ProducerFencedException= ) > >> -- > >>>>>>> thus, if the handler has code to react to this, it would not be > bad, > >>> as > >>>>>>> this code is just never called. > >>>>>> [...] > >>>>>>> Thus, I am still in favor of not calling the > >>> ProductionExceptionHandler > >>>>>>> for fatal exception. > >>>>>> > >>>>>> Acknowledged, so is ProducerFencedException the only kind of > >> exception > >>> I > >>>>>> need to change my behavior on? Or are there other types I need to > >>> check? > >>>> Is > >>>>>> there a comprehensive list somewhere? > >>>>>> > >>>>>>> About the "always continue" case. Sounds good to me to remove it > >> (not > >>>>>>> sure why we need it in test package?) > >>>>>> > >>>>>> I include it in the test package because I have tests that assert > >> that > >>> if > >>>>>> the record collector impl encounters an Exception and receives a > >>> CONTINUE > >>>>>> that it actually does CONTINUE. > >>>>>> > >>>>>>> What is there reasoning for invoking the handler only for the fir= st > >>>>>>> exception? > >>>>>> > >>>>>> I didn't want to invoke the handler in places where the CONTINUE o= r > >>> FAIL > >>>>>> result would just be ignored. Presumably, after a FAIL has been > >>> returned, > >>>>>> subsequent exceptions are likely to be repeats or noise from my > >>>>>> understanding of the code paths. If this is incorrect we can > revisit. > >>>>>> > >>>>>> Once I get the answers to these questions I can make another pass = on > >>> the > >>>>>> pull request! > >>>>>> > >>>>>> Matt > >>>>>> > >>>>>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax < > >> matthias@confluent.io > >>>> > >>>>>> wrote: > >>>>>> > >>>>>>> Thanks for following up! > >>>>>>> > >>>>>>> One thought about an older reply from you: > >>>>>>> > >>>>>>>>>>> I strongly disagree here. The purpose of this handler isn't > >> *just* > >>>> to > >>>>>>>>>>> make a decision for streams. There may also be desirable side > >>>>>>> effects that > >>>>>>>>>>> users wish to cause when production exceptions occur. There m= ay > >> be > >>>>>>>>>>> side-effects that they wish to cause when > >> AuthenticationExceptions > >>>>>>> occur, > >>>>>>>>>>> as well. We should still give them the hooks to preform those > >> side > >>>>>>> effects. > >>>>>>> > >>>>>>> And your follow up: > >>>>>>> > >>>>>>>>> - I think I would rather invoke it for all exceptions that coul= d > >>>>>>> occur > >>>>>>>>> from attempting to produce - even those exceptions were returni= ng > >>>>>>> CONTINUE > >>>>>>>>> may not be a good idea (e.g. Authorization exception). Until > there > >>>>>>> is a > >>>>>>>>> different type for exceptions that are totally fatal (for examp= le > >> a > >>>>>>>>> FatalStreamsException or some sort), maintaining a list of > >>>>>>> exceptions that > >>>>>>>>> you can intercept with this handler and exceptions you cannot > >> would > >>>>>>> be > >>>>>>>>> really error-prone and hard to keep correct. > >>>>>>> > >>>>>>> I understand what you are saying, however, consider that Streams > >> needs > >>>>>>> to die for a fatal exception. Thus, if you call the handler for a > >>> fatal > >>>>>>> exception, we would need to ignore the return value and fail -- > this > >>>>>>> seems to be rather intuitive. Furthermore, users can register an > >>>>>>> uncaught-exception-handler and side effects for fatal errors can = be > >>>>>>> triggered there. > >>>>>>> > >>>>>>> Btw: an AuthorizationException is fatal -- not sure what you mean > by > >>> an > >>>>>>> "totally fatal" exception -- there is no superlative to fatal fro= m > >> my > >>>>>>> understanding. > >>>>>>> > >>>>>>> About maintaining a list of exceptions: I don't think this is too > >>> hard, > >>>>>>> and users also don't need to worry about it IMHO. We would only > >>> exclude > >>>>>>> exception Streams can handle itself (like ProducerFencedException= ) > >> -- > >>>>>>> thus, if the handler has code to react to this, it would not be > bad, > >>> as > >>>>>>> this code is just never called. In case that there is an exceptio= n > >>>>>>> Streams could actually handle but we still call the handler (ie, > >> bug), > >>>>>>> there should be no harm either -- the handler needs to return > either > >>>>>>> CONTINUE or FAIL and we would obey. It could only happen, that > >> Streams > >>>>>>> dies---as request by the user(!)---even if Streams could actually > >>> handle > >>>>>>> the error and move on. But this seems to be not a issue from my > >> point > >>> of > >>>>>>> view. > >>>>>>> > >>>>>>> Thus, I am still in favor of not calling the > >>> ProductionExceptionHandler > >>>>>>> for fatal exception. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> About the "always continue" case. Sounds good to me to remove it > >> (not > >>>>>>> sure why we need it in test package?) and to rename the "failing" > >>>>>>> handler to "Default" (even if "default" is less descriptive and I > >>> would > >>>>>>> still prefer "Fail" in the name). > >>>>>>> > >>>>>>> > >>>>>>> Last question: > >>>>>>> > >>>>>>>>> - Continue to *only* invoke it on the first exception that we > >>>>>>>>> encounter (before sendException is set) > >>>>>>> > >>>>>>> > >>>>>>> What is there reasoning for invoking the handler only for the fir= st > >>>>>>> exception? > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -Matthias > >>>>>>> > >>>>>>> On 11/20/17 10:43 AM, Matt Farmer wrote: > >>>>>>>> Alright, here are some updates I'm planning to make after thinki= ng > >> on > >>>>>>> this > >>>>>>>> for awhile: > >>>>>>>> > >>>>>>>> - Given that the "always continue" handler isn't something I'd > >>>>>>> recommend > >>>>>>>> for production use as is, I'm going to move it into the test > >>>>>>> namespace and > >>>>>>>> remove it from mention in the public API. > >>>>>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler" > >> to > >>>>>>>> "DefaultProductionExceptionHandler" > >>>>>>>> - Given that the API for the exception handler involves being > >>>>>>> invoked by > >>>>>>>> streams to make a decision about whether or not to continue =E2= =80=94 I > >>>>>>> think that > >>>>>>>> we should: > >>>>>>>> - Continue to *only* invoke it on the first exception that we > >>>>>>>> encounter (before sendException is set) > >>>>>>>> - Stop invoking it for the self-healing fenced exceptions. > >>>>>>>> - I think I would rather invoke it for all exceptions that could > >>>>>>> occur > >>>>>>>> from attempting to produce - even those exceptions were returnin= g > >>>>>>> CONTINUE > >>>>>>>> may not be a good idea (e.g. Authorization exception). Until the= re > >>>>>>> is a > >>>>>>>> different type for exceptions that are totally fatal (for exampl= e > a > >>>>>>>> FatalStreamsException or some sort), maintaining a list of > >>>>>>> exceptions that > >>>>>>>> you can intercept with this handler and exceptions you cannot > would > >>>>>>> be > >>>>>>>> really error-prone and hard to keep correct. > >>>>>>>> - I'm happy to file a KIP for the creation of this new Exception > >>>>>>> type > >>>>>>>> if there is interest. > >>>>>>>> > >>>>>>>> @ Matthias =E2=80=94 What do you think about the above? > >>>>>>>> > >>>>>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer wrote= : > >>>>>>>> > >>>>>>>>> I responded before reading your code review and didn't see the > bit > >>>>>>> about > >>>>>>>>> how ProducerFencedException is self-healing. This error handlin= g > >>> logic > >>>>>>> is > >>>>>>>>> *quite* confusing to reason about... I think I'm going to sit > down > >>>> with > >>>>>>>>> the code a bit more today, but I'm inclined to think that if th= e > >>>> fenced > >>>>>>>>> exceptions are, indeed, self healing that we still invoke the > >>> handler > >>>>>>> but > >>>>>>>>> ignore its result for those exceptions. > >>>>>>>>> > >>>>>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer > wrote: > >>>>>>>>> > >>>>>>>>>> Hi there, > >>>>>>>>>> > >>>>>>>>>> Following up here... > >>>>>>>>>> > >>>>>>>>>>> One tiny comment: I would prefer to remove the "Always" from > the > >>>>>>>>>> handler implementation names -- it sounds "cleaner" to me > without > >>> it. > >>>>>>>>>> > >>>>>>>>>> Let me think on this. I generally prefer expressiveness to > >>>> clean-ness, > >>>>>>>>>> and I think that comes out in the names I chose for things. Th= e > >>> straw > >>>>>>> man > >>>>>>>>>> in my head is the person that tried to substitute in the > >>>>>>> "AlwaysContinue" > >>>>>>>>>> variant and the "Always" is a trigger to really consider wheth= er > >> or > >>>>>>> not > >>>>>>>>>> they *always* want to try to continue. > >>>>>>>>>> > >>>>>>>>>> To be truthful, after some thought, using the "AlwaysContinue" > >>>> variant > >>>>>>>>>> isn't something I'd recommend generally in a production system= . > >>>>>>> Ideally > >>>>>>>>>> these handlers are targeted at handling a specific series of > >>>>>>> exceptions > >>>>>>>>>> that a user wants to ignore, and not ignoring all exceptions. > >> More > >>> on > >>>>>>> this > >>>>>>>>>> in a minute. > >>>>>>>>>> > >>>>>>>>>>> For the first category, it seems to not make sense to call th= e > >>>>>>> handle but > >>>>>>>>>> Streams should always fail. If we follow this design, the KIP > >>> should > >>>>>>> list > >>>>>>>>>> all exceptions for which the handler is not called. > >>>>>>>>>> > >>>>>>>>>> I strongly disagree here. The purpose of this handler isn't > >> *just* > >>> to > >>>>>>>>>> make a decision for streams. There may also be desirable side > >>> effects > >>>>>>> that > >>>>>>>>>> users wish to cause when production exceptions occur. There ma= y > >> be > >>>>>>>>>> side-effects that they wish to cause when > >> AuthenticationExceptions > >>>>>>> occur, > >>>>>>>>>> as well. We should still give them the hooks to preform those > >> side > >>>>>>> effects. > >>>>>>>>>> > >>>>>>>>>> In light of the above, I'm thinking that the > >>>>>>>>>> "AlwaysContinueProductionExceptionHandler" variant should > >>> probably be > >>>>>>>>>> removed from the public API and moved into tests since that's > >> where > >>>>>>> it's > >>>>>>>>>> most useful. The more I think on it, the more I feel that havi= ng > >>> that > >>>>>>> in > >>>>>>>>>> the public API is a landmine. If you agree, then, we can renam= e > >> the > >>>>>>>>>> "AlwaysFailProductionExceptionHandler" to > >>>>>>>>>> "DefaultProductionExceptionHandler". > >>>>>>>>>> > >>>>>>>>>> Thoughts? > >>>>>>>>>> > >>>>>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax < > >>>>>>> matthias@confluent.io> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> I just review the PR, and there is one thing we should discus= s. > >>>>>>>>>>> > >>>>>>>>>>> There are different types of exceptions that could occur. Som= e > >>> apply > >>>>>>> to > >>>>>>>>>>> all records (like Authorization exception) while others are f= or > >>>>>>> single > >>>>>>>>>>> records only (like record too large). > >>>>>>>>>>> > >>>>>>>>>>> For the first category, it seems to not make sense to call th= e > >>>> handle > >>>>>>>>>>> but Streams should always fail. If we follow this design, the > >> KIP > >>>>>>> should > >>>>>>>>>>> list all exceptions for which the handler is not called. > >>>>>>>>>>> > >>>>>>>>>>> WDYT? > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> -Matthias > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote: > >>>>>>>>>>>> Just catching up on this KIP. > >>>>>>>>>>>> > >>>>>>>>>>>> One tiny comment: I would prefer to remove the "Always" from > >> the > >>>>>>>>>>> handler > >>>>>>>>>>>> implementation names -- it sounds "cleaner" to me without it= . > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> -Matthias > >>>>>>>>>>>> > >>>>>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote: > >>>>>>>>>>>>> It is agreed, then. I've updated the pull request. I'm tryi= ng > >> to > >>>>>>> also > >>>>>>>>>>>>> update the KIP accordingly, but cwiki is being slow and > >> dropping > >>>>>>>>>>>>> connections..... I'll try again a bit later but please > >> consider > >>>> the > >>>>>>>>>>> KIP > >>>>>>>>>>>>> updated for all intents and purposes. heh. > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang < > >>> wangguoz@gmail.com> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> That makes sense. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer > >>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many > >> times > >>>> by > >>>>>>>>>>>>>>> unintentionally shipping code that fails to properly > >> implement > >>>>>>>>>>> logging. I > >>>>>>>>>>>>>>> always discover this at the exact *worst* moment, too. > >>> (Normally > >>>>>>> at > >>>>>>>>>>> 3 AM > >>>>>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel th= e > >>> same > >>>>>>> way > >>>>>>>>>>> I > >>>>>>>>>>>>>> could > >>>>>>>>>>>>>>> probably be convinced to remove it. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> We could also meet halfway and say that when a customized > >>>>>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, > we > >>> log > >>>>>>> at > >>>>>>>>>>> DEBUG > >>>>>>>>>>>>>>> level instead of WARN level. Would that alternative be > >>> appealing > >>>>>>> to > >>>>>>>>>>> you? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang < > >>>>>>> wangguoz@gmail.com> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki agai= n > >> and > >>>> it > >>>>>>>>>>> looks > >>>>>>>>>>>>>>>> good. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> About whether record collector should still internally l= og > >>> the > >>>>>>>>>>> error in > >>>>>>>>>>>>>>>> addition to what the customized ProductionExceptionHandl= er > >>>>>>> does. I > >>>>>>>>>>>>>>>> personally would prefer only to log if the returned valu= e > >> is > >>>>>>> FAIL > >>>>>>>>>>> to > >>>>>>>>>>>>>>>> indicate that this thread is going to shutdown and trigg= er > >>> the > >>>>>>>>>>>>>> exception > >>>>>>>>>>>>>>>> handler. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer > > >>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updat= ed > >>> this > >>>>>>>>>>> KIP as > >>>>>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review > >>> again! > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer > > >>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and > forgot > >>> to > >>>>>>>>>>>>>> mention > >>>>>>>>>>>>>>>> them > >>>>>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the > >> oversight. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I=E2=80=99ll revise the KIP this afternoon accordingly= . > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> The logging is actually provided for in the record > >>> collector. > >>>>>>>>>>>>>>> Whenever > >>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>> handler continues it=E2=80=99ll log a warning to ensur= e that > it=E2=80=99s > >>>>>>>>>>>>>>> *impossible* > >>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> write a handler that totally suppresses production > >>> exceptions > >>>>>>>>>>> from > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>> log. > >>>>>>>>>>>>>>>>>> As such, the default continue handler just returns the > >>>>>>> continue > >>>>>>>>>>>>>>> value. > >>>>>>>>>>>>>>>> I > >>>>>>>>>>>>>>>>>> can add details about those semantics to the KIP as > well. > >>>>>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax < > >>>>>>>>>>>>>>> matthias@confluent.io > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> One more comment. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> You mention a default implementation for the handler > >> that > >>>>>>>>>>> fails. I > >>>>>>>>>>>>>>>>>>> think, this should be part of the public API and thus > >>> should > >>>>>>>>>>> have > >>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> We could also add a second implementation for the > >>>>>>>>>>> log-and-move-on > >>>>>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This > >>> class > >>>>>>>>>>> should > >>>>>>>>>>>>>>>> also > >>>>>>>>>>>>>>>>>>> be part of public API (so users can just set in the > >>> config) > >>>>>>>>>>> with a > >>>>>>>>>>>>>>>>>>> proper name. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote: > >>>>>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are > >>>>>>> compatible > >>>>>>>>>>>>>>> with > >>>>>>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens > >>>> whenever > >>>>>>>>>>> one > >>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>> PRs > >>>>>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR i= f > >>> 4148 > >>>>>>> goes > >>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>> first. :) > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang < > >>>>>>>>>>>>>> wangguoz@gmail.com > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> As for the implementation, please note that there i= s > >>>>>>> another > >>>>>>>>>>>>>>>> ongoing > >>>>>>>>>>>>>>>>> PR > >>>>>>>>>>>>>>>>>>>>> that may touch the same classes that you are workin= g > >> on: > >>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148 > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that = PR > >>> and > >>>>>>> see > >>>>>>>>>>>>>> if > >>>>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>> compatible with your changes. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer < > >>>>>>> matt@frmr.me> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP > as > >>>>>>>>>>>>>> currently > >>>>>>>>>>>>>>>>>>> written: > >>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It stil= l > >>> needs > >>>>>>>>>>>>>> some > >>>>>>>>>>>>>>>>> tests > >>>>>>>>>>>>>>>>>>>>>> added, > >>>>>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> If there are more points that folks would like to > >>>> discuss, > >>>>>>>>>>>>>>> please > >>>>>>>>>>>>>>>>> let > >>>>>>>>>>>>>>>>>>> me > >>>>>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afterno= on > >>> I'll > >>>>>>>>>>>>>>> probably > >>>>>>>>>>>>>>>>>>> start > >>>>>>>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>>>>> [VOTE] thread. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>>>>>> Matt > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer < > >>> matt@frmr.me > >>>>>>>> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I can=E2=80=99t think of a reason that would be > problematic. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like thi= s, > >> I > >>>>>>> either > >>>>>>>>>>>>>>> want > >>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything dow= n > >> so > >>>>>>> that I > >>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>>>> spin > >>>>>>>>>>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. Wh= en > >> we > >>>>>>> start > >>>>>>>>>>>>>> up > >>>>>>>>>>>>>>>>> after > >>>>>>>>>>>>>>>>>>>>>>> crashing we=E2=80=99ll eventually try to process = the > message > >>> we > >>>>>>>>>>>>>> failed > >>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>> produce > >>>>>>>>>>>>>>>>>>>>>>> again. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I=E2=80=99m concerned that =E2=80=9Cputting in a = queue for later=E2=80=9D > >>> opens > >>>>>>> you > >>>>>>>>>>>>>> up > >>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>> putting > >>>>>>>>>>>>>>>>>>>>>>> messages into the destination topic in an > unexpected > >>>>>>> order. > >>>>>>>>>>>>>>>> However > >>>>>>>>>>>>>>>>>>> if > >>>>>>>>>>>>>>>>>>>>>>> others feel differently, I=E2=80=99m happy to tal= k about > it. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang < > >>>>>>>>>>>>>>>> wangguoz@gmail.com> > >>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my > >> understanding > >>>> is > >>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>> record > >>>>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurre= d > >>> while > >>>>>>>>>>>>>> trying > >>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>>>>> produce. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> That is right. Thanks. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I > >>> realized > >>>>>>> that > >>>>>>>>>>>>>>>> since > >>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>>>> are > >>>>>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, = we > >>> may > >>>>>>> not > >>>>>>>>>>>>>> be > >>>>>>>>>>>>>>>>>>>>> implement > >>>>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queu= e > >> for > >>>>>>>>>>> future > >>>>>>>>>>>>>>>>>>>>> processing. > >>>>>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue? > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer < > >>>>>>>>>>> matt@frmr.me > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Hello all, > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation= , > >> and > >>>>>>> made > >>>>>>>>>>>>>> it > >>>>>>>>>>>>>>> so > >>>>>>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>>>>>> its > >>>>>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line = up > >>> more > >>>>>>>>>>>>>>> closely > >>>>>>>>>>>>>>>>> with > >>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler). > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I > >> need > >>> to > >>>>>>>>>>>>>> reply > >>>>>>>>>>>>>>>>>>> to..... > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameter= s > >> are > >>>>>>>>>>> around > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>> record, > >>>>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord`, > or > >>> be > >>>>>>>>>>>>>>> generics > >>>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord` or > >>>>>>>>>>>>>>>> `ProducerRecord >>>>>>>>>>>>>>>>>>>>>>>> extends > >>>>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`? > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that > >> this > >>>>>>> is a > >>>>>>>>>>>>>>>>>>>>>>>>> ProducerRecord, so the generics > >>> would > >>>>>>> just > >>>>>>>>>>>>>>> make > >>>>>>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>>>>>>>>> harder > >>>>>>>>>>>>>>>>>>>>>>>>> to work with the key and value. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the > >>>>>>>>>>>>>>> `RecordMetadata` > >>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>>>>> well > >>>>>>>>>>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>>>>>>>>> case it is not null? > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my > >> understanding > >>>> is > >>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>> record > >>>>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurre= d > >>> while > >>>>>>>>>>>>>> trying > >>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>>>>> produce. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the > >>>>>>> following > >>>>>>>>>>>>>>>>> handling > >>>>>>>>>>>>>>>>>>>>>>>> logic > >>>>>>>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me kno= w > >>> what > >>>>>>> you > >>>>>>>>>>>>>>>> think. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>>>>>>>>>> Matt > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer < > >>>>>>> matt@frmr.me > >>>>>>>>>>>> > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I=E2=80=99m at a con= ference > >> right > >>>>>>> now > >>>>>>>>>>>>>> and > >>>>>>>>>>>>>>> am > >>>>>>>>>>>>>>>>>>>>>>>> planning > >>>>>>>>>>>>>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this > >>>>>>>>>>> conversation > >>>>>>>>>>>>>>>> later > >>>>>>>>>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>>>>>>>>> week. > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> I=E2=80=99ll shoot you a more detailed respons= e then! :) > >>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang = < > >>>>>>>>>>>>>>>>> wangguoz@gmail.com > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of > >>>>>>>>>>>>>>>>> ProductionExceptionHandlerResp > >>>>>>>>>>>>>>>>>>>>>> onse, > >>>>>>>>>>>>>>>>>>>>>>>>>>> could > >>>>>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the > >>> actual > >>>>>>>>>>>>>> added > >>>>>>>>>>>>>>>>> config > >>>>>>>>>>>>>>>>>>>>>>>> names > >>>>>>>>>>>>>>>>>>>>>>>>>>> (e.g. what > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP= - > >>>>>>>>>>>>>>> 161%3A+streams+ > >>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers > >>>>>>>>>>>>>>>>>>>>>>>>>>> described). > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle paramete= rs > >>> are > >>>>>>>>>>>>>> around > >>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>> record, > >>>>>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord`= , > >> or > >>> be > >>>>>>>>>>>>>>> generics > >>>>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord` or > >>>>>>>>>>>>>>>> `ProducerRecord >>>>>>>>>>>>>>>>>>>>>>>> extends > >>>>>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`? > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the > >>>>>>>>>>>>>>> `RecordMetadata` > >>>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>>>>>>> well in > >>>>>>>>>>>>>>>>>>>>>>>>>>> case it is not null? > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least th= e > >>>>>>> following > >>>>>>>>>>>>>>>>> handling > >>>>>>>>>>>>>>>>>>>>>>>> logic > >>>>>>>>>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) > >>> throw > >>>>>>>>>>>>>>> exception > >>>>>>>>>>>>>>>>>>>>>>>>> immediately > >>>>>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the > >> error > >>>> and > >>>>>>>>>>>>>> drop > >>>>>>>>>>>>>>>>> record > >>>>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a > >>> specific > >>>>>>>>>>>>>> "error" > >>>>>>>>>>>>>>>>> Kafka > >>>>>>>>>>>>>>>>>>>>>>>> topic, > >>>>>>>>>>>>>>>>>>>>>>>>>>> or > >>>>>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics ( > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>> https://kafka.apache.org/documentation/#kafka_streams_ > >>>>>>>>>>>>>>>>> monitoring > >>>>>>>>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>>>>>>>>>> monitoring. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer = < > >>>>>>>>>>>>>> matt@frmr.me > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight. > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization > handler > >>>> uses > >>>>>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" > >> for > >>> the > >>>>>>>>>>>>>>> config > >>>>>>>>>>>>>>>>>>>>>> name. No > >>>>>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think > this > >>>>>>> should > >>>>>>>>>>>>>> use > >>>>>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler". > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer = < > >>>>>>>>>>>>>> matt@frmr.me > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit. > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized > >> record > >>> is > >>>>>>>>>>>>>>>> possible, > >>>>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>>>>> changing > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly > >> doable. > >>>>>>>>>>>>>> However, > >>>>>>>>>>>>>>>>>>>>>> because > >>>>>>>>>>>>>>>>>>>>>>>>> we're > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we > have > >>> no > >>>>>>>>>>>>>>> guarantee > >>>>>>>>>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by th= e > >>> time > >>>>>>> this > >>>>>>>>>>>>>>>>>>>>>> particular > >>>>>>>>>>>>>>>>>>>>>>>>>>> handler > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change > to > >>>>>>>>>>>>>> something > >>>>>>>>>>>>>>>>>>>>> like: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse > >> handle(final > >>>>>>>>>>>>>>>>>>>>>>>> ProducerRecord<..> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> record, > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception) > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable? > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer= < > >>>>>>>>>>>>>>> matt@frmr.me> > >>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the nami= ng > >>> and > >>>>>>>>>>>>>> return > >>>>>>>>>>>>>>>> type > >>>>>>>>>>>>>>>>>>>>>> but > >>>>>>>>>>>>>>>>>>>>>>>>> I=E2=80=99m > >>>>>>>>>>>>>>>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context > and > >>> the > >>>>>>>>>>>>>> record > >>>>>>>>>>>>>>>>>>>>>> itself > >>>>>>>>>>>>>>>>>>>>>>>>>>> without > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> other changes. > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow. > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J= . > >>> Sax < > >>>>>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP. > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161? > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP= - > >>>>>>>>>>>>>>> 161%3A+streams+ > >>>>>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design > >>> (parameter > >>>>>>>>>>>>>> naming, > >>>>>>>>>>>>>>>>>>>>>> return > >>>>>>>>>>>>>>>>>>>>>>>>>>> types, > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to > >> get > >>> a > >>>>>>>>>>>>>> unified > >>>>>>>>>>>>>>>>>>>>> user > >>>>>>>>>>>>>>>>>>>>>>>>>>>> experience. > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I=E2=80=99ll create the JIRA ticket. > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I=E2= =80=99ll > >>> update > >>>>>>> the > >>>>>>>>>>>>>>> KIP > >>>>>>>>>>>>>>>>>>>>>>>>>>> accordingly. > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu < > >>>>>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com> > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to > >> the > >>>>>>> KIP ? > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name > >>> should > >>>>>>> be > >>>>>>>>>>>>>>>>>>>>>> specified. > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt > >>> Farmer < > >>>>>>>>>>>>>>>>>>>>>> matt@frmr.me> > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone, > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the > >> KIP > >>>>>>> that I > >>>>>>>>>>>>>>> just > >>>>>>>>>>>>>>>>>>>>>> filed > >>>>>>>>>>>>>>>>>>>>>>>>>>> here: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+ > >>>>>>>>>>>>>>> error+handling++when+Kafka+ > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some > feedback > >>> from > >>>>>>>>>>>>>> folks > >>>>>>>>>>>>>>>>>>>>> about > >>>>>>>>>>>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>>>>>>>>>>> idea > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can > >>> contribute > >>>>>>> back. > >>>>>>>>>>>>>> :) > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> -- > >>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>> > >>>> > >>> > >>> > >> > >> > >> -- > >> -- Guozhang > >> > > > > --=20 -- Guozhang --001a11c0222e377ea6056041ac38--