kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guozhang Wang <wangg...@gmail.com>
Subject Re: [DISCUSS] KIP-478 Strongly Typed Processor API
Date Wed, 24 Jul 2019 16:11:21 GMT
Sounds good to me, thanks John!


Guozhang

On Wed, Jul 24, 2019 at 7:40 AM John Roesler <john@confluent.io> wrote:

> Hey Guozhang,
>
> Thanks for the thought! It sounds related to what I was thinking in
> https://issues.apache.org/jira/browse/KAFKA-8396 , but a little "extra"...
>
> I proposed to eliminate ValueTransformer, but I believe you're right; we
> could eliminate Transformer also and just use Processor in the transform()
> methods.
>
> To your first bullet, regarding transform/flatTransform... I'd argue that
> the difference isn't material and if we switch to just using
> context.forward instead of returns, then we just need one and people can
> call forward as much as they want. It certainly warrants further
> discussion, though...
>
> To the second point, yes, I'm thinking that we can eschew the
> ValueTransformer and instead do something like ignore the forwarded key or
> check the key for serial identity, etc.
>
> The ultimate advantage of these ideas is that we reduce the number of
> interface variants and we also give people just one way to pass values
> forward instead of two.
>
> Of course, it's beyond the scope of this KIP, but this KIP is a
> precondition for these further improvements.
>
> I'm copying your comment onto the ticket for posterity.
>
> Thanks!
> -John
>
> On Tue, Jul 23, 2019 at 5:38 PM Guozhang Wang <wangguoz@gmail.com> wrote:
>
> > Hi John,
> >
> > Just a wild thought about Transformer: now with the new Processor<KIn,
> > KOut, VIn, VOut>#init(ProcessorContext<KOut, VOut>), do we still need a
> > Transformer (and even ValueTransformer / ValueTransformerWithKey)?
> >
> > What if:
> >
> > * We just make KStream#transform to get a ProcessorSupplier as well, and
> > inside `process()` we check that at most one `context.forward()` is
> called,
> > and then take it as the return value.
> > * We would still use ValueTransformer for KStream#transformValue, or we
> can
> > also use a `ProcessorSupplier where we allow at most one
> > `context.forward()` AND we ignore whatever passed in as key but just use
> > the original key.
> >
> >
> > Guozhang
> >
> >
> > On Tue, Jul 16, 2019 at 9:03 AM John Roesler <john@confluent.io> wrote:
> >
> > > Hi again, all,
> > >
> > > I have started the voting thread. Please cast your votes (or voice
> > > your objections)! The vote will remain open at least 72 hours. Once it
> > > closes, I can send the PR pretty quickly.
> > >
> > > Thanks for all you help ironing out the details on this feature.
> > > -John
> > >
> > > On Mon, Jul 15, 2019 at 5:09 PM John Roesler <john@confluent.io>
> wrote:
> > > >
> > > > Hey all,
> > > >
> > > > It sounds like there's general agreement now on this KIP, so I
> updated
> > > > the KIP to fit in with Guozhang's overall proposed package structure.
> > > > Specifically, the proposed name for the new Processor interface is
> > > > "org.apache.kafka.streams.processor.api.Processor".
> > > >
> > > > If there are no objections, then I plan to start the vote tomorrow!
> > > >
> > > > Thanks, all, for your contributions.
> > > > -John
> > > >
> > > > On Thu, Jul 11, 2019 at 1:50 PM Matthias J. Sax <
> matthias@confluent.io
> > >
> > > wrote:
> > > > >
> > > > > Side remark:
> > > > >
> > > > > > Now that "flat transform" is a specific
> > > > > >> part of the API it seems okay to steer folks in that direction
> (to
> > > never
> > > > > >> use context.process in a transformer), but it should be called
> out
> > > > > >> explicitly in javadocs.  Currently Transformer (which is used
> for
> > > both
> > > > > >> transform() and flatTransform() ) doesn't really call out the
> > > ambiguity:
> > > > >
> > > > > Would you want to do a PR for address this? We are always eager to
> > > > > improve the JavaDocs!
> > > > >
> > > > >
> > > > > -Matthias
> > > > >
> > > > > On 7/7/19 11:26 AM, Paul Whalen wrote:
> > > > > > First of all, +1 on the whole idea, my team has run into
> > (admittedly
> > > minor,
> > > > > > but definitely annoying) issues because of the weaker typing.
> > We're
> > > heavy
> > > > > > users of the PAPI and have Processors that, while not hundreds of
> > > lines
> > > > > > long, are certainly quite hefty and call context.forward() in
> many
> > > places.
> > > > > >
> > > > > > After reading the KIP and discussion a few times, I've convinced
> > > myself
> > > > > > that any initial concerns I had aren't really concerns at all
> > (state
> > > store
> > > > > > types, for one).  One thing I will mention:  changing
> *Transformer*
> > > to have
> > > > > > ProcessorContext<Void, Void> gave me pause, because I have code
> > that
> > > does
> > > > > > context.forward in transformers.  Now that "flat transform" is a
> > > specific
> > > > > > part of the API it seems okay to steer folks in that direction
> (to
> > > never
> > > > > > use context.process in a transformer), but it should be called
> out
> > > > > > explicitly in javadocs.  Currently Transformer (which is used for
> > > both
> > > > > > transform() and flatTransform() ) doesn't really call out the
> > > ambiguity:
> > > > > >
> > >
> >
> https://github.com/apache/kafka/blob/ca641b3e2e48c14ff308181c775775408f5f35f7/streams/src/main/java/org/apache/kafka/streams/kstream/Transformer.java#L75-L77
> > > ,
> > > > > > and for migrating users (from before flatTransform) it could be
> > > confusing.
> > > > > >
> > > > > > Side note, I'd like to plug KIP-401 (there is a discussion thread
> > > and a
> > > > > > voting thread) which also relates to using the PAPI.  It seems
> like
> > > there
> > > > > > is some interest and it is in a votable state with the majority
> of
> > > > > > implementation complete.
> > > > > >
> > > > > > Paul
> > > > > >
> > > > > > On Fri, Jun 28, 2019 at 2:02 PM Bill Bejeck <bbejeck@gmail.com>
> > > wrote:
> > > > > >
> > > > > >> Sorry for coming late to the party.
> > > > > >>
> > > > > >> As for the naming I'm in favor of RecordProcessor as well.
> > > > > >>
> > > > > >> I agree that we should not take on doing all of the package
> > > movements as
> > > > > >> part of this KIP, especially as John has pointed out, it will be
> > an
> > > > > >> opportunity to discuss some clean-up on individual classes
> which I
> > > envision
> > > > > >> becoming another somewhat involved process.
> > > > > >>
> > > > > >> For the end goal, if possible, here's what I propose.
> > > > > >>
> > > > > >>    1. We keep the scope of the KIP the same, *but we only
> > > implement* *it in
> > > > > >>    phases*
> > > > > >>    2. Phase one could include what Guozhang had proposed earlier
> > > namely
> > > > > >>    1. > 1.a) modifying ProcessorContext only with the output
> types
> > > on
> > > > > >>       forward.
> > > > > >>       > 1.b) modifying Transformer signature to have generics of
> > > > > >>       ProcessorContext,
> > > > > >>       > and then lift the restricting of not using punctuate: if
> > > user did
> > > > > >>       not
> > > > > >>       > follow the enforced typing and just code without
> generics,
> > > they
> > > > > >>       will get
> > > > > >>       > warning at compile time and get run-time error if they
> > > forward
> > > > > >>       wrong-typed
> > > > > >>       > records, which I think would be acceptable.
> > > > > >>    3. Then we could tackle other pieces in an incremental manner
> > as
> > > we see
> > > > > >>    what makes sense
> > > > > >>
> > > > > >> Just my 2cents
> > > > > >>
> > > > > >> -Bill
> > > > > >>
> > > > > >> On Mon, Jun 24, 2019 at 10:22 PM Guozhang Wang <
> > wangguoz@gmail.com>
> > > wrote:
> > > > > >>
> > > > > >>> Hi John,
> > > > > >>>
> > > > > >>> Yeah I think we should not do all the repackaging as part of
> this
> > > KIP as
> > > > > >>> well (we can just do the movement of the Processor /
> > > ProcessorSupplier),
> > > > > >>> but I think we need to discuss the end goal here since
> otherwise
> > > we may
> > > > > >> do
> > > > > >>> the repackaging of Processor in this KIP, but only later on
> > > realizing
> > > > > >> that
> > > > > >>> other re-packagings are not our favorite solutions.
> > > > > >>>
> > > > > >>>
> > > > > >>> Guozhang
> > > > > >>>
> > > > > >>> On Mon, Jun 24, 2019 at 3:06 PM John Roesler <
> john@confluent.io>
> > > wrote:
> > > > > >>>
> > > > > >>>> Hey Guozhang,
> > > > > >>>>
> > > > > >>>> Thanks for the idea! I'm wondering if we could take a middle
> > > ground
> > > > > >>>> and take your proposed layout as a "roadmap", while only
> > actually
> > > > > >>>> moving the classes that are already involved in this KIP.
> > > > > >>>>
> > > > > >>>> The reason I ask is not just to control the scope of this KIP,
> > but
> > > > > >>>> also, I think that if we move other classes to new packages,
> we
> > > might
> > > > > >>>> also want to take the opportunity to clean up other things
> about
> > > them.
> > > > > >>>> But each one of those would become a discussion point of its
> > own,
> > > so
> > > > > >>>> it seems the discussion would become intractable. FWIW, I do
> > like
> > > your
> > > > > >>>> idea for precisely this reason, it creates opportunities for
> us
> > to
> > > > > >>>> consider other changes that we are simply not able to make
> > without
> > > > > >>>> breaking source compatibility.
> > > > > >>>>
> > > > > >>>> If the others feel "kind of favorable" with this overall
> vision,
> > > maybe
> > > > > >>>> we can make one or more Jira tickets to capture it, and then
> > just
> > > > > >>>> alter _this_ proposal to `processor.api.Processor` (etc).
> > > > > >>>>
> > > > > >>>> WDYT?
> > > > > >>>> -John
> > > > > >>>>
> > > > > >>>> On Sun, Jun 23, 2019 at 7:17 PM Guozhang Wang <
> > wangguoz@gmail.com
> > > >
> > > > > >>> wrote:
> > > > > >>>>>
> > > > > >>>>> Hello John,
> > > > > >>>>>
> > > > > >>>>> Thanks for your detailed explanation, I've done some quick
> > > checks on
> > > > > >>> some
> > > > > >>>>> existing examples that heavily used Processor and the results
> > > also
> > > > > >>> makes
> > > > > >>>> me
> > > > > >>>>> worried about my previous statements that "the breakage would
> > > not be
> > > > > >>>> big".
> > > > > >>>>> I agree we should maintain compatibility.
> > > > > >>>>>
> > > > > >>>>> About the naming itself, I'm actually a bit inclined into
> > > > > >> sub-packages
> > > > > >>>> than
> > > > > >>>>> renamed new classes, and my motivations are that our current
> > > > > >> packaging
> > > > > >>> is
> > > > > >>>>> already quite coarsen grained and sometimes ill-placed, and
> > hence
> > > > > >> maybe
> > > > > >>>> we
> > > > > >>>>> can take this change along with some clean up on packages
> (but
> > > again,
> > > > > >>> we
> > > > > >>>>> should follow the deprecate - removal path). What I'm
> thinking
> > > is:
> > > > > >>>>>
> > > > > >>>>> -------------------
> > > > > >>>>>
> > > > > >>>>> processor/:
> > > StateRestoreCallback/AbstractNotifyingRestoreCallback,
> > > > > >>>> (deprecated
> > > > > >>>>> later, same meaning for other cross-throughs),
> ProcessContest,
> > > > > >>>>> RecordContext, Punctuator, PunctuationType, To, Cancellable
> > (are
> > > the
> > > > > >>> only
> > > > > >>>>> things left)
> > > > > >>>>>
> > > > > >>>>> (new) processor/api/: Processor, ProcessorSupplier (and of
> > > course,
> > > > > >>> these
> > > > > >>>>> two classes can be strong typed)
> > > > > >>>>>
> > > > > >>>>> state/: StateStore, BatchingStateRestoreCallback,
> > > > > >>>>> AbstractNotifyingBatchingRestoreCallback (moved from
> > processor/),
> > > > > >>>>> PartitionGrouper, WindowStoreIterator, StateSerdes (this one
> > can
> > > be
> > > > > >>> moved
> > > > > >>>>> into state/internals), TimestampedByteStore (we can move this
> > to
> > > > > >>>> internals
> > > > > >>>>> since store types would use vat by default, see below),
> > > > > >>> ValueAndTimestamp
> > > > > >>>>>
> > > > > >>>>> (new) state/factory/: Stores, StoreBuilder, StoreSupplier;
> > *BUT*
> > > the
> > > > > >>> new
> > > > > >>>>> Stores would not have timestampedXXBuilder APIs since the
> > default
> > > > > >>>>> StoreSupplier / StoreBuilder value types are
> ValueAndTimestamp
> > > > > >> already.
> > > > > >>>>>
> > > > > >>>>> (new) state/queryable/: QueryableStoreType,
> > QueryableStoreTypes,
> > > > > >>> HostInfo
> > > > > >>>>>
> > > > > >>>>> (new) state/keyValue/: KeyValueXXX classes, and also the same
> > for
> > > > > >>>>> state/sessionWindow and state/timeWindow; *BUT* here we use
> > > > > >>>>> ValueAndTimestamp as value types of those APIs directly, and
> > also
> > > > > >>>>> TimestampedKeyValue/WindowStore would be deprecated.
> > > > > >>>>>
> > > > > >>>>> (new) kstream/api/: KStream, KTable, GroupedKStream (renamed
> > from
> > > > > >>>>> KGroupedStream), GroupedKTable (renamed from KGroupedTable),
> > > > > >>>>> TimeWindowedKStream, SessionWindowedKStream, GlobalKTable
> > > > > >>>>>
> > > > > >>>>> (new) kstream/operator/: Aggregator, ForeachFunction,  ... ,
> > > Merger
> > > > > >> and
> > > > > >>>>> Grouped, Joined, Materialized, ... , Printed and Transformer,
> > > > > >>>>> TransformerSupplier.
> > > > > >>>>>
> > > > > >>>>> (new) kstream/window/: Window, Windows, Windowed,
> TimeWindows,
> > > > > >>>>> SessionWindows, UnlimitedWindows, JoinWindows,
> WindowedSerdes,
> > > > > >>>>> Time/SessionWindowedSerialized/Deserializer.
> > > > > >>>>>
> > > > > >>>>> (new) configure/: RocksDBConfigSetter, TopicNameExtractor,
> > > > > >>>>> TimestampExtractor, UsePreviousTimeOnInvalidTimestamp,
> > > > > >>>>> WallclockTimestampExtractor, ExtractRecordMetadataTimestamp,
> > > > > >>>>> FailOnInvalidTimestamp, LogAndSkipOnInvalidTimestamp,
> > > > > >>>> StateRestoreListener,
> > > > > >>>>>
> > > > > >>>>> (new) metadata/: StreamsMetadata, ThreadMetadata,
> TaskMetadata,
> > > > > >> TaskId
> > > > > >>>>>
> > > > > >>>>> Still, any xxx/internals packages are declared as inner
> > classes,
> > > but
> > > > > >>>> other
> > > > > >>>>> xxx/yyy packages are declared as public APIs.
> > > > > >>>>>
> > > > > >>>>> -------------------
> > > > > >>>>>
> > > > > >>>>> This is a very wild thought and I can totally understand if
> > > people
> > > > > >> feel
> > > > > >>>>> this is too much since it definitely enlarges the scope of
> this
> > > KIP a
> > > > > >>> lot
> > > > > >>>>> :) just trying to play a devil's advocate here to do major
> > > > > >> refactoring
> > > > > >>>> and
> > > > > >>>>> avoid renaming Processor classes.
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> Guozhang
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On Fri, Jun 21, 2019 at 9:51 PM Matthias J. Sax <
> > > > > >> matthias@confluent.io
> > > > > >>>>
> > > > > >>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>> I think `RecordProcessor` is a good name.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> -Matthias
> > > > > >>>>>>
> > > > > >>>>>> On 6/21/19 5:09 PM, John Roesler wrote:
> > > > > >>>>>>> After kicking the naming around a bit more, it seems like
> any
> > > > > >>> package
> > > > > >>>>>>> name change is a bit "weird" because it fragments the
> package
> > > and
> > > > > >>>>>>> directory structure. If we can come up with a reasonable
> name
> > > for
> > > > > >>> the
> > > > > >>>>>>> interface after all, it seems like the better choice.
> > > > > >>>>>>>
> > > > > >>>>>>> The real challenge is that the existing name "Processor"
> > seems
> > > > > >> just
> > > > > >>>>>>> about perfect. In picking a new name, we need to consider
> the
> > > > > >>>> ultimate
> > > > > >>>>>>> state, after the deprecation period, when we entirely
> remove
> > > > > >>>>>>> Processor. In this context, TypedProcessor seems a little
> odd
> > > to
> > > > > >>> me,
> > > > > >>>>>>> because it seems to imply that there should also be an
> > "untyped
> > > > > >>>>>>> processor".
> > > > > >>>>>>>
> > > > > >>>>>>> After kicking around a few other ideas, what does everyone
> > > think
> > > > > >>>> about
> > > > > >>>>>>> "RecordProcessor"? I _think_ maybe it stands on its own
> just
> > > > > >> fine,
> > > > > >>>>>>> because it's a thing that processes... records?
> > > > > >>>>>>>
> > > > > >>>>>>> If others agree with this, I can change the proposal to
> > > > > >>>> RecordProcessor.
> > > > > >>>>>>>
> > > > > >>>>>>> Thanks,
> > > > > >>>>>>> -John
> > > > > >>>>>>>
> > > > > >>>>>>> On Fri, Jun 21, 2019 at 6:42 PM John Roesler <
> > > john@confluent.io>
> > > > > >>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>> Hi all,
> > > > > >>>>>>>>
> > > > > >>>>>>>> I've updated the KIP with the feedback so far.
> > > > > >>>>>>>>
> > > > > >>>>>>>> The naming question is still the biggest (only?)
> outstanding
> > > > > >>> issue.
> > > > > >>>> It
> > > > > >>>>>>>> would be good to hear some more thoughts on it.
> > > > > >>>>>>>>
> > > > > >>>>>>>> As we stand now, there's one vote for changing the package
> > > name
> > > > > >> to
> > > > > >>>>>>>> something like 'typedprocessor', one for changing the
> > > interface
> > > > > >> to
> > > > > >>>>>>>> TypedProcessor (as in the PoC), and one for just changing
> > the
> > > > > >>>>>>>> Processor interface in-place, breaking source
> compatibility.
> > > > > >>>>>>>>
> > > > > >>>>>>>> How can we resolve this decision?
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thanks,
> > > > > >>>>>>>> -John
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Thu, Jun 20, 2019 at 5:44 PM John Roesler <
> > > john@confluent.io
> > > > > >>>
> > > > > >>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thanks for the feedback, Guozhang and Matthias,
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Regarding motivation: I'll update the wiki. Briefly:
> > > > > >>>>>>>>> * Any processor can benefit. Imagine a pure user of the
> > > > > >>>> ProcessorAPI
> > > > > >>>>>>>>> who has very complex processing logic. I have seen
> several
> > > > > >>>> processor
> > > > > >>>>>>>>> implementation that are hundreds of lines long and call
> > > > > >>>>>>>>> `context.forward` in many different locations and
> branches.
> > > In
> > > > > >>>> such an
> > > > > >>>>>>>>> implementation, it would be very easy to have a bug in a
> > > rarely
> > > > > >>>> used
> > > > > >>>>>>>>> branch that forwards the wrong kind of value. This would
> > > > > >>>> structurally
> > > > > >>>>>>>>> prevent that from happening.
> > > > > >>>>>>>>> * Also, anyone who heavily uses the ProcessorAPI would
> > likely
> > > > > >>> have
> > > > > >>>>>>>>> developed helper methods to wire together processors,
> just
> > as
> > > > > >> we
> > > > > >>>> have
> > > > > >>>>>>>>> in the DSL implementation. This change would enable them
> to
> > > > > >>> ensure
> > > > > >>>> at
> > > > > >>>>>>>>> compile time that they are actually wiring together
> > > compatible
> > > > > >>>> types.
> > > > > >>>>>>>>> This was actually _my_ original motivation, since I found
> > it
> > > > > >> very
> > > > > >>>>>>>>> difficult and time consuming to follow the Streams DSL
> > > internal
> > > > > >>>>>>>>> builders.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Regarding breaking the source compatibility of
> Processor: I
> > > > > >> would
> > > > > >>>>>>>>> _love_ to side-step the naming problem, but I really
> don't
> > > know
> > > > > >>> if
> > > > > >>>>>>>>> it's excusable to break compatibility. I suspect that our
> > > > > >> oldest
> > > > > >>>> and
> > > > > >>>>>>>>> dearest friends are using the ProcessorAPI in some form
> or
> > > > > >>> another,
> > > > > >>>>>>>>> and all their source code would break. It sucks to have
> to
> > > > > >>> create a
> > > > > >>>>>>>>> whole new interface to get around this, but it feels like
> > the
> > > > > >>> right
> > > > > >>>>>>>>> thing to do. Would be nice to get even more feedback on
> > this
> > > > > >>> point,
> > > > > >>>>>>>>> though.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Regarding the types of stores, as I said in my response
> to
> > > > > >>> Sophie,
> > > > > >>>>>>>>> it's not an issue.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Regarding the change to StreamsBuilder, it doesn't pin
> the
> > > > > >> types
> > > > > >>> in
> > > > > >>>>>>>>> any way, since all the types are bounded by Object only,
> > and
> > > > > >>> there
> > > > > >>>> are
> > > > > >>>>>>>>> no extra constraints between arguments (each type is used
> > > only
> > > > > >>>> once in
> > > > > >>>>>>>>> one argument). But maybe I missed the point you were
> asking
> > > > > >>> about.
> > > > > >>>>>>>>> Since the type takes generic paramters, we should allow
> > users
> > > > > >> to
> > > > > >>>> pass
> > > > > >>>>>>>>> in parameterized arguments. Otherwise, they would _have
> to_
> > > > > >> give
> > > > > >>>> us a
> > > > > >>>>>>>>> raw type, and they would be forced to get a "rawtyes"
> > warning
> > > > > >>> from
> > > > > >>>> the
> > > > > >>>>>>>>> compiler. So, it's our obligation in any API that
> accepts a
> > > > > >>>>>>>>> parameterized-type parameter to allow people to actually
> > > pass a
> > > > > >>>>>>>>> parameterized type, even if we don't actually use the
> > > > > >> parameters.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> The naming question is a complex one, as I took pains to
> > > detail
> > > > > >>>>>>>>> previously. Please don't just pick out one minor point,
> > call
> > > it
> > > > > >>>> weak,
> > > > > >>>>>>>>> and then claim that it invalidates the whole decision. I
> > > don't
> > > > > >>>> think
> > > > > >>>>>>>>> there's a clear best choice, so I'm more than happy for
> > > someone
> > > > > >>> to
> > > > > >>>>>>>>> advocate for renaming the class instead of the package.
> Can
> > > you
> > > > > >>>>>>>>> provide some reasons why you think that would be better?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Regarding the deprecated methods, you're absolutely
> right.
> > > I'll
> > > > > >>>>> update the KIP.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thanks again for all the feedback!
> > > > > >>>>>>>>> -John
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Thu, Jun 20, 2019 at 4:34 PM Matthias J. Sax <
> > > > > >>>> matthias@confluent.io>
> > > > > >>>>> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Just want to second what Sophie said about the stores.
> The
> > > > > >> type
> > > > > >>>> of a
> > > > > >>>>>>>>>> used stores is completely independent of input/output
> > types.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> This related to change `addGlobalStore()` method. Why do
> > you
> > > > > >>> want
> > > > > >>>> to
> > > > > >>>>> pin
> > > > > >>>>>>>>>> the types? In fact, people request the ability to
> filter()
> > > and
> > > > > >>>> maybe
> > > > > >>>>>>>>>> even map() the data before they are put into the global
> > > store.
> > > > > >>>>> Limiting
> > > > > >>>>>>>>>> the types seems to be a step backward here?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Also, the pack name is questionable.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> This wouldn't be the first project to do something like
> > > > > >> this...
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Not a strong argument. I would actually propose to not
> a a
> > > new
> > > > > >>>>> package,
> > > > > >>>>>>>>>> but just a new class `TypedProcessor`.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> For `ProcessorContext#forward` methods -- some of those
> > > > > >> methods
> > > > > >>>> are
> > > > > >>>>>>>>>> already deprecated. While the will still be affected, it
> > > would
> > > > > >>> be
> > > > > >>>>> worth
> > > > > >>>>>>>>>> to mark them as deprecated in the wiki page, too.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> @Guozhang: I dont' think we should break source
> > > compatibility
> > > > > >>> in a
> > > > > >>>>> minor
> > > > > >>>>>>>>>> release.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> -Matthias
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On 6/20/19 1:43 PM, Guozhang Wang wrote:
> > > > > >>>>>>>>>>> Hi John,
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Thanks for KIP! I've a few comments below:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> 1. So far the "Motivation" section is very general, and
> > the
> > > > > >>> only
> > > > > >>>>> concrete
> > > > > >>>>>>>>>>> example that I have in mind is
> > `TransformValues#punctuate`.
> > > > > >> Do
> > > > > >>> we
> > > > > >>>>> have any
> > > > > >>>>>>>>>>> other concrete issues that drive this KIP? If not then
> I
> > > feel
> > > > > >>>>> better to
> > > > > >>>>>>>>>>> narrow the scope of this KIP to:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> 1.a) modifying ProcessorContext only with the output
> > types
> > > on
> > > > > >>>>> forward.
> > > > > >>>>>>>>>>> 1.b) modifying Transformer signature to have generics
> of
> > > > > >>>>> ProcessorContext,
> > > > > >>>>>>>>>>> and then lift the restricting of not using punctuate:
> if
> > > user
> > > > > >>> did
> > > > > >>>>> not
> > > > > >>>>>>>>>>> follow the enforced typing and just code without
> > generics,
> > > > > >> they
> > > > > >>>>> will get
> > > > > >>>>>>>>>>> warning at compile time and get run-time error if they
> > > > > >> forward
> > > > > >>>>> wrong-typed
> > > > > >>>>>>>>>>> records, which I think would be acceptable.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I feel this would be a good solution for this specific
> > > issue;
> > > > > >>>>> again, feel
> > > > > >>>>>>>>>>> free to update the wiki page with other known issues
> that
> > > > > >>> cannot
> > > > > >>>> be
> > > > > >>>>>>>>>>> resolved.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> 2. If, we want to go with the current scope then my
> next
> > > > > >>> question
> > > > > >>>>> would be,
> > > > > >>>>>>>>>>> how much breakage we would introducing if we just
> modify
> > > the
> > > > > >>>>> Processor
> > > > > >>>>>>>>>>> signature directly? My feeling is that DSL users would
> be
> > > > > >> most
> > > > > >>>>> likely not
> > > > > >>>>>>>>>>> affected and PAPI users only need to modify a few lines
> > on
> > > > > >>> class
> > > > > >>>>>>>>>>> declaration. I feel it worth doing some research on
> this
> > > part
> > > > > >>> and
> > > > > >>>>> then
> > > > > >>>>>>>>>>> decide if we really want to bite the bullet of
> duplicated
> > > > > >>>> Processor
> > > > > >>>>> /
> > > > > >>>>>>>>>>> ProcessorSupplier classes for maintaining
> compatibility.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Guozhang
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Wed, Jun 19, 2019 at 12:21 PM John Roesler <
> > > > > >>> john@confluent.io
> > > > > >>>>>
> > > > > >>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> Hi all,
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> In response to the feedback so far, I changed the
> > package
> > > > > >> name
> > > > > >>>> from
> > > > > >>>>>>>>>>>> `processor2` to `processor.generic`.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>> -John
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> On Mon, Jun 17, 2019 at 4:49 PM John Roesler <
> > > > > >>> john@confluent.io
> > > > > >>>>>
> > > > > >>>>> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Thanks for the feedback, Sophie!
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> I actually felt a little uneasy when I wrote that
> > remark,
> > > > > >>>> because
> > > > > >>>>> it's
> > > > > >>>>>>>>>>>>> not restricted at all in the API, it's just available
> > to
> > > > > >> you
> > > > > >>> if
> > > > > >>>>> you
> > > > > >>>>>>>>>>>>> choose to give your stores and context the same
> > > parameters.
> > > > > >>>> So, I
> > > > > >>>>>>>>>>>>> think your use case is valid, and also perfectly
> > > > > >> permissable
> > > > > >>>>> under the
> > > > > >>>>>>>>>>>>> current KIP. Sorry for sowing confusion on my own
> > > > > >> discussion
> > > > > >>>>> thread!
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> I'm not crazy about the package name, either. I went
> > with
> > > > > >> it
> > > > > >>>> only
> > > > > >>>>>>>>>>>>> because there's seemingly nothing special about the
> new
> > > > > >>> package
> > > > > >>>>> except
> > > > > >>>>>>>>>>>>> that it can't have the same name as the old one.
> > > Otherwise,
> > > > > >>> the
> > > > > >>>>>>>>>>>>> existing "processor" and "Processor" names for the
> > > package
> > > > > >>> and
> > > > > >>>>> class
> > > > > >>>>>>>>>>>>> are perfectly satisfying. Rather than pile on
> > additional
> > > > > >>>>> semantics, it
> > > > > >>>>>>>>>>>>> seemed cleaner to just add a number to the package
> > name.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> This wouldn't be the first project to do something
> like
> > > > > >>> this...
> > > > > >>>>> Apache
> > > > > >>>>>>>>>>>>> Commons, for example, has added a "2" to the end of
> > some
> > > of
> > > > > >>>> their
> > > > > >>>>>>>>>>>>> packages for exactly the same reason.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> I'm open to any suggestions. For example, we could do
> > > > > >>> something
> > > > > >>>>> like
> > > > > >>>>>>>>>>>>> org.apache.kafka.streams.typedprocessor.Processor or
> > > > > >>>>>>>>>>>>> org.apache.kafka.streams.processor.typed.Processor ,
> > > which
> > > > > >>>> would
> > > > > >>>>> have
> > > > > >>>>>>>>>>>>> just about the same effect. One microscopic thought
> is
> > > > > >> that,
> > > > > >>> if
> > > > > >>>>>>>>>>>>> there's another interface in the "processor" package
> > that
> > > > > >> we
> > > > > >>>> wish
> > > > > >>>>> to
> > > > > >>>>>>>>>>>>> do the same thing to, would _could_ pile it in to
> > > > > >>> "processor2",
> > > > > >>>>> but we
> > > > > >>>>>>>>>>>>> couldn't do the same if we use a package that has
> > "typed"
> > > > > >> in
> > > > > >>>> the
> > > > > >>>>> name,
> > > > > >>>>>>>>>>>>> unless that change is _also_ related to types in some
> > > way.
> > > > > >>> But
> > > > > >>>>> this
> > > > > >>>>>>>>>>>>> seems like a very minor concern.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> What's your preference?
> > > > > >>>>>>>>>>>>> -John
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Mon, Jun 17, 2019 at 3:56 PM Sophie Blee-Goldman <
> > > > > >>>>> sophie@confluent.io>
> > > > > >>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Hey John, thanks for writing this up! I like the
> > > proposal
> > > > > >>> but
> > > > > >>>>> there's
> > > > > >>>>>>>>>>>> one
> > > > > >>>>>>>>>>>>>> point that I think may be too restrictive:
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> "A processor that happens to use a typed store is
> > > actually
> > > > > >>>>> emitting the
> > > > > >>>>>>>>>>>>>> same types that it is storing."
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I can imagine someone could want to leverage this
> new
> > > type
> > > > > >>>> safety
> > > > > >>>>>>>>>>>> without
> > > > > >>>>>>>>>>>>>> also limiting how they can interact with/use their
> > > store.
> > > > > >> As
> > > > > >>>> an
> > > > > >>>>>>>>>>>> (admittedly
> > > > > >>>>>>>>>>>>>> contrived) example, say you have an input stream of
> > > > > >>> purchases
> > > > > >>>> of
> > > > > >>>>> a
> > > > > >>>>>>>>>>>> certain
> > > > > >>>>>>>>>>>>>> type (entertainment, food, etc), and on seeing a new
> > > > > >> record
> > > > > >>>> you
> > > > > >>>>> want to
> > > > > >>>>>>>>>>>>>> output how many types of purchase a shopper has made
> > > more
> > > > > >>>> than 5
> > > > > >>>>>>>>>>>> purchases
> > > > > >>>>>>>>>>>>>> of in the last month. Your state store will probably
> > be
> > > > > >>>> holding
> > > > > >>>>> some
> > > > > >>>>>>>>>>>> more
> > > > > >>>>>>>>>>>>>> complicated PurchaseHistory object (keyed by user),
> > but
> > > > > >> your
> > > > > >>>>> output is
> > > > > >>>>>>>>>>>> just
> > > > > >>>>>>>>>>>>>> a <User, Long>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I'm also not crazy about "processor2" as the package
> > > name
> > > > > >>> ...
> > > > > >>>>> not sure
> > > > > >>>>>>>>>>>> what
> > > > > >>>>>>>>>>>>>> a better one would be though (something with
> "typed"?)
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:47 PM John Roesler <
> > > > > >>>> john@confluent.io>
> > > > > >>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Hi all,
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> I'd like to propose KIP-478 (
> > > > > >>>>>>>>>>>> https://cwiki.apache.org/confluence/x/2SkLBw
> > > > > >>>>>>>>>>>>>>> ).
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> This proposal would add output type bounds to the
> > > > > >> Processor
> > > > > >>>>> interface
> > > > > >>>>>>>>>>>>>>> in Kafka Streams, which enables static checking of
> a
> > > > > >> number
> > > > > >>>> of
> > > > > >>>>> useful
> > > > > >>>>>>>>>>>>>>> properties:
> > > > > >>>>>>>>>>>>>>> * A processor B that consumes the output of
> > processor A
> > > > > >> is
> > > > > >>>>> actually
> > > > > >>>>>>>>>>>>>>> expecting the same types that processor A produces.
> > > > > >>>>>>>>>>>>>>> * A processor that happens to use a typed store is
> > > > > >> actually
> > > > > >>>>> emitting
> > > > > >>>>>>>>>>>>>>> the same types that it is storing.
> > > > > >>>>>>>>>>>>>>> * A processor is simply forwarding the expected
> types
> > > in
> > > > > >>> all
> > > > > >>>>> code
> > > > > >>>>>>>>>>>> paths.
> > > > > >>>>>>>>>>>>>>> * Processors added via the Streams DSL, which are
> not
> > > > > >>>> permitted
> > > > > >>>>> to
> > > > > >>>>>>>>>>>>>>> forward results at all are statically prevented
> from
> > > > > >> doing
> > > > > >>> so
> > > > > >>>>> by the
> > > > > >>>>>>>>>>>>>>> compiler
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Internally, we can use the above properties to
> > achieve
> > > a
> > > > > >>> much
> > > > > >>>>> higher
> > > > > >>>>>>>>>>>>>>> level of confidence in the Streams DSL
> > implementation's
> > > > > >>>>> correctness.
> > > > > >>>>>>>>>>>>>>> Actually, while doing the POC, I found a few bugs
> and
> > > > > >>>> mistakes,
> > > > > >>>>> which
> > > > > >>>>>>>>>>>>>>> become structurally impossible with KIP-478.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Additionally, the stronger types dramatically
> improve
> > > the
> > > > > >>>>>>>>>>>>>>> self-documentation of our Streams internal
> > > > > >> implementations,
> > > > > >>>>> which
> > > > > >>>>>>>>>>>>>>> makes it much easier for new contributors to ramp
> up
> > > with
> > > > > >>>>> confidence.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thanks so much for your consideration!
> > > > > >>>>>>>>>>>>>>> -John
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> --
> > > > > >>>>> -- Guozhang
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>> -- Guozhang
> > > > > >>>
> > > > > >>
> > > > > >
> > > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message