kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Roesler <j...@confluent.io>
Subject Re: [DISCUSS] KIP-478 Strongly Typed Processor API
Date Tue, 30 Jul 2019 19:47:20 GMT
Thanks, Matthias, and thanks again for raising the concern.
-John

On Mon, Jul 29, 2019 at 4:58 PM Matthias J. Sax <matthias@confluent.io>
wrote:

> Thanks for the details!
>
> Also talked to Guozhang about a potential upgrade path. This KIP seems
> not to put us into an bad position to provide a clean upgrade path if we
> change the `ProcessorContext` in the future.
>
> Thus, I think we can move forward.
>
>
> -Matthias
>
> On 7/24/19 3:32 PM, John Roesler wrote:
> > Hey again Matthias,
> >
> > I think it might help to frame the evaluation of the Context question if
> we
> > have a "spitball" proposal for what change we might want to make to the
> > context.
> >
> > Currently, the ProcessorContext is referenced in the following public
> > interfaces:
> >
> > org.apache.kafka.streams.errors.DeserializationExceptionHandler#handle
> > org.apache.kafka.streams.kstream.Transformer#init
> > org.apache.kafka.streams.kstream.ValueTransformer#init
> > org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
> > org.apache.kafka.streams.processor.Processor#init
> > org.apache.kafka.streams.processor.StateStore#init
> >
> > We can sub-divide the ProcessorContext into broad categories:
> > General Information:
> > * a handle on the config
> > * information about the execution context (what is the task id, the
> > application id, etc)
> > Record Information:
> > * extra information about the current record
> > Store Support:
> > * the ability to register state restore callbacks
> > Processor Support:
> > * the ability to schedule punctuations
> > * the ability to get registered state stores
> > * the ability to schedule punctuations
> > * the ability to forward records
> > * the ability to request commits
> >
> > We could imagine slicing the Processor Context into four new component
> > interfaces, and making ProcessorContext just implement them. Then, we
> could
> > mix-and-match those new component interfaces for use elsewhere.
> >
> > E.g.,:
> > org.apache.kafka.streams.errors.DeserializationExceptionHandler#handle
> > * only gets the informational context
> >
> > org.apache.kafka.streams.kstream.Transformer#init
> > org.apache.kafka.streams.kstream.ValueTransformer#init
> > org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
> > * information context
> > * the ability to get registered state stores
> > Also
> > * the ability to schedule punctuations
> > * restricted ability to forward (only obeying the rules of the particular
> > interface, for example)
> > Or maybe just:
> > * no ability to foraward
> > * the ability to schedule special punctuators that can return one or more
> > keys or values when fired
> >
> > org.apache.kafka.streams.processor.Processor#init
> > * all the contexts, except the ability to register state restore
> callbacks
> >
> > org.apache.kafka.streams.processor.StateStore#init
> > * information contexts
> > * the ability to register state restore callbacks
> > * maybe punctuations and forwards, could be discussed further
> >
> >
> > The operative question for us right now is whether there is a clean path
> to
> > something like this from the current KIP, or whether we'd be forced to
> > deprecate an interface we are only just now adding. Note that the only
> > interfaces we're adding right now are :
> > * org.apache.kafka.streams.processor.api.Processor
> > * org.apache.kafka.streams.processor.api.ProcessorSupplier
> > And the only thing we need to make the above spitball proposal compatible
> > with these proposed interfaces is to deprecate the ability to register
> > state restore callbacks from the ProcessorContext.
> >
> > Otherwise, we would at that time be able to propose new Transformer
> > interfaces that take (e.g.) TransformerContexts, likewise with
> > DeserializationExceptionHandler and StateStore.
> >
> > In other words, I _think_ that we have a clean migration path to address
> > the Context problem in follow-on work. But clearly my motivation may be
> > corrupt. What do you think?
> >
> > Thanks,
> > -John
> >
> >
> > On Wed, Jul 24, 2019 at 5:06 PM John Roesler <john@confluent.io> wrote:
> >
> >> Hey Matthias,
> >>
> >> I agree, it's worth double-checking to make sure that the upgrade path
> >> would be smooth. There's no point in putting ourselves in an awkward
> jam.
> >> I'll look into it and report back.
> >>
> >> Regarding the global store logic, I confirmed that the "state update
> >> processor" shouldn't be forwarding anything, so we can safely bound its
> >> output type to `Void`. I've updated the KIP.
> >>
> >> Thanks,
> >> -John
> >>
> >> On Wed, Jul 24, 2019 at 3:08 PM Matthias J. Sax <matthias@confluent.io>
> >> wrote:
> >>
> >>> If we don't fix the `ProcessorContext` now, how would an upgrade path
> >>> look like?
> >>>
> >>> We woudl deprecate existing `init()` and add a new `init()`, and during
> >>> runtime need to call both? This sound rather error prone to me and
> might
> >>> be confusing to users? Hence, it might be beneficial to fix it right
> now.
> >>>
> >>> If my concerns are not valid, and we think that the upgrade path will
> >>> smooth, we can of course do a follow up KIP. Another possibility would
> >>> be, to still do an extra KIP but ensure that both KIPs are contained in
> >>> the same release.
> >>>
> >>> WDYT?
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 7/24/19 11:55 AM, John Roesler wrote:
> >>>> Hey Matthias,
> >>>>
> >>>> Thanks for the review!
> >>>>
> >>>> I agree about ProcessorContext, it could certainly be split up to
> >>> improve
> >>>> compile-time clues about what is or is not permitted (like, do you
> just
> >>>> want to be able to see the extra record context vs. forawrding vs.
> >>>> registering state stores, as you said). But, similar to the ideas
> around
> >>>> transforms, we can hopefully make that a separate design effort
> outside
> >>> of
> >>>> this KIP. Is that ok with you?
> >>>>
> >>>> Note that, unlike the current Processor API, KIP-478 proposes to
> >>> provide a
> >>>> default no-op implementation of init(), which means we can deprecate
> it
> >>>> later and replace it with one taking a cleaner "context" abstraction,
> as
> >>>> you proposed.
> >>>>
> >>>> It's just that the typing change as proposed is already a very large
> >>> design
> >>>> and implementation scope. I fear that adding in new flavors of
> >>>> ProcessorContext would make is much harder to actually consider the
> >>> design,
> >>>> and certainly stretch out the implementation phase as well.
> >>>>
> >>>> Regarding the documentation of non-goals, that's very good feedback.
> >>> I'll
> >>>> update the KIP.
> >>>>
> >>>> Regarding addGlobalStore... I'll look into it.
> >>>>
> >>>> Thanks!
> >>>> -John
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jul 24, 2019 at 12:27 PM Matthias J. Sax <
> matthias@confluent.io
> >>>>
> >>>> wrote:
> >>>>
> >>>>> I have concerns about the latest proposal from Guozhang. However, as
> >>>>> John said it's beyond the scope of this KIP and thus, I don't go into
> >>>>> details. I agree thought, that the current "transformer APIs" are not
> >>>>> ideal and could be improved.
> >>>>>
> >>>>>
> >>>>> An orthogonal though is that we should split the current
> >>>>> `ProcessorContext` into multiple interfaces. Atm, the context can be
> >>> use
> >>>>> to:
> >>>>>
> >>>>> - access metadata
> >>>>> - schedule punctuation
> >>>>> - get state stores
> >>>>> - register state stores
> >>>>> - forward output data
> >>>>>
> >>>>> (1) registering state stores is only required if one implements a
> >>> custom
> >>>>> store, but not for a regular `Processor` implementation -- hence,
> it's
> >>> a
> >>>>> leaking abstraction
> >>>>>
> >>>>> (2) for `ValueTransformer` and `flatValueTransformer` we don't want
> to
> >>>>> allow forwarding key-value pairs, and hence need to throw an RTE for
> >>>>> this case atm
> >>>>>
> >>>>> (3) Why do we expose `keySerde()`, `valueSerde()`, and `stateDir()`
> >>>>> explicitly? We have already `appConfigs()` to allow users to access
> the
> >>>>> configuration.
> >>>>>
> >>>>> Overall, it seems that `ProcessorContext` is rather convoluted.
> >>> Because,
> >>>>> we add a new `Processor` abstraction, it seems like a good
> opportunity
> >>>>> to improve the interface and to not pass `ProcessroContext` into the
> >>> new
> >>>>> `Processor#init()` method, but an improved interface.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>>
> >>>>>
> >>>>> One more nits about the KIP:
> >>>>>
> >>>>> I think, we should clearly state, that this change does not provide
> >>> type
> >>>>> safety for PAPI users. The following example would compile without
> any
> >>>>> errors or warning, even if the types don't match:
> >>>>>
> >>>>>> Topology t = new Topology();
> >>>>>> t.addSource("s", ...);
> >>>>>> t.addProcessor("p1", new ProcessorSupplier<KIn, VIn, FooKey,
> >>>>> BarValue>()..., "s");
> >>>>>> t.addProcessor("p2", new ProcessorSupplier<NotFooKey, NotBarValue,
> >>> KOut,
> >>>>> VOut>()..., "p1");
> >>>>>
> >>>>> Just want to make sure users understand the impact/scope of the
> change,
> >>>>> especially what is _not_ achieved.
> >>>>>
> >>>>>
> >>>>> About `addGlobalStore()` -- should the return types be `Void` similar
> >>> to
> >>>>> `KStream#process()`?
> >>>>>
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>>
> >>>>> On 7/24/19 9:11 AM, Guozhang Wang wrote:
> >>>>>> Sounds good to me, thanks John!
> >>>>>>
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>> On Wed, Jul 24, 2019 at 7:40 AM John Roesler <john@confluent.io>
> >>> wrote:
> >>>>>>
> >>>>>>> Hey Guozhang,
> >>>>>>>
> >>>>>>> Thanks for the thought! It sounds related to what I was thinking in
> >>>>>>> https://issues.apache.org/jira/browse/KAFKA-8396 , but a little
> >>>>> "extra"...
> >>>>>>>
> >>>>>>> I proposed to eliminate ValueTransformer, but I believe you're
> >>> right; we
> >>>>>>> could eliminate Transformer also and just use Processor in the
> >>>>> transform()
> >>>>>>> methods.
> >>>>>>>
> >>>>>>> To your first bullet, regarding transform/flatTransform... I'd
> argue
> >>>>> that
> >>>>>>> the difference isn't material and if we switch to just using
> >>>>>>> context.forward instead of returns, then we just need one and
> people
> >>> can
> >>>>>>> call forward as much as they want. It certainly warrants further
> >>>>>>> discussion, though...
> >>>>>>>
> >>>>>>> To the second point, yes, I'm thinking that we can eschew the
> >>>>>>> ValueTransformer and instead do something like ignore the forwarded
> >>> key
> >>>>> or
> >>>>>>> check the key for serial identity, etc.
> >>>>>>>
> >>>>>>> The ultimate advantage of these ideas is that we reduce the number
> of
> >>>>>>> interface variants and we also give people just one way to pass
> >>> values
> >>>>>>> forward instead of two.
> >>>>>>>
> >>>>>>> Of course, it's beyond the scope of this KIP, but this KIP is a
> >>>>>>> precondition for these further improvements.
> >>>>>>>
> >>>>>>> I'm copying your comment onto the ticket for posterity.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> -John
> >>>>>>>
> >>>>>>> On Tue, Jul 23, 2019 at 5:38 PM Guozhang Wang <wangguoz@gmail.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi John,
> >>>>>>>>
> >>>>>>>> Just a wild thought about Transformer: now with the new
> >>> Processor<KIn,
> >>>>>>>> KOut, VIn, VOut>#init(ProcessorContext<KOut, VOut>), do we still
> >>> need a
> >>>>>>>> Transformer (and even ValueTransformer / ValueTransformerWithKey)?
> >>>>>>>>
> >>>>>>>> What if:
> >>>>>>>>
> >>>>>>>> * We just make KStream#transform to get a ProcessorSupplier as
> well,
> >>>>> and
> >>>>>>>> inside `process()` we check that at most one `context.forward()`
> is
> >>>>>>> called,
> >>>>>>>> and then take it as the return value.
> >>>>>>>> * We would still use ValueTransformer for KStream#transformValue,
> >>> or we
> >>>>>>> can
> >>>>>>>> also use a `ProcessorSupplier where we allow at most one
> >>>>>>>> `context.forward()` AND we ignore whatever passed in as key but
> just
> >>>>> use
> >>>>>>>> the original key.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Jul 16, 2019 at 9:03 AM John Roesler <john@confluent.io>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi again, all,
> >>>>>>>>>
> >>>>>>>>> I have started the voting thread. Please cast your votes (or
> voice
> >>>>>>>>> your objections)! The vote will remain open at least 72 hours.
> >>> Once it
> >>>>>>>>> closes, I can send the PR pretty quickly.
> >>>>>>>>>
> >>>>>>>>> Thanks for all you help ironing out the details on this feature.
> >>>>>>>>> -John
> >>>>>>>>>
> >>>>>>>>> On Mon, Jul 15, 2019 at 5:09 PM John Roesler <john@confluent.io>
> >>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hey all,
> >>>>>>>>>>
> >>>>>>>>>> It sounds like there's general agreement now on this KIP, so I
> >>>>>>> updated
> >>>>>>>>>> the KIP to fit in with Guozhang's overall proposed package
> >>> structure.
> >>>>>>>>>> Specifically, the proposed name for the new Processor interface
> is
> >>>>>>>>>> "org.apache.kafka.streams.processor.api.Processor".
> >>>>>>>>>>
> >>>>>>>>>> If there are no objections, then I plan to start the vote
> >>> tomorrow!
> >>>>>>>>>>
> >>>>>>>>>> Thanks, all, for your contributions.
> >>>>>>>>>> -John
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jul 11, 2019 at 1:50 PM Matthias J. Sax <
> >>>>>>> matthias@confluent.io
> >>>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Side remark:
> >>>>>>>>>>>
> >>>>>>>>>>>> Now that "flat transform" is a specific
> >>>>>>>>>>>>> part of the API it seems okay to steer folks in that
> direction
> >>>>>>> (to
> >>>>>>>>> never
> >>>>>>>>>>>>> use context.process in a transformer), but it should be
> called
> >>>>>>> out
> >>>>>>>>>>>>> explicitly in javadocs.  Currently Transformer (which is used
> >>>>>>> for
> >>>>>>>>> both
> >>>>>>>>>>>>> transform() and flatTransform() ) doesn't really call out the
> >>>>>>>>> ambiguity:
> >>>>>>>>>>>
> >>>>>>>>>>> Would you want to do a PR for address this? We are always eager
> >>> to
> >>>>>>>>>>> improve the JavaDocs!
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>> On 7/7/19 11:26 AM, Paul Whalen wrote:
> >>>>>>>>>>>> First of all, +1 on the whole idea, my team has run into
> >>>>>>>> (admittedly
> >>>>>>>>> minor,
> >>>>>>>>>>>> but definitely annoying) issues because of the weaker typing.
> >>>>>>>> We're
> >>>>>>>>> heavy
> >>>>>>>>>>>> users of the PAPI and have Processors that, while not hundreds
> >>> of
> >>>>>>>>> lines
> >>>>>>>>>>>> long, are certainly quite hefty and call context.forward() in
> >>>>>>> many
> >>>>>>>>> places.
> >>>>>>>>>>>>
> >>>>>>>>>>>> After reading the KIP and discussion a few times, I've
> convinced
> >>>>>>>>> myself
> >>>>>>>>>>>> that any initial concerns I had aren't really concerns at all
> >>>>>>>> (state
> >>>>>>>>> store
> >>>>>>>>>>>> types, for one).  One thing I will mention:  changing
> >>>>>>> *Transformer*
> >>>>>>>>> to have
> >>>>>>>>>>>> ProcessorContext<Void, Void> gave me pause, because I have
> code
> >>>>>>>> that
> >>>>>>>>> does
> >>>>>>>>>>>> context.forward in transformers.  Now that "flat transform"
> is a
> >>>>>>>>> specific
> >>>>>>>>>>>> part of the API it seems okay to steer folks in that direction
> >>>>>>> (to
> >>>>>>>>> never
> >>>>>>>>>>>> use context.process in a transformer), but it should be called
> >>>>>>> out
> >>>>>>>>>>>> explicitly in javadocs.  Currently Transformer (which is used
> >>> for
> >>>>>>>>> both
> >>>>>>>>>>>> transform() and flatTransform() ) doesn't really call out the
> >>>>>>>>> ambiguity:
> >>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> https://github.com/apache/kafka/blob/ca641b3e2e48c14ff308181c775775408f5f35f7/streams/src/main/java/org/apache/kafka/streams/kstream/Transformer.java#L75-L77
> >>>>>>>>> ,
> >>>>>>>>>>>> and for migrating users (from before flatTransform) it could
> be
> >>>>>>>>> confusing.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Side note, I'd like to plug KIP-401 (there is a discussion
> >>> thread
> >>>>>>>>> and a
> >>>>>>>>>>>> voting thread) which also relates to using the PAPI.  It seems
> >>>>>>> like
> >>>>>>>>> there
> >>>>>>>>>>>> is some interest and it is in a votable state with the
> majority
> >>>>>>> of
> >>>>>>>>>>>> implementation complete.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Paul
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jun 28, 2019 at 2:02 PM Bill Bejeck <
> bbejeck@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Sorry for coming late to the party.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> As for the naming I'm in favor of RecordProcessor as well.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I agree that we should not take on doing all of the package
> >>>>>>>>> movements as
> >>>>>>>>>>>>> part of this KIP, especially as John has pointed out, it will
> >>> be
> >>>>>>>> an
> >>>>>>>>>>>>> opportunity to discuss some clean-up on individual classes
> >>>>>>> which I
> >>>>>>>>> envision
> >>>>>>>>>>>>> becoming another somewhat involved process.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For the end goal, if possible, here's what I propose.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>    1. We keep the scope of the KIP the same, *but we only
> >>>>>>>>> implement* *it in
> >>>>>>>>>>>>>    phases*
> >>>>>>>>>>>>>    2. Phase one could include what Guozhang had proposed
> >>> earlier
> >>>>>>>>> namely
> >>>>>>>>>>>>>    1. > 1.a) modifying ProcessorContext only with the output
> >>>>>>> types
> >>>>>>>>> on
> >>>>>>>>>>>>>       forward.
> >>>>>>>>>>>>>       > 1.b) modifying Transformer signature to have generics
> >>> of
> >>>>>>>>>>>>>       ProcessorContext,
> >>>>>>>>>>>>>       > and then lift the restricting of not using punctuate:
> >>> if
> >>>>>>>>> user did
> >>>>>>>>>>>>>       not
> >>>>>>>>>>>>>       > follow the enforced typing and just code without
> >>>>>>> generics,
> >>>>>>>>> they
> >>>>>>>>>>>>>       will get
> >>>>>>>>>>>>>       > warning at compile time and get run-time error if
> they
> >>>>>>>>> forward
> >>>>>>>>>>>>>       wrong-typed
> >>>>>>>>>>>>>       > records, which I think would be acceptable.
> >>>>>>>>>>>>>    3. Then we could tackle other pieces in an incremental
> >>> manner
> >>>>>>>> as
> >>>>>>>>> we see
> >>>>>>>>>>>>>    what makes sense
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Just my 2cents
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Bill
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Jun 24, 2019 at 10:22 PM Guozhang Wang <
> >>>>>>>> wangguoz@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yeah I think we should not do all the repackaging as part of
> >>>>>>> this
> >>>>>>>>> KIP as
> >>>>>>>>>>>>>> well (we can just do the movement of the Processor /
> >>>>>>>>> ProcessorSupplier),
> >>>>>>>>>>>>>> but I think we need to discuss the end goal here since
> >>>>>>> otherwise
> >>>>>>>>> we may
> >>>>>>>>>>>>> do
> >>>>>>>>>>>>>> the repackaging of Processor in this KIP, but only later on
> >>>>>>>>> realizing
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>> other re-packagings are not our favorite solutions.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 3:06 PM John Roesler <
> >>>>>>> john@confluent.io>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hey Guozhang,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for the idea! I'm wondering if we could take a
> middle
> >>>>>>>>> ground
> >>>>>>>>>>>>>>> and take your proposed layout as a "roadmap", while only
> >>>>>>>> actually
> >>>>>>>>>>>>>>> moving the classes that are already involved in this KIP.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The reason I ask is not just to control the scope of this
> >>> KIP,
> >>>>>>>> but
> >>>>>>>>>>>>>>> also, I think that if we move other classes to new
> packages,
> >>>>>>> we
> >>>>>>>>> might
> >>>>>>>>>>>>>>> also want to take the opportunity to clean up other things
> >>>>>>> about
> >>>>>>>>> them.
> >>>>>>>>>>>>>>> But each one of those would become a discussion point of
> its
> >>>>>>>> own,
> >>>>>>>>> so
> >>>>>>>>>>>>>>> it seems the discussion would become intractable. FWIW, I
> do
> >>>>>>>> like
> >>>>>>>>> your
> >>>>>>>>>>>>>>> idea for precisely this reason, it creates opportunities
> for
> >>>>>>> us
> >>>>>>>> to
> >>>>>>>>>>>>>>> consider other changes that we are simply not able to make
> >>>>>>>> without
> >>>>>>>>>>>>>>> breaking source compatibility.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If the others feel "kind of favorable" with this overall
> >>>>>>> vision,
> >>>>>>>>> maybe
> >>>>>>>>>>>>>>> we can make one or more Jira tickets to capture it, and
> then
> >>>>>>>> just
> >>>>>>>>>>>>>>> alter _this_ proposal to `processor.api.Processor` (etc).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Sun, Jun 23, 2019 at 7:17 PM Guozhang Wang <
> >>>>>>>> wangguoz@gmail.com
> >>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hello John,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for your detailed explanation, I've done some quick
> >>>>>>>>> checks on
> >>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>> existing examples that heavily used Processor and the
> >>> results
> >>>>>>>>> also
> >>>>>>>>>>>>>> makes
> >>>>>>>>>>>>>>> me
> >>>>>>>>>>>>>>>> worried about my previous statements that "the breakage
> >>> would
> >>>>>>>>> not be
> >>>>>>>>>>>>>>> big".
> >>>>>>>>>>>>>>>> I agree we should maintain compatibility.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> About the naming itself, I'm actually a bit inclined into
> >>>>>>>>>>>>> sub-packages
> >>>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>> renamed new classes, and my motivations are that our
> current
> >>>>>>>>>>>>> packaging
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>> already quite coarsen grained and sometimes ill-placed,
> and
> >>>>>>>> hence
> >>>>>>>>>>>>> maybe
> >>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> can take this change along with some clean up on packages
> >>>>>>> (but
> >>>>>>>>> again,
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> should follow the deprecate - removal path). What I'm
> >>>>>>> thinking
> >>>>>>>>> is:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -------------------
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> processor/:
> >>>>>>>>> StateRestoreCallback/AbstractNotifyingRestoreCallback,
> >>>>>>>>>>>>>>> (deprecated
> >>>>>>>>>>>>>>>> later, same meaning for other cross-throughs),
> >>>>>>> ProcessContest,
> >>>>>>>>>>>>>>>> RecordContext, Punctuator, PunctuationType, To,
> Cancellable
> >>>>>>>> (are
> >>>>>>>>> the
> >>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>> things left)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> (new) processor/api/: Processor, ProcessorSupplier (and of
> >>>>>>>>> course,
> >>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>> two classes can be strong typed)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> state/: StateStore, BatchingStateRestoreCallback,
> >>>>>>>>>>>>>>>> AbstractNotifyingBatchingRestoreCallback (moved from
> >>>>>>>> processor/),
> >>>>>>>>>>>>>>>> PartitionGrouper, WindowStoreIterator, StateSerdes (this
> one
> >>>>>>>> can
> >>>>>>>>> be
> >>>>>>>>>>>>>> moved
> >>>>>>>>>>>>>>>> into state/internals), TimestampedByteStore (we can move
> >>> this
> >>>>>>>> to
> >>>>>>>>>>>>>>> internals
> >>>>>>>>>>>>>>>> since store types would use vat by default, see below),
> >>>>>>>>>>>>>> ValueAndTimestamp
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> (new) state/factory/: Stores, StoreBuilder, StoreSupplier;
> >>>>>>>> *BUT*
> >>>>>>>>> the
> >>>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>> Stores would not have timestampedXXBuilder APIs since the
> >>>>>>>> default
> >>>>>>>>>>>>>>>> StoreSupplier / StoreBuilder value types are
> >>>>>>> ValueAndTimestamp
> >>>>>>>>>>>>> already.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> (new) state/queryable/: QueryableStoreType,
> >>>>>>>> QueryableStoreTypes,
> >>>>>>>>>>>>>> HostInfo
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> (new) state/keyValue/: KeyValueXXX classes, and also the
> >>> same
> >>>>>>>> for
> >>>>>>>>>>>>>>>> state/sessionWindow and state/timeWindow; *BUT* here we
> use
> >>>>>>>>>>>>>>>> ValueAndTimestamp as value types of those APIs directly,
> and
> >>>>>>>> also
> >>>>>>>>>>>>>>>> TimestampedKeyValue/WindowStore would be deprecated.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> (new) kstream/api/: KStream, KTable, GroupedKStream
> (renamed
> >>>>>>>> from
> >>>>>>>>>>>>>>>> KGroupedStream), GroupedKTable (renamed from
> KGroupedTable),
> >>>>>>>>>>>>>>>> TimeWindowedKStream, SessionWindowedKStream, GlobalKTable
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> (new) kstream/operator/: Aggregator, ForeachFunction,
> ... ,
> >>>>>>>>> Merger
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> Grouped, Joined, Materialized, ... , Printed and
> >>> Transformer,
> >>>>>>>>>>>>>>>> TransformerSupplier.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> (new) kstream/window/: Window, Windows, Windowed,
> >>>>>>> TimeWindows,
> >>>>>>>>>>>>>>>> SessionWindows, UnlimitedWindows, JoinWindows,
> >>>>>>> WindowedSerdes,
> >>>>>>>>>>>>>>>> Time/SessionWindowedSerialized/Deserializer.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> (new) configure/: RocksDBConfigSetter, TopicNameExtractor,
> >>>>>>>>>>>>>>>> TimestampExtractor, UsePreviousTimeOnInvalidTimestamp,
> >>>>>>>>>>>>>>>> WallclockTimestampExtractor,
> ExtractRecordMetadataTimestamp,
> >>>>>>>>>>>>>>>> FailOnInvalidTimestamp, LogAndSkipOnInvalidTimestamp,
> >>>>>>>>>>>>>>> StateRestoreListener,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> (new) metadata/: StreamsMetadata, ThreadMetadata,
> >>>>>>> TaskMetadata,
> >>>>>>>>>>>>> TaskId
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Still, any xxx/internals packages are declared as inner
> >>>>>>>> classes,
> >>>>>>>>> but
> >>>>>>>>>>>>>>> other
> >>>>>>>>>>>>>>>> xxx/yyy packages are declared as public APIs.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -------------------
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This is a very wild thought and I can totally understand
> if
> >>>>>>>>> people
> >>>>>>>>>>>>> feel
> >>>>>>>>>>>>>>>> this is too much since it definitely enlarges the scope of
> >>>>>>> this
> >>>>>>>>> KIP a
> >>>>>>>>>>>>>> lot
> >>>>>>>>>>>>>>>> :) just trying to play a devil's advocate here to do major
> >>>>>>>>>>>>> refactoring
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> avoid renaming Processor classes.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:51 PM Matthias J. Sax <
> >>>>>>>>>>>>> matthias@confluent.io
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I think `RecordProcessor` is a good name.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 6/21/19 5:09 PM, John Roesler wrote:
> >>>>>>>>>>>>>>>>>> After kicking the naming around a bit more, it seems
> like
> >>>>>>> any
> >>>>>>>>>>>>>> package
> >>>>>>>>>>>>>>>>>> name change is a bit "weird" because it fragments the
> >>>>>>> package
> >>>>>>>>> and
> >>>>>>>>>>>>>>>>>> directory structure. If we can come up with a reasonable
> >>>>>>> name
> >>>>>>>>> for
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> interface after all, it seems like the better choice.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The real challenge is that the existing name "Processor"
> >>>>>>>> seems
> >>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>> about perfect. In picking a new name, we need to
> consider
> >>>>>>> the
> >>>>>>>>>>>>>>> ultimate
> >>>>>>>>>>>>>>>>>> state, after the deprecation period, when we entirely
> >>>>>>> remove
> >>>>>>>>>>>>>>>>>> Processor. In this context, TypedProcessor seems a
> little
> >>>>>>> odd
> >>>>>>>>> to
> >>>>>>>>>>>>>> me,
> >>>>>>>>>>>>>>>>>> because it seems to imply that there should also be an
> >>>>>>>> "untyped
> >>>>>>>>>>>>>>>>>> processor".
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> After kicking around a few other ideas, what does
> everyone
> >>>>>>>>> think
> >>>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>> "RecordProcessor"? I _think_ maybe it stands on its own
> >>>>>>> just
> >>>>>>>>>>>>> fine,
> >>>>>>>>>>>>>>>>>> because it's a thing that processes... records?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> If others agree with this, I can change the proposal to
> >>>>>>>>>>>>>>> RecordProcessor.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 6:42 PM John Roesler <
> >>>>>>>>> john@confluent.io>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I've updated the KIP with the feedback so far.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The naming question is still the biggest (only?)
> >>>>>>> outstanding
> >>>>>>>>>>>>>> issue.
> >>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>> would be good to hear some more thoughts on it.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> As we stand now, there's one vote for changing the
> >>> package
> >>>>>>>>> name
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> something like 'typedprocessor', one for changing the
> >>>>>>>>> interface
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> TypedProcessor (as in the PoC), and one for just
> changing
> >>>>>>>> the
> >>>>>>>>>>>>>>>>>>> Processor interface in-place, breaking source
> >>>>>>> compatibility.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> How can we resolve this decision?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 5:44 PM John Roesler <
> >>>>>>>>> john@confluent.io
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for the feedback, Guozhang and Matthias,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regarding motivation: I'll update the wiki. Briefly:
> >>>>>>>>>>>>>>>>>>>> * Any processor can benefit. Imagine a pure user of
> the
> >>>>>>>>>>>>>>> ProcessorAPI
> >>>>>>>>>>>>>>>>>>>> who has very complex processing logic. I have seen
> >>>>>>> several
> >>>>>>>>>>>>>>> processor
> >>>>>>>>>>>>>>>>>>>> implementation that are hundreds of lines long and
> call
> >>>>>>>>>>>>>>>>>>>> `context.forward` in many different locations and
> >>>>>>> branches.
> >>>>>>>>> In
> >>>>>>>>>>>>>>> such an
> >>>>>>>>>>>>>>>>>>>> implementation, it would be very easy to have a bug
> in a
> >>>>>>>>> rarely
> >>>>>>>>>>>>>>> used
> >>>>>>>>>>>>>>>>>>>> branch that forwards the wrong kind of value. This
> would
> >>>>>>>>>>>>>>> structurally
> >>>>>>>>>>>>>>>>>>>> prevent that from happening.
> >>>>>>>>>>>>>>>>>>>> * Also, anyone who heavily uses the ProcessorAPI would
> >>>>>>>> likely
> >>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>>>> developed helper methods to wire together processors,
> >>>>>>> just
> >>>>>>>> as
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>>>> in the DSL implementation. This change would enable
> them
> >>>>>>> to
> >>>>>>>>>>>>>> ensure
> >>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>>>> compile time that they are actually wiring together
> >>>>>>>>> compatible
> >>>>>>>>>>>>>>> types.
> >>>>>>>>>>>>>>>>>>>> This was actually _my_ original motivation, since I
> >>> found
> >>>>>>>> it
> >>>>>>>>>>>>> very
> >>>>>>>>>>>>>>>>>>>> difficult and time consuming to follow the Streams DSL
> >>>>>>>>> internal
> >>>>>>>>>>>>>>>>>>>> builders.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regarding breaking the source compatibility of
> >>>>>>> Processor: I
> >>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>>> _love_ to side-step the naming problem, but I really
> >>>>>>> don't
> >>>>>>>>> know
> >>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>> it's excusable to break compatibility. I suspect that
> >>> our
> >>>>>>>>>>>>> oldest
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> dearest friends are using the ProcessorAPI in some
> form
> >>>>>>> or
> >>>>>>>>>>>>>> another,
> >>>>>>>>>>>>>>>>>>>> and all their source code would break. It sucks to
> have
> >>>>>>> to
> >>>>>>>>>>>>>> create a
> >>>>>>>>>>>>>>>>>>>> whole new interface to get around this, but it feels
> >>> like
> >>>>>>>> the
> >>>>>>>>>>>>>> right
> >>>>>>>>>>>>>>>>>>>> thing to do. Would be nice to get even more feedback
> on
> >>>>>>>> this
> >>>>>>>>>>>>>> point,
> >>>>>>>>>>>>>>>>>>>> though.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regarding the types of stores, as I said in my
> response
> >>>>>>> to
> >>>>>>>>>>>>>> Sophie,
> >>>>>>>>>>>>>>>>>>>> it's not an issue.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regarding the change to StreamsBuilder, it doesn't pin
> >>>>>>> the
> >>>>>>>>>>>>> types
> >>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>> any way, since all the types are bounded by Object
> only,
> >>>>>>>> and
> >>>>>>>>>>>>>> there
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>> no extra constraints between arguments (each type is
> >>> used
> >>>>>>>>> only
> >>>>>>>>>>>>>>> once in
> >>>>>>>>>>>>>>>>>>>> one argument). But maybe I missed the point you were
> >>>>>>> asking
> >>>>>>>>>>>>>> about.
> >>>>>>>>>>>>>>>>>>>> Since the type takes generic paramters, we should
> allow
> >>>>>>>> users
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> pass
> >>>>>>>>>>>>>>>>>>>> in parameterized arguments. Otherwise, they would
> _have
> >>>>>>> to_
> >>>>>>>>>>>>> give
> >>>>>>>>>>>>>>> us a
> >>>>>>>>>>>>>>>>>>>> raw type, and they would be forced to get a "rawtyes"
> >>>>>>>> warning
> >>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> compiler. So, it's our obligation in any API that
> >>>>>>> accepts a
> >>>>>>>>>>>>>>>>>>>> parameterized-type parameter to allow people to
> actually
> >>>>>>>>> pass a
> >>>>>>>>>>>>>>>>>>>> parameterized type, even if we don't actually use the
> >>>>>>>>>>>>> parameters.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> The naming question is a complex one, as I took pains
> to
> >>>>>>>>> detail
> >>>>>>>>>>>>>>>>>>>> previously. Please don't just pick out one minor
> point,
> >>>>>>>> call
> >>>>>>>>> it
> >>>>>>>>>>>>>>> weak,
> >>>>>>>>>>>>>>>>>>>> and then claim that it invalidates the whole
> decision. I
> >>>>>>>>> don't
> >>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>>> there's a clear best choice, so I'm more than happy
> for
> >>>>>>>>> someone
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> advocate for renaming the class instead of the
> package.
> >>>>>>> Can
> >>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>> provide some reasons why you think that would be
> better?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regarding the deprecated methods, you're absolutely
> >>>>>>> right.
> >>>>>>>>> I'll
> >>>>>>>>>>>>>>>> update the KIP.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks again for all the feedback!
> >>>>>>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 4:34 PM Matthias J. Sax <
> >>>>>>>>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Just want to second what Sophie said about the
> stores.
> >>>>>>> The
> >>>>>>>>>>>>> type
> >>>>>>>>>>>>>>> of a
> >>>>>>>>>>>>>>>>>>>>> used stores is completely independent of input/output
> >>>>>>>> types.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> This related to change `addGlobalStore()` method. Why
> >>> do
> >>>>>>>> you
> >>>>>>>>>>>>>> want
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> pin
> >>>>>>>>>>>>>>>>>>>>> the types? In fact, people request the ability to
> >>>>>>> filter()
> >>>>>>>>> and
> >>>>>>>>>>>>>>> maybe
> >>>>>>>>>>>>>>>>>>>>> even map() the data before they are put into the
> global
> >>>>>>>>> store.
> >>>>>>>>>>>>>>>> Limiting
> >>>>>>>>>>>>>>>>>>>>> the types seems to be a step backward here?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Also, the pack name is questionable.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> This wouldn't be the first project to do something
> >>> like
> >>>>>>>>>>>>> this...
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Not a strong argument. I would actually propose to
> not
> >>>>>>> a a
> >>>>>>>>> new
> >>>>>>>>>>>>>>>> package,
> >>>>>>>>>>>>>>>>>>>>> but just a new class `TypedProcessor`.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> For `ProcessorContext#forward` methods -- some of
> those
> >>>>>>>>>>>>> methods
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>> already deprecated. While the will still be affected,
> >>> it
> >>>>>>>>> would
> >>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>> worth
> >>>>>>>>>>>>>>>>>>>>> to mark them as deprecated in the wiki page, too.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> @Guozhang: I dont' think we should break source
> >>>>>>>>> compatibility
> >>>>>>>>>>>>>> in a
> >>>>>>>>>>>>>>>> minor
> >>>>>>>>>>>>>>>>>>>>> release.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On 6/20/19 1:43 PM, Guozhang Wang wrote:
> >>>>>>>>>>>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks for KIP! I've a few comments below:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> 1. So far the "Motivation" section is very general,
> >>> and
> >>>>>>>> the
> >>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>> concrete
> >>>>>>>>>>>>>>>>>>>>>> example that I have in mind is
> >>>>>>>> `TransformValues#punctuate`.
> >>>>>>>>>>>>> Do
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> have any
> >>>>>>>>>>>>>>>>>>>>>> other concrete issues that drive this KIP? If not
> then
> >>>>>>> I
> >>>>>>>>> feel
> >>>>>>>>>>>>>>>> better to
> >>>>>>>>>>>>>>>>>>>>>> narrow the scope of this KIP to:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> 1.a) modifying ProcessorContext only with the output
> >>>>>>>> types
> >>>>>>>>> on
> >>>>>>>>>>>>>>>> forward.
> >>>>>>>>>>>>>>>>>>>>>> 1.b) modifying Transformer signature to have
> generics
> >>>>>>> of
> >>>>>>>>>>>>>>>> ProcessorContext,
> >>>>>>>>>>>>>>>>>>>>>> and then lift the restricting of not using
> punctuate:
> >>>>>>> if
> >>>>>>>>> user
> >>>>>>>>>>>>>> did
> >>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>> follow the enforced typing and just code without
> >>>>>>>> generics,
> >>>>>>>>>>>>> they
> >>>>>>>>>>>>>>>> will get
> >>>>>>>>>>>>>>>>>>>>>> warning at compile time and get run-time error if
> they
> >>>>>>>>>>>>> forward
> >>>>>>>>>>>>>>>> wrong-typed
> >>>>>>>>>>>>>>>>>>>>>> records, which I think would be acceptable.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I feel this would be a good solution for this
> specific
> >>>>>>>>> issue;
> >>>>>>>>>>>>>>>> again, feel
> >>>>>>>>>>>>>>>>>>>>>> free to update the wiki page with other known issues
> >>>>>>> that
> >>>>>>>>>>>>>> cannot
> >>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> resolved.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> 2. If, we want to go with the current scope then my
> >>>>>>> next
> >>>>>>>>>>>>>> question
> >>>>>>>>>>>>>>>> would be,
> >>>>>>>>>>>>>>>>>>>>>> how much breakage we would introducing if we just
> >>>>>>> modify
> >>>>>>>>> the
> >>>>>>>>>>>>>>>> Processor
> >>>>>>>>>>>>>>>>>>>>>> signature directly? My feeling is that DSL users
> would
> >>>>>>> be
> >>>>>>>>>>>>> most
> >>>>>>>>>>>>>>>> likely not
> >>>>>>>>>>>>>>>>>>>>>> affected and PAPI users only need to modify a few
> >>> lines
> >>>>>>>> on
> >>>>>>>>>>>>>> class
> >>>>>>>>>>>>>>>>>>>>>> declaration. I feel it worth doing some research on
> >>>>>>> this
> >>>>>>>>> part
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>>>>> decide if we really want to bite the bullet of
> >>>>>>> duplicated
> >>>>>>>>>>>>>>> Processor
> >>>>>>>>>>>>>>>> /
> >>>>>>>>>>>>>>>>>>>>>> ProcessorSupplier classes for maintaining
> >>>>>>> compatibility.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 19, 2019 at 12:21 PM John Roesler <
> >>>>>>>>>>>>>> john@confluent.io
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> In response to the feedback so far, I changed the
> >>>>>>>> package
> >>>>>>>>>>>>> name
> >>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>>>>>> `processor2` to `processor.generic`.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 4:49 PM John Roesler <
> >>>>>>>>>>>>>> john@confluent.io
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback, Sophie!
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I actually felt a little uneasy when I wrote that
> >>>>>>>> remark,
> >>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>> it's
> >>>>>>>>>>>>>>>>>>>>>>>> not restricted at all in the API, it's just
> >>> available
> >>>>>>>> to
> >>>>>>>>>>>>> you
> >>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>>>>>> choose to give your stores and context the same
> >>>>>>>>> parameters.
> >>>>>>>>>>>>>>> So, I
> >>>>>>>>>>>>>>>>>>>>>>>> think your use case is valid, and also perfectly
> >>>>>>>>>>>>> permissable
> >>>>>>>>>>>>>>>> under the
> >>>>>>>>>>>>>>>>>>>>>>>> current KIP. Sorry for sowing confusion on my own
> >>>>>>>>>>>>> discussion
> >>>>>>>>>>>>>>>> thread!
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I'm not crazy about the package name, either. I
> went
> >>>>>>>> with
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>>>>>>>> because there's seemingly nothing special about
> the
> >>>>>>> new
> >>>>>>>>>>>>>> package
> >>>>>>>>>>>>>>>> except
> >>>>>>>>>>>>>>>>>>>>>>>> that it can't have the same name as the old one.
> >>>>>>>>> Otherwise,
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> existing "processor" and "Processor" names for the
> >>>>>>>>> package
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> class
> >>>>>>>>>>>>>>>>>>>>>>>> are perfectly satisfying. Rather than pile on
> >>>>>>>> additional
> >>>>>>>>>>>>>>>> semantics, it
> >>>>>>>>>>>>>>>>>>>>>>>> seemed cleaner to just add a number to the package
> >>>>>>>> name.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> This wouldn't be the first project to do something
> >>>>>>> like
> >>>>>>>>>>>>>> this...
> >>>>>>>>>>>>>>>> Apache
> >>>>>>>>>>>>>>>>>>>>>>>> Commons, for example, has added a "2" to the end
> of
> >>>>>>>> some
> >>>>>>>>> of
> >>>>>>>>>>>>>>> their
> >>>>>>>>>>>>>>>>>>>>>>>> packages for exactly the same reason.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I'm open to any suggestions. For example, we could
> >>> do
> >>>>>>>>>>>>>> something
> >>>>>>>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.streams.typedprocessor.Processor
> or
> >>>>>>>>>>>>>>>>>>>>>>>>
> org.apache.kafka.streams.processor.typed.Processor ,
> >>>>>>>>> which
> >>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>>>>>>>> just about the same effect. One microscopic
> thought
> >>>>>>> is
> >>>>>>>>>>>>> that,
> >>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>>>>>> there's another interface in the "processor"
> package
> >>>>>>>> that
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>> wish
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> do the same thing to, would _could_ pile it in to
> >>>>>>>>>>>>>> "processor2",
> >>>>>>>>>>>>>>>> but we
> >>>>>>>>>>>>>>>>>>>>>>>> couldn't do the same if we use a package that has
> >>>>>>>> "typed"
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> name,
> >>>>>>>>>>>>>>>>>>>>>>>> unless that change is _also_ related to types in
> >>> some
> >>>>>>>>> way.
> >>>>>>>>>>>>>> But
> >>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>> seems like a very minor concern.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> What's your preference?
> >>>>>>>>>>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 3:56 PM Sophie
> Blee-Goldman
> >>> <
> >>>>>>>>>>>>>>>> sophie@confluent.io>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hey John, thanks for writing this up! I like the
> >>>>>>>>> proposal
> >>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>> there's
> >>>>>>>>>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>>>>>>>> point that I think may be too restrictive:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> "A processor that happens to use a typed store is
> >>>>>>>>> actually
> >>>>>>>>>>>>>>>> emitting the
> >>>>>>>>>>>>>>>>>>>>>>>>> same types that it is storing."
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I can imagine someone could want to leverage this
> >>>>>>> new
> >>>>>>>>> type
> >>>>>>>>>>>>>>> safety
> >>>>>>>>>>>>>>>>>>>>>>> without
> >>>>>>>>>>>>>>>>>>>>>>>>> also limiting how they can interact with/use
> their
> >>>>>>>>> store.
> >>>>>>>>>>>>> As
> >>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>> (admittedly
> >>>>>>>>>>>>>>>>>>>>>>>>> contrived) example, say you have an input stream
> of
> >>>>>>>>>>>>>> purchases
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>>> certain
> >>>>>>>>>>>>>>>>>>>>>>>>> type (entertainment, food, etc), and on seeing a
> >>> new
> >>>>>>>>>>>>> record
> >>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>> want to
> >>>>>>>>>>>>>>>>>>>>>>>>> output how many types of purchase a shopper has
> >>> made
> >>>>>>>>> more
> >>>>>>>>>>>>>>> than 5
> >>>>>>>>>>>>>>>>>>>>>>> purchases
> >>>>>>>>>>>>>>>>>>>>>>>>> of in the last month. Your state store will
> >>> probably
> >>>>>>>> be
> >>>>>>>>>>>>>>> holding
> >>>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>>>>>>>>> complicated PurchaseHistory object (keyed by
> user),
> >>>>>>>> but
> >>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>> output is
> >>>>>>>>>>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>>>>>>>> a <User, Long>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I'm also not crazy about "processor2" as the
> >>> package
> >>>>>>>>> name
> >>>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>>> not sure
> >>>>>>>>>>>>>>>>>>>>>>> what
> >>>>>>>>>>>>>>>>>>>>>>>>> a better one would be though (something with
> >>>>>>> "typed"?)
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:47 PM John Roesler <
> >>>>>>>>>>>>>>> john@confluent.io>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to propose KIP-478 (
> >>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/x/2SkLBw
> >>>>>>>>>>>>>>>>>>>>>>>>>> ).
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> This proposal would add output type bounds to
> the
> >>>>>>>>>>>>> Processor
> >>>>>>>>>>>>>>>> interface
> >>>>>>>>>>>>>>>>>>>>>>>>>> in Kafka Streams, which enables static checking
> of
> >>>>>>> a
> >>>>>>>>>>>>> number
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>> useful
> >>>>>>>>>>>>>>>>>>>>>>>>>> properties:
> >>>>>>>>>>>>>>>>>>>>>>>>>> * A processor B that consumes the output of
> >>>>>>>> processor A
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>> actually
> >>>>>>>>>>>>>>>>>>>>>>>>>> expecting the same types that processor A
> >>> produces.
> >>>>>>>>>>>>>>>>>>>>>>>>>> * A processor that happens to use a typed store
> is
> >>>>>>>>>>>>> actually
> >>>>>>>>>>>>>>>> emitting
> >>>>>>>>>>>>>>>>>>>>>>>>>> the same types that it is storing.
> >>>>>>>>>>>>>>>>>>>>>>>>>> * A processor is simply forwarding the expected
> >>>>>>> types
> >>>>>>>>> in
> >>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>>>>>>>> paths.
> >>>>>>>>>>>>>>>>>>>>>>>>>> * Processors added via the Streams DSL, which
> are
> >>>>>>> not
> >>>>>>>>>>>>>>> permitted
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>> forward results at all are statically prevented
> >>>>>>> from
> >>>>>>>>>>>>> doing
> >>>>>>>>>>>>>> so
> >>>>>>>>>>>>>>>> by the
> >>>>>>>>>>>>>>>>>>>>>>>>>> compiler
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Internally, we can use the above properties to
> >>>>>>>> achieve
> >>>>>>>>> a
> >>>>>>>>>>>>>> much
> >>>>>>>>>>>>>>>> higher
> >>>>>>>>>>>>>>>>>>>>>>>>>> level of confidence in the Streams DSL
> >>>>>>>> implementation's
> >>>>>>>>>>>>>>>> correctness.
> >>>>>>>>>>>>>>>>>>>>>>>>>> Actually, while doing the POC, I found a few
> bugs
> >>>>>>> and
> >>>>>>>>>>>>>>> mistakes,
> >>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>> become structurally impossible with KIP-478.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Additionally, the stronger types dramatically
> >>>>>>> improve
> >>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>> self-documentation of our Streams internal
> >>>>>>>>>>>>> implementations,
> >>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>> makes it much easier for new contributors to
> ramp
> >>>>>>> up
> >>>>>>>>> with
> >>>>>>>>>>>>>>>> confidence.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks so much for your consideration!
> >>>>>>>>>>>>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> -- Guozhang
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message