kafka-dev mailing list archives

From Eno Thereska <eno.there...@gmail.com>
Subject Re: [DISCUSS] KIP-114: KTable materialization and improved semantics
Date Fri, 21 Apr 2017 08:11:20 GMT
Hi Guozhang,

Thanks for the feedback. Comments inline:

> 1. Regarding the user-facing semantics, I thought we will claim that
> "KTables generated from functions that do NOT specify a table name will NOT
> be queryable"; but it seems you're proposing to claim it "may not be
> queryable", i.e. if users happen to know the internal name of a store that is
> materialized, they can still query it. I feel its potential benefits are
> well overwhelmed by the confusion it may introduce. So I'd suggest we just
> be strict and say "no store name, not queryable".

Sure. "No store name, not queryable" sounds fine. If the user is brave and digs deep they will be able to query these stores that are always created (like when we do aggregates), but I agree that there is no reason we need to make a promise to them if they don't provide a name. I'll change the wording.
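
To make the wording concrete, here is a minimal sketch of the contract "no store name, not queryable". It is a toy model under stated assumptions: StoreCatalog, register() and query() are hypothetical names, not Kafka Streams APIs, and the sketch models the promise made to users rather than the implementation (where, as noted above, an internally named store may still physically exist).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model of the "no store name, not queryable" contract.
// StoreCatalog and its methods are hypothetical, not Kafka Streams APIs.
final class StoreCatalog {
    // Stores the user named explicitly; only these are promised to be queryable.
    private final Map<String, Map<String, String>> named = new HashMap<>();
    private int internalIds = 0;

    // Registers a store and returns the name it ended up with.
    // A null userName means the user did not ask for queryability.
    String register(String userName, Map<String, String> contents) {
        if (userName == null) {
            // An internal name is generated, but nothing is promised about it.
            return "internal-store-" + internalIds++;
        }
        named.put(userName, contents);
        return userName;
    }

    // Looks up a store by name; empty unless the user supplied that name.
    Optional<Map<String, String>> query(String storeName) {
        return Optional.ofNullable(named.get(storeName));
    }

    public static void main(String[] args) {
        StoreCatalog catalog = new StoreCatalog();
        Map<String, String> counts = new HashMap<>();
        counts.put("alice", "3");
        String namedStore = catalog.register("word-counts", counts);
        String internal = catalog.register(null, new HashMap<>());
        System.out.println(namedStore + " queryable: " + catalog.query(namedStore).isPresent());
        System.out.println(internal + " queryable: " + catalog.query(internal).isPresent());
    }
}
```

The design choice in the sketch is that the internal name is still generated (something has to identify the store), but it never enters the lookup path that users are promised.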

> 
> 2. Is there a difference between "calling the overloaded function with
> store name, but specify the value as null" and "calling the overloaded
> function without store name"? I thought they will be implemented the same
> way. But after reading through the wiki I'm not sure. So just clarifying.
> 

There is no difference. I'll clarify.
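
A rough sketch of what "no difference" would mean, assuming the overload without a store name simply delegates with null. NameOverloads and its filter() methods are hypothetical stand-ins that return a description string; they are not the actual KTable signatures.

```java
// Hypothetical stand-in for a DSL operator; not the Kafka Streams API.
final class NameOverloads {
    // Overload with a store name; null is allowed and means "no name given".
    static String filter(String predicate, String storeName) {
        String queryable = (storeName == null)
                ? "not queryable"
                : "queryable as " + storeName;
        return "filter(" + predicate + ") -> " + queryable;
    }

    // Overload without a store name simply delegates with null,
    // so both call styles take exactly the same code path.
    static String filter(String predicate) {
        return filter(predicate, null);
    }

    public static void main(String[] args) {
        System.out.println(filter("v > 10"));
        System.out.println(filter("v > 10", null));
        System.out.println(filter("v > 10", "big-values"));
    }
}
```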


> 3. Personally I'm still a bit preferring renaming "KTable#toStream" to sth.
> like "KTable#getChangelog()" or "#toChangelog", since to me it feels more
> understandable from the user's point of view. WDYT?
> 

So I left this out of this KIP, since it's not directly related to the scope. Perhaps we can do it in a cleanup KIP?

Thanks
Eno


> 
> Guozhang
> 
> 
> On Tue, Apr 11, 2017 at 11:53 AM, Matthias J. Sax <matthias@confluent.io>
> wrote:
> 
>> +1
>> 
>> On 4/11/17 10:34 AM, Eno Thereska wrote:
>>> Hi Matthias,
>>> 
>>> 
>>>> On 11 Apr 2017, at 09:41, Matthias J. Sax <matthias@confluent.io>
>> wrote:
>>>> 
>>>> Not sure, if we are on the same page already?
>>>> 
>>>>> "A __store__ can be queryable whether it's materialized or not"
>>>> 
>>>> This does not make sense -- there is nothing like a non-materialized
>>>> store -- only non-materialized KTables.
>>> 
>>> Yes, there are stores that are simple views, i.e., non-materialized.
>> Damian has such a prototype for Global Tables (it didn't go into trunk).
>>> It's still a store, e.g., a KeyValueStore, but when you do a get() it
>> recomputes the result on the fly (e.g., it applies a filter).
>>> 
>>> Eno
>>> 
>>>> 
>>>>> "Yes, there is nothing that will prevent users from querying
>>>> internally generated stores, but they cannot assume a store will
>>>> necessarily be queryable."
>>>> 
>>>> That is what I disagree on. Stores should be queryable all the time.
>>>> 
>>>> Furthermore, we should have all non-materialized KTables to be
>>>> queryable, too.
>>>> 
>>>> 
>>>> Or maybe there is just some misunderstanding going on, and there is some
>>>> mix-up between "store" and "KTable"
>>>> 
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> 
>>>> On 4/11/17 9:34 AM, Eno Thereska wrote:
>>>>> Hi Matthias,
>>>>> 
>>>>> See my note: "A store can be queryable whether it's materialized or
>> not". I think we're on the same page. Stores with an internal name are also
>> queryable.
>>>>> 
>>>>> I'm just pointing out that, although that is the case today and with
>> this KIP, I don't think we have an obligation to make stores with internal
>> names queryable in the future. However, that is a discussion for a future
>> point.
>>>>> 
>>>>> Eno
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 11 Apr 2017, at 08:56, Matthias J. Sax <matthias@confluent.io>
>> wrote:
>>>>>> 
>>>>>> +1 on including GlobalKTable
>>>>>> 
>>>>>> But I am not sure about the materialization / queryable question. For
>>>>>> full consistency, all KTables should be queryable regardless of whether they
>>>>>> are materialized or not. -- Maybe this is a second step though (even
>> if
>>>>>> I would like to get this done right away)
>>>>>> 
>>>>>> If we don't want all KTables to be queryable, ie, only those KTables
>>>>>> that are materialized, then we should have a clear definition about
>>>>>> this, and only allow querying stores that the user specified a name for.
>>>>>> This will simplify the reasoning for users, what stores are queryable
>> and
>>>>>> what not. Otherwise, we still end up confusing user.
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> On 4/11/17 8:23 AM, Damian Guy wrote:
>>>>>>> Eno, re: GlobalKTable - yeah that seems fine.
>>>>>>> 
>>>>>>> On Tue, 11 Apr 2017 at 14:18 Eno Thereska <eno.thereska@gmail.com>
>> wrote:
>>>>>>> 
>>>>>>>> About GlobalKTables, I suppose there is no reason why they cannot
>> also use
>>>>>>>> this KIP for consistency, e.g., today you have:
>>>>>>>> 
>>>>>>>> public <K, V> GlobalKTable<K, V> globalTable(final Serde<K>
>> keySerde,
>>>>>>>>                                           final Serde<V> valSerde,
>>>>>>>>                                           final String topic,
>>>>>>>>                                           final String storeName)
>>>>>>>> 
>>>>>>>> For consistency with the KIP you could also have an overload
>> without the
>>>>>>>> store name, for people who want to construct a global ktable, but
>> don't
>>>>>>>> care about querying it directly:
>>>>>>>> 
>>>>>>>> public <K, V> GlobalKTable<K, V> globalTable(final Serde<K>
>> keySerde,
>>>>>>>>                                           final Serde<V> valSerde,
>>>>>>>>                                           final String topic)
>>>>>>>> 
>>>>>>>> Damian, what do you think? I'm thinking of adding this to KIP.
>> Thanks to
>>>>>>>> Michael for bringing it up.
>>>>>>>> 
>>>>>>>> Eno
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 11 Apr 2017, at 06:13, Eno Thereska <eno.thereska@gmail.com>
>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Michael, comments inline:
>>>>>>>>> 
>>>>>>>>>> On 11 Apr 2017, at 03:25, Michael Noll <michael@confluent.io>
>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thanks for the updates, Eno!
>>>>>>>>>> 
>>>>>>>>>> In addition to what has already been said:  We should also
>> explicitly
>>>>>>>>>> mention that this KIP is not touching GlobalKTable.  I'm sure
>> that some
>>>>>>>>>> users will throw KTable and GlobalKTable into one conceptual
>> "it's all
>>>>>>>>>> tables!" bucket and then wonder how the KIP might affect global
>> tables.
>>>>>>>>> 
>>>>>>>>> Good point, I'll add.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Damian wrote:
>>>>>>>>>>> I think if no store name is provided users would still be able
>> to query
>>>>>>>>>> the
>>>>>>>>>>> store, just the store name would be some internally generated
>> name.
>>>>>>>> They
>>>>>>>>>>> would be able to discover those names via the IQ API.
>>>>>>>>>> 
>>>>>>>>>> I, too, think that users should be able to query a store even if
>> its
>>>>>>>> name
>>>>>>>>>> was internally generated.  After all, the data is already there /
>>>>>>>>>> materialized.
>>>>>>>>> 
>>>>>>>>> Yes, there is nothing that will prevent users from querying
>> internally
>>>>>>>> generated stores, but they cannot
>>>>>>>>> assume a store will necessarily be queryable. So if it's there,
>> they can
>>>>>>>> query it. If it's not there, and they didn't
>>>>>>>>> provide a queryable name, they cannot complain and say "hey, where
>> is my
>>>>>>>> store". If they must absolutely be certain that
>>>>>>>>> a store is queryable, then they must provide a queryable name.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Damian wrote:
>>>>>>>>>>> I think for some stores it will make sense to not create a
>> physical
>>>>>>>>>> store, i.e.,
>>>>>>>>>>> for things like `filter`, as this will save the rocksdb
>> overhead. But i
>>>>>>>>>> guess that
>>>>>>>>>>> is more of an implementation detail.
>>>>>>>>>> 
>>>>>>>>>> I think it would help if the KIP would clarify what we'd do in
>> such a
>>>>>>>>>> case.  For example, if the user did not specify a store name for
>>>>>>>>>> `KTable#filter` -- would it be queryable?  If so, would this
>> imply we'd
>>>>>>>>>> always materialize the state store, or...?
>>>>>>>>> 
>>>>>>>>> I'll clarify in the KIP with some more examples. Materialization
>> will be
>>>>>>>> an internal concept. A store can be queryable whether it's
>> materialized or
>>>>>>>> not
>>>>>>>>> (e.g., through advanced implementations that compute the value of a
>>>>>>>> filter on the fly, rather than materialize the answer).
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Eno
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -Michael
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Apr 11, 2017 at 9:14 AM, Damian Guy <damian.guy@gmail.com
>>> 
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Eno,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the update. I agree with what Matthias said. I wonder
>> if
>>>>>>>> the KIP
>>>>>>>>>>> should talk less about materialization and more about querying?
>> After
>>>>>>>> all,
>>>>>>>>>>> that is what is being provided from an end-users perspective.
>>>>>>>>>>> 
>>>>>>>>>>> I think if no store name is provided users would still be able to
>>>>>>>> query the
>>>>>>>>>>> store, just the store name would be some internally generated
>> name.
>>>>>>>> They
>>>>>>>>>>> would be able to discover those names via the IQ API
>>>>>>>>>>> 
>>>>>>>>>>> I think for some stores it will make sense to not create a
>> physical
>>>>>>>> store,
>>>>>>>>>>> i.e., for things like `filter`, as this will save the rocksdb
>>>>>>>> overhead. But
>>>>>>>>>>> i guess that is more of an implementation detail.
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Damian
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, 11 Apr 2017 at 00:36 Eno Thereska <
>> eno.thereska@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>> 
>>>>>>>>>>>>> However, this still forces users, to provide a name for store
>> that we
>>>>>>>>>>>>> must materialize, even if users are not interested in querying
>> the
>>>>>>>>>>>>> stores. Thus, I would like to have overloads for all currently
>>>>>>>> existing
>>>>>>>>>>>>> methods having mandatory storeName parameter, with overloads,
>> that do
>>>>>>>>>>>>> not require the storeName parameter.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Oh yeah, absolutely, this is part of the KIP. I guess I didn't
>> make it
>>>>>>>>>>>> clear, I'll clarify.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Eno
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 10 Apr 2017, at 16:00, Matthias J. Sax <
>> matthias@confluent.io>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for pushing this KIP Eno.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The update gives a very clear description about the scope, that
>> is
>>>>>>>> super
>>>>>>>>>>>>> helpful for the discussion!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - To put it into my own words, the KIP's focus is to enable users to
>> query
>>>>>>>> all
>>>>>>>>>>>>> KTables.
>>>>>>>>>>>>> ** The ability to query a store is determined by providing a
>> name for
>>>>>>>>>>>>> the store.
>>>>>>>>>>>>> ** At the same time, providing a name -- and thus making a
>> store
>>>>>>>>>>>>> queryable -- does not say anything about an actual
>> materialization
>>>>>>>> (ie,
>>>>>>>>>>>>> being queryable and being materialized are orthogonal).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I like this overall a lot. However, I would go one step
>> further.
>>>>>>>> Right
>>>>>>>>>>>>> now, you suggest to add new overload methods that allow users
>> to
>>>>>>>>>>> specify
>>>>>>>>>>>>> a storeName -- if `null` is provided and the store is not
>>>>>>>> materialized,
>>>>>>>>>>>>> we ignore it completely -- if `null` is provided but the store
>> must
>>>>>>>> be
>>>>>>>>>>>>> materialized we generate an internal name. So far so good.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> However, this still forces users, to provide a name for store
>> that we
>>>>>>>>>>>>> must materialize, even if users are not interested in querying
>> the
>>>>>>>>>>>>> stores. Thus, I would like to have overloads for all currently
>>>>>>>> existing
>>>>>>>>>>>>> methods having mandatory storeName parameter, with overloads,
>> that do
>>>>>>>>>>>>> not require the storeName parameter.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Otherwise, we would still have some methods with an optional
>> storeName
>>>>>>>>>>>>> parameter and other methods with a mandatory storeName parameter
>> --
>>>>>>>> thus,
>>>>>>>>>>>>> still some inconsistency.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4/9/17 8:35 AM, Eno Thereska wrote:
>>>>>>>>>>>>>> Hi there,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I've now done a V2 of the KIP, that hopefully addresses the
>> feedback
>>>>>>>>>>> in
>>>>>>>>>>>> this discussion thread:
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 114%3A+KTable+materialization+and+improved+semantics
>>>>>>>>>>>> <
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 114:+KTable+materialization+and+improved+semantics>.
>>>>>>>>>>>> Notable changes:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> - clearly outline what is in the scope of the KIP and what is
>> not.
>>>>>>>> We
>>>>>>>>>>>> ran into the issue where lots of useful, but somewhat tangential
>>>>>>>>>>>> discussions came up on interactive queries, declarative DSL
>> etc. The
>>>>>>>>>>> exact
>>>>>>>>>>>> scope of this KIP is spelled out.
>>>>>>>>>>>>>> - decided to go with overloaded methods, not .materialize(),
>> to stay
>>>>>>>>>>>> within the spirit of the current declarative DSL.
>>>>>>>>>>>>>> - clarified the deprecation plan
>>>>>>>>>>>>>> - listed part of the discussion we had under rejected
>> alternatives
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If you have any further feedback on this, let's continue on
>> this
>>>>>>>>>>> thread.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>> Eno
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 1 Feb 2017, at 09:04, Eno Thereska <
>> eno.thereska@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks everyone! I think it's time to do a V2 on the KIP so
>> I'll do
>>>>>>>>>>>> that and we can see how it looks and continue the discussion
>> from
>>>>>>>> there.
>>>>>>>>>>>> Stay tuned.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Eno
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 30 Jan 2017, at 17:23, Matthias J. Sax <
>> matthias@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think Eno's separation is very clear and helpful. In
>> order to
>>>>>>>>>>>>>>>> streamline this discussion, I would suggest we focus back
>> on point
>>>>>>>>>>> (1)
>>>>>>>>>>>>>>>> only, as this is the original KIP question.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Even if I started the DSL design discussion somehow, because
>> I
>>>>>>>>>>> thought
>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> might be helpful to resolve both in a single shot, I feel
>> that we
>>>>>>>>>>> have
>>>>>>>>>>>>>>>> too many options about DSL design and we should split it up
>> in two
>>>>>>>>>>>>>>>> steps. This will have the disadvantage that we will change
>> the API
>>>>>>>>>>>>>>>> twice, but still, I think it will be a more focused
>> discussion.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I just had another look at the KIP, and it proposes 3
>> changes:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 1. add .materialized() -> IIRC it was suggested to name this
>>>>>>>>>>>>>>>> .materialize() though (can you maybe update the KIP Eno?)
>>>>>>>>>>>>>>>> 2. remove print(), writeAsText(), and foreach()
>>>>>>>>>>>>>>>> 3. rename toStream() to toKStream()
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I completely agree with (2) -- not sure about (3) though
>> because
>>>>>>>>>>>>>>>> KStreamBuilder also has .stream() and .table() as methods.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> However, we might want to introduce a KStream#toTable() --
>> this
>>>>>>>> was
>>>>>>>>>>>>>>>> requested multiple times -- might also be part of a
>> different KIP.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thus, we end up with (1). I would suggest to do a step
>> backward
>>>>>>>> here
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> instead of a discussion how to express the changes in the
>> DSL (new
>>>>>>>>>>>>>>>> overload, new methods...) we should discuss what the actual
>> change
>>>>>>>>>>>>>>>> should be. Like (1) materialize all KTable all the time (2)
>> allow
>>>>>>>> the
>>>>>>>>>>>> user
>>>>>>>>>>>>>>>> to force a materialization to enable querying the KTable
>> (3) allow
>>>>>>>>>>> for
>>>>>>>>>>>>>>>> queryable non-materialized KTable.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> One more question is, if we want to allow a user-forced
>>>>>>>>>>> materialization
>>>>>>>>>>>>>>>> only as a local store without changelog, or both (together
>> /
>>>>>>>>>>>>>>>> independently)? We got some request like this already.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 1/30/17 3:50 AM, Jan Filipiak wrote:
>>>>>>>>>>>>>>>>> Hi Eno,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> thanks for putting into different points. I want to put a
>> few
>>>>>>>>>>> remarks
>>>>>>>>>>>>>>>>> inline.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best Jan
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 30.01.2017 12:19, Eno Thereska wrote:
>>>>>>>>>>>>>>>>>> So I think there are several important discussion threads
>> that
>>>>>>>> are
>>>>>>>>>>>>>>>>>> emerging here. Let me try to tease them apart:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 1. inconsistency in what is materialized and what is not,
>> what
>>>>>>>> is
>>>>>>>>>>>>>>>>>> queryable and what is not. I think we all agree there is
>> some
>>>>>>>>>>>>>>>>>> inconsistency there and this will be addressed with any
>> of the
>>>>>>>>>>>>>>>>>> proposed approaches. Addressing the inconsistency is the
>> point
>>>>>>>> of
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> original KIP.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 2. the exact API for materializing a KTable. We can
>> specify 1) a
>>>>>>>>>>>>>>>>>> "store name" (as we do today) or 2) have a
>> ".materialize[d]"
>>>>>>>> call
>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>> 3) get a handle from a KTable ".getQueryHandle" or 4)
>> have a
>>>>>>>>>>> builder
>>>>>>>>>>>>>>>>>> construct. So we have discussed 4 options. It is
>> important to
>>>>>>>>>>>> remember
>>>>>>>>>>>>>>>>>> in this discussion that IQ is not designed for just local
>>>>>>>> queries,
>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> also for distributed queries. In all cases an identifying
>>>>>>>> name/id
>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> needed for the store that the user is interested in
>> querying. So
>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>> end up with a discussion on who provides the name, the
>> user (as
>>>>>>>>>>> done
>>>>>>>>>>>>>>>>>> today) or if it is generated automatically (as Jan
>> suggests, as
>>>>>>>> I
>>>>>>>>>>>>>>>>>> understand it). If it is generated automatically we need
>> a way
>>>>>>>> to
>>>>>>>>>>>>>>>>>> expose these auto-generated names to the users and link
>> them to
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> KTables they care to query.
>>>>>>>>>>>>>>>>> Hi, the last sentence is what I currently arguing against.
>> The
>>>>>>>> user
>>>>>>>>>>>>>>>>> would never see a string-type identifier name or anything.
>> All he
>>>>>>>>>>>> gets
>>>>>>>>>>>>>>>>> is the queryHandle if he executes a get(K) that will be an
>>>>>>>>>>>> interactive
>>>>>>>>>>>>>>>>> query get. with all the finding the right servers that
>> currently
>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>> copy of this underlying store stuff going on. The nice
>> part is
>>>>>>>> that
>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>> someone retrieves a queryHandle, you know that you have to
>>>>>>>>>>>> materialize
>>>>>>>>>>>>>>>>> (if you are not already) as queries will be coming. Taking
>> away
>>>>>>>> the
>>>>>>>>>>>>>>>>> confusion mentioned in point 1 IMO.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 3. The exact boundary between the DSL, that is the
>> processing
>>>>>>>>>>>>>>>>>> language, and the storage/IQ queries, and how we jump
>> from one
>>>>>>>> to
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> other. This is mostly for how we get a handle on a store
>> (so
>>>>>>>> it's
>>>>>>>>>>>>>>>>>> related to point 2), rather than for how we query the
>> store. I
>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>> we all agree that we don't want to limit ways one can
>> query a
>>>>>>>>>>> store
>>>>>>>>>>>>>>>>>> (e.g., using gets or range queries etc) and the query
>> APIs are
>>>>>>>> not
>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> the scope of the DSL.
>>>>>>>>>>>>>>>>> Does the IQ work with range currently? The range would
>> have to be
>>>>>>>>>>>>>>>>> started on all stores and then merged by maybe the client.
>> Range
>>>>>>>>>>>> force a
>>>>>>>>>>>>>>>>> flush to RocksDB currently so I am sure you would get a
>>>>>>>> performance
>>>>>>>>>>>> hit
>>>>>>>>>>>>>>>>> right there. Time-windows might be okay, but I am not sure
>> if the
>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>> version should offer the user range access.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 4. The nature of the DSL and whether it's declarative
>> enough, or
>>>>>>>>>>>>>>>>>> flexible enough. Damian made the point that he likes the
>> builder
>>>>>>>>>>>>>>>>>> pattern since users can specify, per KTable, things like
>> caching
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> logging needs. His observation (as I understand it) is
>> that the
>>>>>>>>>>>>>>>>>> processor API (PAPI) is flexible but doesn't provide any
>> help at
>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>> to users. The current DSL provides declarative
>> abstractions, but
>>>>>>>>>>>> it's
>>>>>>>>>>>>>>>>>> not fine-grained enough. This point is much broader than
>> the
>>>>>>>> KIP,
>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> discussing it in this KIP's context is ok, since we don't
>> want to
>>>>>>>>>>>> make
>>>>>>>>>>>>>>>>>> small piecemeal changes and then realise we're not in the
>> spot
>>>>>>>> we
>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>>> to be.
>>>>>>>>>>>>>>>>> This is indeed much broader. My guess here is that's why
>> both
>>>>>>>> API's
>>>>>>>>>>>>>>>>> exists and helping the users to switch back and forth
>> might be a
>>>>>>>>>>>> thing.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Feel free to pitch in if I have misinterpreted something.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>> Eno
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 30 Jan 2017, at 10:22, Jan Filipiak <
>>>>>>>> Jan.Filipiak@trivago.com
>>>>>>>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi Eno,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I have a really hard time understanding why we can't.
>> From my
>>>>>>>>>>> point
>>>>>>>>>>>>>>>>>>> of view everything could be super elegant DSL only +
>> public api
>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>> the PAPI-people as already exist.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> The above approach implementing a .get(K) on KTable is
>> foolish
>>>>>>>> in
>>>>>>>>>>>> my
>>>>>>>>>>>>>>>>>>> opinion as it would be too late to know that
>> materialisation
>>>>>>>> would
>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> required.
>>>>>>>>>>>>>>>>>>> But having an API that allows to indicate I want to
>> query this
>>>>>>>>>>>> table
>>>>>>>>>>>>>>>>>>> and then wrapping the say table's processorname can work
>> out
>>>>>>>>>>> really
>>>>>>>>>>>>>>>>>>> really nice. The only obstacle I see is people not
>> willing to
>>>>>>>>>>> spend
>>>>>>>>>>>>>>>>>>> the additional time in implementation and just want a
>> quick
>>>>>>>> shot
>>>>>>>>>>>>>>>>>>> option to make it work.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> For me it would look like this:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> table =  builder.table()
>>>>>>>>>>>>>>>>>>> filteredTable = table.filter()
>>>>>>>>>>>>>>>>>>> rawHandle = table.getQueryHandle() // Does the
>> materialisation,
>>>>>>>>>>>>>>>>>>> really all names possible but id rather hide the
>> implication of
>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>> materializes
>>>>>>>>>>>>>>>>>>> filteredHandle = filteredTable.getQueryHandle() //
>> this
>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>> _not_ materialize again of course, the source or the
>> aggregator
>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>> stay the only materialized processors
>>>>>>>>>>>>>>>>>>> streams = new streams(builder)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> This middle part is highly flexible; I could imagine
>> forcing the
>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>> to do something like this. This implies to the user that
>> his
>>>>>>>>>>> streams
>>>>>>>>>>>>>>>>>>> need to be running
>>>>>>>>>>>>>>>>>>> instead of propagating the missing initialisation back by
>>>>>>>>>>>> exceptions.
>>>>>>>>>>>>>>>>>>> Also if the user is forced to pass the appropriate
>> streams
>>>>>>>>>>>> instance
>>>>>>>>>>>>>>>>>>> back can change.
>>>>>>>>>>>>>>>>>>> I think it's possible to build multiple streams out of
>> one
>>>>>>>>>>> topology
>>>>>>>>>>>>>>>>>>> so it would be easiest to implement as well. This is just
>> what I
>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>> had liked the most
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> streams.start();
>>>>>>>>>>>>>>>>>>> rawHandle.prepare(streams)
>>>>>>>>>>>>>>>>>>> filteredHandle.prepare(streams)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> later the users can do
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> V value = rawHandle.get(K)
>>>>>>>>>>>>>>>>>>> V value = filteredHandle.get(K)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> This could free DSL users from anything like storenames
>> and how
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> what to materialize. Can someone indicate what the
>> problem
>>>>>>>> would
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> implementing it like this.
>>>>>>>>>>>>>>>>>>> Yes I am aware that the current IQ API will not support
>>>>>>>> querying
>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>> KTableProcessorName instead of statestoreName. But I
>> think
>>>>>>>> that
>>>>>>>>>>>> had
>>>>>>>>>>>>>>>>>>> to change if you want it to be intuitive
>>>>>>>>>>>>>>>>>>> IMO you gotta apply the filter at read time
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Looking forward to your opinions
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Best Jan
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> #DeathToIQMoreAndBetterConnectors
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 30.01.2017 10:42, Eno Thereska wrote:
>>>>>>>>>>>>>>>>>>>> Hi there,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> The inconsistency will be resolved, whether with
>> materialize
>>>>>>>> or
>>>>>>>>>>>>>>>>>>>> overloaded methods.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> With the discussion on the DSL & stores I feel we've
>> gone in a
>>>>>>>>>>>>>>>>>>>> slightly different tangent, which is worth discussing
>>>>>>>>>>> nonetheless.
>>>>>>>>>>>>>>>>>>>> We have entered into an argument around the scope of
>> the DSL.
>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>>> DSL has been designed primarily for processing. The DSL
>> does
>>>>>>>> not
>>>>>>>>>>>>>>>>>>>> dictate ways to access state stores or what kind of
>> queries to
>>>>>>>>>>>>>>>>>>>> perform on them. Hence, I see the mechanism for
>> accessing
>>>>>>>>>>> storage
>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>> decoupled from the DSL.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> We could think of ways to get store handles from part
>> of the
>>>>>>>>>>> DSL,
>>>>>>>>>>>>>>>>>>>> like the KTable abstraction. However, subsequent
>> queries will
>>>>>>>> be
>>>>>>>>>>>>>>>>>>>> store-dependent and not rely on the DSL, hence I'm not
>> sure we
>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>> any grand-convergence DSL-Store here. So I am arguing
>> that the
>>>>>>>>>>>>>>>>>>>> current way of getting a handle on state stores is fine.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>> Eno
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On 30 Jan 2017, at 03:56, Guozhang Wang <
>> wangguoz@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thinking out loud here about the API options (materialize
>> v.s.
>>>>>>>>>>>> overloaded
>>>>>>>>>>>>>>>>>>>>> functions) and its impact on IQ:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 1. The first issue of the current DSL is that, there is
>>>>>>>>>>>>>>>>>>>>> inconsistency upon
>>>>>>>>>>>>>>>>>>>>> whether / how KTables should be materialized:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> a) in many cases the library HAS TO materialize
>> KTables no
>>>>>>>>>>>>>>>>>>>>> matter what,
>>>>>>>>>>>>>>>>>>>>> e.g. KStream / KTable aggregation resulted KTables,
>> and hence
>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>> enforce
>>>>>>>>>>>>>>>>>>>>> users to provide store names and throw RTE if it is
>> null;
>>>>>>>>>>>>>>>>>>>>> b) in some other cases, the KTable can be materialized
>> or
>>>>>>>> not;
>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>> example in KStreamBuilder.table(), store names can be
>>>>>>>> nullable
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> in which
>>>>>>>>>>>>>>>>>>>>> case the KTable would not be materialized;
>>>>>>>>>>>>>>>>>>>>> c) in some other cases, the KTable will never be
>>>>>>>> materialized,
>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>> example KTable.filter() resulted KTables, and users
>> have no
>>>>>>>>>>>> options to
>>>>>>>>>>>>>>>>>>>>> enforce them to be materialized;
>>>>>>>>>>>>>>>>>>>>> d) this is related to a), where some KTables are
>> required to
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>> materialized, but we do not enforce users to provide a
>> state
>>>>>>>>>>>> store
>>>>>>>>>>>>>>>>>>>>> name,
>>>>>>>>>>>>>>>>>>>>> e.g. KTables involved in joins; a RTE will be thrown
>> not
>>>>>>>>>>>>>>>>>>>>> immediately but
>>>>>>>>>>>>>>>>>>>>> later in this case.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 2. The second issue is related to IQ, where state
>> stores are
>>>>>>>>>>>>>>>>>>>>> accessed by
>>>>>>>>>>>>>>>>>>>>> their state store names; so only those KTables that have
>>>>>>>>>>>> user-specified
>>>>>>>>>>>>>>>>>>>>> state
>>>>>>>>>>>>>>>>>>>>> stores will be queryable. But because of 1) above, many
>>>>>>>> stores
>>>>>>>>>>>> may
>>>>>>>>>>>>>>>>>>>>> not be
>>>>>>>>>>>>>>>>>>>>> interested to users for IQ but they still need to
>> provide a
>>>>>>>>>>>>>>>>>>>>> (dummy?) state
>>>>>>>>>>>>>>>>>>>>> store name for them; while on the other hand users
>> cannot
>>>>>>>> query
>>>>>>>>>>>>>>>>>>>>> some state
>>>>>>>>>>>>>>>>>>>>> stores, e.g. the ones generated by KTable.filter() as
>> there
>>>>>>>> is
>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>> APIs for
>>>>>>>>>>>>>>>>>>>>> them to specify a state store name.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 3. We are aware from user feedback that such backend
>> details
>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>> better be abstracted away from the DSL layer, where app
>>>>>>>>>>>> developers
>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>> just focus on processing logic, while state stores
>> along with
>>>>>>>>>>>> their
>>>>>>>>>>>>>>>>>>>>> changelogs etc would better be in a different
>> mechanism; same
>>>>>>>>>>>>>>>>>>>>> arguments
>>>>>>>>>>>>>>>>>>>>> have been discussed for serdes / windowing triggers as
>> well.
>>>>>>>>>>> For
>>>>>>>>>>>>>>>>>>>>> serdes
>>>>>>>>>>>>>>>>>>>>> specifically, we had a very long discussion about it
>> and
>>>>>>>>>>>> concluded
>>>>>>>>>>>>>>>>>>>>> that, at
>>>>>>>>>>>>>>>>>>>>> least in Java7, we cannot completely abstract serde
>> away in
>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> DSL, so we
>>>>>>>>>>>>>>>>>>>>> choose the other extreme to enforce users to be
>> completely
>>>>>>>>>>> aware
>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> serde requirements when some KTables may need to be
>>>>>>>>>>> materialized
>>>>>>>>>>>> via
>>>>>>>>>>>>>>>>>>>>> overloaded API functions. While for the state store
>> names, I
>>>>>>>>>>> feel
>>>>>>>>>>>>>>>>>>>>> it is a
>>>>>>>>>>>>>>>>>>>>> different argument than serdes (details below).
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> So to me, for either materialize() v.s. overloaded
>> functions
>>>>>>>>>>>>>>>>>>>>> directions,
>>>>>>>>>>>>>>>>>>>>> the first thing I'd like to resolve is the
>> inconsistency
>>>>>>>> issue
>>>>>>>>>>>>>>>>>>>>> mentioned
>>>>>>>>>>>>>>>>>>>>> above. So in either case: KTable materialization will
>> not be
>>>>>>>>>>>> affected
>>>>>>>>>>>>>>>>>>>>> by user
>>>>>>>>>>>>>>>>>>>>> providing state store name or not, but will only be
>> decided
>>>>>>>> by
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> library
>>>>>>>>>>>>>>>>>>>>> when it is necessary. More specifically, only join
>> operator
>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> builder.table() resulted KTables are not always
>> materialized,
>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>> are still
>>>>>>>>>>>>>>>>>>>>> likely to be materialized lazily (e.g. when
>> participating in a
>>>>>>>>>>>> join
>>>>>>>>>>>>>>>>>>>>> operator).
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> For overloaded functions that would mean:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> a) we have an overloaded function for ALL operators
>> that
>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>> result
>>>>>>>>>>>>>>>>>>>>> in a KTable, and allow it to be null (i.e. for the
>> function
>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>> param it is null by default);
>>>>>>>>>>>>>>>>>>>>> b) null-state-store-name does not indicate that a KTable
>> would
>>>>>>>>>>>>>>>>>>>>> not be
>>>>>>>>>>>>>>>>>>>>> materialized, but that it will not be used for IQ at
>> all
>>>>>>>>>>>> (internal
>>>>>>>>>>>>>>>>>>>>> state
>>>>>>>>>>>>>>>>>>>>> store names will be generated when necessary).
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> For materialize() that would mean:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> a) we will remove state store names from ALL operators
>> that
>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>> result in a KTable.
>>>>>>>>>>>>>>>>>>>>> b) KTables not calling materialize() do not
>> indicate that
>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> KTable
>>>>>>>>>>>>>>>>>>>>> would not be materialized, but that it will not be
>> used for
>>>>>>>> IQ
>>>>>>>>>>>> at all
>>>>>>>>>>>>>>>>>>>>> (internal state store names will be generated when
>>>>>>>> necessary).
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Again, either way the API itself does not "hint"
>> about
>>>>>>>>>>>> anything
>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>> materializing a KTable or not at all; it is still
>> purely
>>>>>>>>>>>> determined
>>>>>>>>>>>>>>>>>>>>> by the
>>>>>>>>>>>>>>>>>>>>> library when parsing the DSL for now.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Following these thoughts, I feel that 1) we should
>> probably
>>>>>>>>>>>> change
>>>>>>>>>>>>>>>>>>>>> the name
>>>>>>>>>>>>>>>>>>>>> "materialize" since it may be misleading to users as
>> what
>>>>>>>>>>>> actually
>>>>>>>>>>>>>>>>>>>>> happened
>>>>>>>>>>>>>>>>>>>>> behind the scene, to e.g. Damian suggested
>>>>>>>>>>> "queryableStore(String
>>>>>>>>>>>>>>>>>>>>> storeName)",
>>>>>>>>>>>>>>>>>>>>> which returns a QueryableStateStore, and can replace
>> the
>>>>>>>>>>>>>>>>>>>>> `KafkaStreams.store` function; 2) comparing those two
>> options
>>>>>>>>>>>>>>>>>>>>> assuming we
>>>>>>>>>>>>>>>>>>>>> get rid of the misleading function name, I personally
>> favor
>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>> adding more
>>>>>>>>>>>>>>>>>>>>> overloading functions as it keeps the API simpler.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Sat, Jan 28, 2017 at 2:32 PM, Jan Filipiak
>>>>>>>>>>>>>>>>>>>>> <Jan.Filipiak@trivago.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> thanks for your mail, felt like this can clarify some
>>>>>>>> things!
>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>>>>>>>> unfortunately split but as all branches close in on
>> what my
>>>>>>>>>>>>>>>>>>>>>> suggestion was
>>>>>>>>>>>>>>>>>>>>>> about, I'll pick this one to continue.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Of course only the table the user wants to query
>> would be
>>>>>>>>>>>>>>>>>>>>>> materialized.
>>>>>>>>>>>>>>>>>>>>>> (retrieving the queryhandle implies materialisation).
>> So In
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> example of
>>>>>>>>>>>>>>>>>>>>>> KTable::filter if you call
>>>>>>>>>>>>>>>>>>>>>> getIQHandle on both tables only the one source that
>> is there
>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>> materialize and the QueryHandle abstraction would make
>> sure
>>>>>>>> it
>>>>>>>>>>>> gets
>>>>>>>>>>>>>>>>>>>>>> mapped
>>>>>>>>>>>>>>>>>>>>>> and filtered and whatnot upon read as usual.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Of Course the Object you would retrieve would maybe
>> only
>>>>>>>> wrap
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> storeName / table unique identifier and a way to
>> access the
>>>>>>>>>>>> streams
>>>>>>>>>>>>>>>>>>>>>> instance and then basically uses the same mechanism
>> that is
>>>>>>>>>>>>>>>>>>>>>> currently used.
>>>>>>>>>>>>>>>>>>>>>> From my point of view this is the least confusing way
>> for
>>>>>>>> DSL
>>>>>>>>>>>>>>>>>>>>>> users. If
>>>>>>>>>>>>>>>>>>>>>> it's too tricky to get hold of the streams instance
>> one
>>>>>>>> could
>>>>>>>>>>>> ask
>>>>>>>>>>>>>>>>>>>>>> the user
>>>>>>>>>>>>>>>>>>>>>> to pass it in before executing queries, therefore
>> making
>>>>>>>> sure
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> streams
>>>>>>>>>>>>>>>>>>>>>> instance has been built.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> The effort to implement this is indeed some orders of
>>>>>>>>>>> magnitude
>>>>>>>>>>>>>>>>>>>>>> higher
>>>>>>>>>>>>>>>>>>>>>> than the overloaded materialized call. As long as I
>> could
>>>>>>>> help
>>>>>>>>>>>>>>>>>>>>>> getting a
>>>>>>>>>>>>>>>>>>>>>> different view I am happy.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Best Jan
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On 28.01.2017 09:36, Eno Thereska wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hi Jan,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I understand your concern. One implication of not
>> passing
>>>>>>>> any
>>>>>>>>>>>>>>>>>>>>>>> store name
>>>>>>>>>>>>>>>>>>>>>>> and just getting an IQ handle is that all KTables
>> would
>>>>>>>> need
>>>>>>>>>>>> to be
>>>>>>>>>>>>>>>>>>>>>>> materialised. Currently the store name (or proposed
>>>>>>>>>>>>>>>>>>>>>>> .materialize() call)
>>>>>>>>>>>>>>>>>>>>>>> act as hints on whether to materialise the KTable or
>> not.
>>>>>>>>>>>>>>>>>>>>>>> Materialising
>>>>>>>>>>>>>>>>>>>>>>> every KTable can be expensive, although there are
>> some
>>>>>>>> tricks
>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>>>> can play,
>>>>>>>>>>>>>>>>>>>>>>> e.g., have a virtual store rather than one backed by
>> a
>>>>>>>> Kafka
>>>>>>>>>>>> topic.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> However, even with the above, after getting an IQ
>> handle,
>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> user would
>>>>>>>>>>>>>>>>>>>>>>> still need to use IQ APIs to query the state. As
>> such, we
>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>>>>>> continue to be outside the original DSL so this
>> wouldn't
>>>>>>>>>>>> address
>>>>>>>>>>>>>>>>>>>>>>> your
>>>>>>>>>>>>>>>>>>>>>>> original concern.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> So I read this suggestion as simplifying the APIs by
>>>>>>>> removing
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>>>>>>>>> name, at the cost of having to materialise every
>> KTable.
>>>>>>>> It's
>>>>>>>>>>>>>>>>>>>>>>> definitely an
>>>>>>>>>>>>>>>>>>>>>>> option we'll consider as part of this KIP.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>> Eno
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On 28 Jan 2017, at 06:49, Jan Filipiak <
>>>>>>>>>>>> Jan.Filipiak@trivago.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hi Exactly
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I know it works from the Processor API, but my
>> suggestion
>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>>>> prevent
>>>>>>>>>>>>>>>>>>>>>>>> DSL users from dealing with store names whatsoever.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> In general I am pro switching between DSL and
>> Processor
>>>>>>>> API
>>>>>>>>>>>>>>>>>>>>>>>> easily. (In
>>>>>>>>>>>>>>>>>>>>>>>> my Stream applications I do this a lot with
>> reflection and
>>>>>>>>>>>>>>>>>>>>>>>> instantiating
>>>>>>>>>>>>>>>>>>>>>>>> KTableImpl) Concerning this KIP all I say is that
>> there
>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>>>>> be a DSL
>>>>>>>>>>>>>>>>>>>>>>>> concept of "I want to expose this __KTable__". This
>> can be
>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> Method like
>>>>>>>>>>>>>>>>>>>>>>>> KTable::retrieveIQHandle():InteractiveQueryHandle,
>> the
>>>>>>>>>>> table
>>>>>>>>>>>>>>>>>>>>>>>> would know
>>>>>>>>>>>>>>>>>>>>>>>> to materialize, and the user had a reference to the
>> "store
>>>>>>>>>>>> and the
>>>>>>>>>>>>>>>>>>>>>>>> distributed query mechanism by the Interactive Query
>>>>>>>> Handle"
>>>>>>>>>>>>>>>>>>>>>>>> under the hood
>>>>>>>>>>>>>>>>>>>>>>>> it can use the same mechanism as the PAPI people
>> again.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I hope you see my point :)
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Best Jan
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> #DeathToIQMoreAndBetterConnectors :)
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On 27.01.2017 21:59, Matthias J. Sax wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Jan,
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> the IQ feature is not limited to Streams DSL but
>> can also
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>> used for
>>>>>>>>>>>>>>>>>>>>>>>>> Stores used in PAPI. Thus, we need a mechanism
>> that does
>>>>>>>>>>> work
>>>>>>>>>>>>>>>>>>>>>>>>> for PAPI
>>>>>>>>>>>>>>>>>>>>>>>>> and DSL.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Nevertheless I see your point and I think we could
>>>>>>>> provide
>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>> better API
>>>>>>>>>>>>>>>>>>>>>>>>> for KTable stores including the discovery of remote
>>>>>>>> shards
>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>> the same
>>>>>>>>>>>>>>>>>>>>>>>>> KTable.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> @Michael: Yes, right now we do have a lot of
>> overloads
>>>>>>>> and
>>>>>>>>>>> I
>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>>>>>>> not a
>>>>>>>>>>>>>>>>>>>>>>>>> big fan of those -- I would rather prefer a builder
>>>>>>>>>>> pattern.
>>>>>>>>>>>>>>>>>>>>>>>>> But that
>>>>>>>>>>>>>>>>>>>>>>>>> might be a different discussion (nevertheless, if
>> we
>>>>>>>> would
>>>>>>>>>>>> aim
>>>>>>>>>>>>>>>>>>>>>>>>> for a API
>>>>>>>>>>>>>>>>>>>>>>>>> rework, we should get the changes with regard to
>> stores
>>>>>>>>>>> right
>>>>>>>>>>>>>>>>>>>>>>>>> from the
>>>>>>>>>>>>>>>>>>>>>>>>> beginning on, in order to avoid a redesign later
>> on.)
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> stream.groupByKey()
>>>>>>>>>>>>>>>>>>>>>>>>> .window(TimeWindow.of(5000))
>>>>>>>>>>>>>>>>>>>>>>>>> .aggregate(...)
>>>>>>>>>>>>>>>>>>>>>>>>> .withAggValueSerde(new CustomTypeSerde())
>>>>>>>>>>>>>>>>>>>>>>>>> .withStoreName("storeName");
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> (This would also reduce JavaDoc redundancy --
>> maybe a
>>>>>>>>>>>> personal
>>>>>>>>>>>>>>>>>>>>>>>>> pain
>>>>>>>>>>>>>>>>>>>>>>>>> point right now :))
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On 1/27/17 11:10 AM, Jan Filipiak wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Yeah,
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe my bad that I refuse to look into IQ as I
>> don't
>>>>>>>> find
>>>>>>>>>>>> them
>>>>>>>>>>>>>>>>>>>>>>>>>> anywhere
>>>>>>>>>>>>>>>>>>>>>>>>>> close to being interesting. The Problem IMO is
>> that
>>>>>>>> people
>>>>>>>>>>>>>>>>>>>>>>>>>> need to know
>>>>>>>>>>>>>>>>>>>>>>>>>> the store name, so we are working on different
>> levels
>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>> achieve a
>>>>>>>>>>>>>>>>>>>>>>>>>> single goal.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> What is people's opinion on having a method on
>>>>>>>> KTABLE
>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>>>>>>> them something like a KeyValue store. There are, of
>> course
>>>>>>>>>>>>>>>>>>>>>>>>>> problems like
>>>>>>>>>>>>>>>>>>>>>>>>>> "it can't be used before the stream threads are
>> going and
>>>>>>>>>>>>>>>>>>>>>>>>>> groupmembership
>>>>>>>>>>>>>>>>>>>>>>>>>> is established..." but the benefit would be that
>> for the
>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>>> there is
>>>>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>>> consistent way of saying "Hey I need it
>> materialized as
>>>>>>>>>>>>>>>>>>>>>>>>>> queries are going to
>>>>>>>>>>>>>>>>>>>>>>>>>> be coming" + already get a thing they can
>> execute
>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> queries on
>>>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>> 1 step.
>>>>>>>>>>>>>>>>>>>>>>>>>> What I think is unintuitive here is you need to
>> say
>>>>>>>>>>>>>>>>>>>>>>>>>> materialize on this
>>>>>>>>>>>>>>>>>>>>>>>>>> KTable and then you go somewhere else and find
>> its store
>>>>>>>>>>>> name
>>>>>>>>>>>>>>>>>>>>>>>>>> and then
>>>>>>>>>>>>>>>>>>>>>>>>>> you go to the KafkaStreams instance and ask for
>> the
>>>>>>>> store
>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>>> name.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> So one could help the user stay in DSL land and
>>>>>>>>>>> therefore
>>>>>>>>>>>>>>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>>>>> confuse him less.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Best Jan
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> #DeathToIQMoreAndBetterConnectors :)
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On 27.01.2017 16:51, Damian Guy wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think Jan is saying that they don't always
>> need to be
>>>>>>>>>>>>>>>>>>>>>>>>>>> materialized,
>>>>>>>>>>>>>>>>>>>>>>>>>>> i.e.,
>>>>>>>>>>>>>>>>>>>>>>>>>>> filter just needs to apply the ValueGetter, it
>> doesn't
>>>>>>>>>>>> need yet
>>>>>>>>>>>>>>>>>>>>>>>>>>> another
>>>>>>>>>>>>>>>>>>>>>>>>>>> physical state store.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, 27 Jan 2017 at 15:49 Michael Noll <
>>>>>>>>>>>> michael@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Like Damian, and for the same reasons, I am more
>> in
>>>>>>>> favor
>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> overloading
>>>>>>>>>>>>>>>>>>>>>>>>>>>> methods rather than introducing `materialize()`.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> FWIW, we already have a similar API setup for
>> e.g.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> `KTable#through(topicName, stateStoreName)`.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> A related but slightly different question is
>> what e.g.
>>>>>>>>>>> Jan
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Filipiak
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned earlier in this thread:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we need to explain more clearly why
>> KIP-114
>>>>>>>>>>>> doesn't
>>>>>>>>>>>>>>>>>>>>>>>>>>>> propose
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> seemingly simpler solution of always
>> materializing
>>>>>>>>>>>> tables/state
>>>>>>>>>>>>>>>>>>>>>>>>>>>> stores.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 27, 2017 at 4:38 PM, Jan Filipiak <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jan.Filipiak@trivago.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yeah, it's confusing. Why shouldn't it be
>> queryable by
>>>>>>>> IQ?
>>>>>>>>>>> If
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ValueGetter of Filter it will apply the filter
>> and
>>>>>>>>>>>> should be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> completely
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> transparent as to if another processor or IQ is
>>>>>>>>>>> accessing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it? How
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new method help?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I cannot see the reason for the additional
>>>>>>>> materialize
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method being
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> required! Hence I suggest leave it alone.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding removing the others, I don't have
>> strong
>>>>>>>>>>> opinions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be unrelated.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Jan
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 26.01.2017 20:48, Eno Thereska wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Forwarding this thread to the users list too
>> in case
>>>>>>>>>>>> people
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comment. It is also on the dev list.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Eno
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: "Matthias J. Sax" <
>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSS] KIP-114: KTable
>>>>>>>>>>> materialization
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> improved
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> semantics
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Date: 24 January 2017 at 19:30:10 GMT
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@kafka.apache.org
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Reply-To: dev@kafka.apache.org
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> That's not what I meant by "huge impact".
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I refer to the actions related to
>> materializing a
>>>>>>>>>>> KTable:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> creating a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> RocksDB store and a changelog topic -- users
>> should
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aware of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runtime implications, and this is better
>> expressed by
>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> explicit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> call, rather than implicitly triggered by
>> using a
>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> overload of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a method.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 1/24/17 1:35 AM, Damian Guy wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think your definition of a huge impact and
>> mine
>>>>>>>> are
>>>>>>>>>>>> rather
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ;-P
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Overloading a few methods is not really a
>> huge
>>>>>>>>>>> impact
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IMO. It is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sacrifice worth making for readability,
>> usability of
>>>>>>>>>>> the
>>>>>>>>>>>> API.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, 23 Jan 2017 at 17:55 Matthias J.
>> Sax <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I understand your argument, but do not
>> agree with
>>>>>>>>>>> it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your first version (even if the "flow" is
>> not as
>>>>>>>>>>>> nice)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is more
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> explicit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> than the second version. Adding a
>> stateStoreName
>>>>>>>>>>>> parameter
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is quite
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implicit but has a huge impact -- thus, I
>> prefer
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rather more
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> verbose
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but explicit version.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 1/23/17 1:39 AM, Damian Guy wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not a fan of materialize. I think it
>>>>>>>> interrupts
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flow,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i.e,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>> table.mapValues(..).materialize().join(..).materialize()
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compared to:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table.mapValues(..).join(..)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I know which one I prefer.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> My preference is still to provide
>> overloaded
>>>>>>>>>>> methods
>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> people can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specify the store names if they want,
>> otherwise
>>>>>>>> we
>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> generate
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> them.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, 23 Jan 2017 at 05:30 Matthias J. Sax
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <matthias@confluent.io
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thanks for the KIP Eno! Here are my 2
>> cents:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) I like Guozhang's proposal about
>> removing
>>>>>>>>>>> store
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> name from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> KTable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> methods and generate internal names
>> (however, I
>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> overloads). Furthermore, I would not
>> force
>>>>>>>> users
>>>>>>>>>>>> to call
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .materialize()
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if they want to query a store, but add
>> one more
>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .stateStoreName()
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that returns the store name if the
>> KTable is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> materialized.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thus,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .materialize() must not necessarily have a
>> parameter
>>>>>>>>>>>> storeName
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (ie,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should have some overloads here).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would also not allow providing a
>> null store
>>>>>>>>>>>> name (to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> indicate no
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> materialization if not necessary) but
>> throw an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> exception.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This yields some simplification (see
>> below).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) I also like Guozhang's proposal about
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> KStream#toTable()
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. What will happen when you call
>> materialize
>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> KTable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> materialized? Will it create another
>>>>>>>> StateStore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (providing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> name
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different), throw an Exception?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently an exception is thrown, but
>> see
>>>>>>>> below.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we follow approach (1) from
>> Guozhang, there
>>>>>>>>>>> is
>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a second materialization and also no
>> exception
>>>>>>>>>>>> must be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thrown. A
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> call to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .materialize() basically sets a
>> "materialized
>>>>>>>>>>>> flag" (ie,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> idempotent
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operation) and sets a new name.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Rename toStream() to toKStream() for
>>>>>>>> consistency.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Not sure whether that is really
>> required. We
>>>>>>>>>>> also
>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `KStreamBuilder#stream()` and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `KStreamBuilder#table()`, for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't care about the "K" prefix.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Eno's reply:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think changing it to `toKStream`
>> would make
>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> absolutely
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> clear
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we are converting it to.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd say we should probably change the
>>>>>>>>>>>> KStreamBuilder
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> methods
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this KIP).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would keep #toStream(). (see below)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 5) We should not remove any methods but
>> only
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deprecate them.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A general note:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I do not understand your comments
>> "Rejected
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Alternatives". You
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> say
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the KTable be the materialized view" was
>>>>>>>>>>> rejected.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actually
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does exactly this -- the changelog
>> abstraction
>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> KTable is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> after those changes and the "view" abstraction
>> is
>>>>>>>> what
>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> KTable is.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just to be clear, I like this a lot:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - it aligns with the name KTable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - it aligns with stream-table-duality
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - it aligns with IQ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would say that a KTable is a "view
>>>>>>>> abstraction"
>>>>>>>>>>>> (as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> materialization is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optional).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 1/22/17 5:05 PM, Guozhang Wang wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Eno, I have a few
>> meta
>>>>>>>>>>> comments
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and a few
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> detailed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comments:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. I like the materialize() function in
>>>>>>>> general,
>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how other KTable functions should be
>> updated
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> accordingly. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> -- Guozhang

