kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax" <matth...@confluent.io>
Subject Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines
Date Wed, 23 Aug 2017 19:38:15 GMT
About KGroupedStream vs GroupedKStream: shouldn't we keep the naming
convention consistent? And if we change the naming schema just change
all at once? I personally don't care which naming scheme is better, but
I think consistency is super important!

About Bill's comment: I agree, and had a similar thought.


-Matthias

On 8/23/17 12:24 PM, Bill Bejeck wrote:
> Thanks for all the work on this KIP Damian.
> 
> Both `Produced` and `Joined` have a `with` method accepting all parameters,
> but `Consumed` doesn't. Should we add one for consistency?
> 
> Thanks,
> Bill
> 
> On Wed, Aug 23, 2017 at 4:15 AM, Damian Guy <damian.guy@gmail.com> wrote:
> 
>> KIP has been updated. thanks
>>
>> On Wed, 23 Aug 2017 at 09:10 Damian Guy <damian.guy@gmail.com> wrote:
>>
>>> Hi Matthias,
>>>
>>>
>>>> KStream:
>>>> leftJoin and outerJoin for KStream/KTable join should not have
>>>> `JoinWindows` parameter
>>>>
>>>> Thanks!
>>>
>>>
>>>>
>>>> Nit: TopologyBuilder -> Topology
>>>>
>>>> Ack
>>>
>>>
>>>> Nit: new class Serialized list static method #with twice
>>>>
>>>> Ack
>>>
>>>
>>>> WindowedKStream -> for consistency we should either have GroupedKStream
>>>> or KWindowedStream... (similar argument for SessionWindowedKStream)
>>>>
>>>> We can't rename KGroupedStream -> GroupedKStream without breaking
>>> compatibility. So we are stuck with it for now. Hopefully in the future
>> we
>>> can rename KGroupedStream to GroupedKStream.
>>>
>>>
>>>>
>>>> KGroupedStream
>>>> -> why do we use a different name for `sessionWindowedBy()` -- seems to
>>>> be cleaner to call both methods `windowedBy()`
>>>>
>>>>
>>> I beg to differ that it is cleaner either way!
>>>
>>>
>>>>
>>>> StreamsBuilder#stream -> parameter order is confusing... We should have
>>>> Pattern as second parameter to align both methods.
>>>>
>>>> Ack
>>>
>>>
>>>> StreamsBuilder#table/globalTable -> move parameter `Consumed` as first
>>>> parameter to align with `#stream`
>>>>
>>>>
>>>> Ack
>>>
>>>> Produced#with(Serde, Serde)
>>>> Produced#with(StreamPartitioner, Serde, Serde)
>>>> -> should StreamPartitioner be the third argument instead of the first?
>>>>
>>>> Sure
>>>
>>>>
>>>> Consumed:
>>>> Why do we need 3 different names for the 3 static methods? I would all
>>>> of them just call `with()`. Current names sound clumsy to me. And a
>>>> plain `with()` also aligns with the naming of static methods of other
>>>> classes.
>>>>
>>>
>>> I disagree that the names sound clumsy! But yes they should be aligned
>>> with the others.
>>>
>>>
>>>>
>>>>
>>>> I guess we are also deprecation a bunch of method for
>>>> KStream/KTable/KGroupedStream/KGroupedTable and should mention which
>>>> one? There is just one sentence "Deprecate the existing overloads.", but
>>>> we don't deprecate all existing once. I personally don't care to much if
>>>> we spell deprecated method out explicitly, but right now it's not
>>>> consistent as we only list methods we add.
>>>>
>>>>
>>>
>>>> Should we deprecate `StateStoreSupplier`?
>>>>
>>> Yep
>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>> On 8/22/17 6:55 AM, Damian Guy wrote:
>>>>> I've just updated the KIP with some additional changes targeted at
>>>>> StreamsBuilder
>>>>>
>>>>> Thanks,
>>>>> Damian
>>>>>
>>>>> On Thu, 10 Aug 2017 at 12:59 Damian Guy <damian.guy@gmail.com>
wrote:
>>>>>
>>>>>>
>>>>>>> Got it, thanks.
>>>>>>>
>>>>>>> Does it still make sense to have one static constructors for
each
>>>> spec,
>>>>>>> with one constructor having only one parameter to make it more
>> usable,
>>>>>>> i.e.
>>>>>>> as a user I do not need to give all parameters if I only want
to
>>>> override
>>>>>>> one of them? Maybe we can just name the constructors as `with`
but
>>>> I'm not
>>>>>>> sure if Java distinguish:
>>>>>>>
>>>>>>> public static <K, V> Produced<K, V> with(final Serde<K>
keySerde)
>>>>>>> public static <K, V> Produced<K, V> with(final Serde<V>
valueSerde)
>>>>>>>
>>>>>>> as two function signatures.
>>>>>>>
>>>>>>>
>>>>>> No that won't work. That is why we have all options, i.e., on Produce
>>>>>> public static <K, V> Produced<K, V> with(final Serde<K>
keySerde,
>>>> final Serde<V>
>>>>>> valueSerde)
>>>>>> public static <K, V> Produced<K, V> with(final StreamPartitioner<K,
>> V>
>>>>>> partitioner, final Serde<K> keySerde, final Serde<V>
valueSerde)
>>>>>> public static <K, V> Produced<K, V> keySerde(final Serde<K>
keySerde)
>>>>>> public static <K, V> Produced<K, V> valueSerde(final
Serde<V>
>>>> valueSerde)
>>>>>> public static <K, V> Produced<K, V> streamPartitioner(final
>>>> StreamPartitioner<K,
>>>>>> V> partitioner)
>>>>>>
>>>>>> So if you only want to use one you can just use the function that
>> takes
>>>>>> one argument.
>>>>>>
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy <damian.guy@gmail.com>
>>>> wrote:
>>>>>>>
>>>>>>>> On Tue, 8 Aug 2017 at 20:11 Guozhang Wang <wangguoz@gmail.com>
>>>> wrote:
>>>>>>>>
>>>>>>>>> Damian,
>>>>>>>>>
>>>>>>>>> Thanks for the proposal, I had a few comments on the
APIs:
>>>>>>>>>
>>>>>>>>> 1. Printed#withFile seems not needed, as users should
always spec
>> if
>>>>>>> it
>>>>>>>> is
>>>>>>>>> to sysOut or to File at the beginning. In addition as
a second
>>>>>>> thought, I
>>>>>>>>> think serdes are not useful for prints anyways since
we assume
>>>>>>> `toString`
>>>>>>>>> is provided except for byte arrays, in which we will
special
>> handle
>>>>>>> it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>>
>>>>>>>>> Another comment about Printed in general is it differs
with other
>>>>>>> options
>>>>>>>>> that it is a required option than optional one, since
it includes
>>>>>>>> toSysOut
>>>>>>>>> / toFile specs; what are the pros and cons for including
these two
>>>> in
>>>>>>> the
>>>>>>>>> option and hence make it a required option than leaving
them at
>> the
>>>>>>> API
>>>>>>>>> layer and make Printed as optional for mapper / label
only?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> It isn't required as we will still have the no-arg print()
which
>> will
>>>>>>> just
>>>>>>>> go to sysout as it does now.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2.1 KStream#through / to
>>>>>>>>>
>>>>>>>>> We should have an overloaded function without Produced?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes - we already have those so they are not part of the KIP,
i.e,
>>>>>>>> through(topic)
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2.2 KStream#groupBy / groupByKey
>>>>>>>>>
>>>>>>>>> We should have an overloaded function without Serialized?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, as above
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2.3 KGroupedStream#count / reduce / aggregate
>>>>>>>>>
>>>>>>>>> We should have an overloaded function without Materialized?
>>>>>>>>>
>>>>>>>>
>>>>>>>> As above
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2.4 KStream#join
>>>>>>>>>
>>>>>>>>> We should have an overloaded function without Joined?
>>>>>>>>>
>>>>>>>>
>>>>>>>> as above
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2.5 Each of KTable's operators:
>>>>>>>>>
>>>>>>>>> We should have an overloaded function without Produced
/
>> Serialized
>>>> /
>>>>>>>>> Materialized?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> as above
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 3.1 Produced: the static functions have overlaps, which
seems not
>>>>>>>>> necessary. I'd suggest jut having the following three
static with
>>>>>>> another
>>>>>>>>> three similar member functions:
>>>>>>>>>
>>>>>>>>> public static <K, V> Produced<K, V> withKeySerde(final
Serde<K>
>>>>>>> keySerde)
>>>>>>>>>
>>>>>>>>> public static <K, V> Produced<K, V> withValueSerde(final
Serde<V>
>>>>>>>>> valueSerde)
>>>>>>>>>
>>>>>>>>> public static <K, V> Produced<K, V> withStreamPartitioner(final
>>>>>>>>> StreamPartitioner<K, V> partitioner)
>>>>>>>>>
>>>>>>>>> The key idea is that by using the same function name
string for
>>>> static
>>>>>>>>> constructor and member functions, users do not need to
remember
>> what
>>>>>>> are
>>>>>>>>> the differences but can call these functions with any
ordering
>> they
>>>>>>> want,
>>>>>>>>> and later calls on the same spec will win over early
calls.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> That would be great if java supported it, but it doesn't.
You can't
>>>> have
>>>>>>>> static an member functions with the same signature.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 3.2 Serialized: similarly
>>>>>>>>>
>>>>>>>>> public static <K, V> Serialized<K, V> withKeySerde(final
Serde<K>
>>>>>>>> keySerde)
>>>>>>>>>
>>>>>>>>> public static <K, V> Serialized<K, V> withValueSerde(final
>> Serde<V>
>>>>>>>>> valueSerde)
>>>>>>>>>
>>>>>>>>> public Serialized<K, V> withKeySerde(final Serde<K>
keySerde)
>>>>>>>>>
>>>>>>>>> public Serialized<K, V> withValueSerde(final Serde
valueSerde)
>>>>>>>>>
>>>>>>>>
>>>>>>>> as above
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also it has a final Serde<V> otherValueSerde in
one of its static
>>>>>>>>> constructor, it that intentional?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Nope: thanks.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 3.3. Joined: similarly, keep the static constructor signatures
the
>>>>>>> same
>>>>>>>> as
>>>>>>>>> its corresponding member fields.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> As above
>>>>>>>>
>>>>>>>>
>>>>>>>>> 3.4 Materialized: it is a bit special, and I think we
can keep its
>>>>>>> static
>>>>>>>>> constructors with only two `as` as they are today.K
>>>>>>>>>
>>>>>>>>>
>>>>>>>> 4. Is there any modifications on StateStoreSupplier? Is it
replaced
>>>> by
>>>>>>>>> BytesStoreSupplier? Seems some more descriptions are
lacking here.
>>>>>>> Also
>>>>>>>> in
>>>>>>>>>
>>>>>>>>>
>>>>>>>> No modifications to StateStoreSupplier. It is superseceded
by
>>>>>>>> BytesStoreSupplier.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> public static <K, V, S extends StateStore> Materialized<K,
V, S>
>>>>>>>>> as(final StateStoreSupplier<S>
>>>>>>>>> supplier)
>>>>>>>>>
>>>>>>>>> Is the parameter in type of BytesStoreSupplier?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yep - thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 27, 2017 at 5:26 AM, Damian Guy <damian.guy@gmail.com
>>>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Updated link:
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>> 182%3A+Reduce+Streams+DSL+overloads+and+allow+easier+
>>>>>>>>>> use+of+custom+storage+engines
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Damian
>>>>>>>>>>
>>>>>>>>>> On Thu, 27 Jul 2017 at 13:09 Damian Guy <damian.guy@gmail.com>
>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I've put together a KIP to make some changes
to the KafkaStreams
>>>>>>> DSL
>>>>>>>>> that
>>>>>>>>>>> will hopefully allow us to:
>>>>>>>>>>> 1) reduce the explosion of overloads
>>>>>>>>>>> 2) add new features without having to continue
adding more
>>>>>>> overloads
>>>>>>>>>>> 3) provide simpler ways for people to use custom
storage engines
>>>>>>> and
>>>>>>>>> wrap
>>>>>>>>>>> them with logging, caching etc if desired
>>>>>>>>>>> 4) enable per-operator caching rather than global
caching
>> without
>>>>>>>>> having
>>>>>>>>>>> to resort to supplying a StateStoreSupplier when
you just want
>> to
>>>>>>>> turn
>>>>>>>>>>> caching off.
>>>>>>>>>>>
>>>>>>>>>>> The KIP is here:
>>>>>>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.
>>>>>>>>>> action?pageId=73631309
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Damian
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -- Guozhang
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -- Guozhang
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>
> 


Mime
View raw message