kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Levani Kokhreidze <levani.co...@gmail.com>
Subject Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams
Date Mon, 01 Jul 2019 08:04:42 GMT
Actually, giving it more though - maybe enhancing Produced with num of partitions configuration
is not the best approach. Let me explain why:

1) If we enhance Produced class with this configuration, this will also affect KStream#to
operation. Since KStream#to is the final sink of the topology, for me, it seems to be reasonable
assumption that user needs to manually create sink topic in advance. And in that case, having
num of partitions configuration doesn’t make much sense. 

2) Looking at Produced class, based on API contract, seems like Produced is designed to be
something that is explicitly for producer (key serializer, value serializer, partitioner those
all are producer specific configurations) and num of partitions is topic level configuration.
And I don’t think mixing topic and producer level configurations together in one class is
the good approach.

3) Looking at KStream interface, seems like Produced parameter is for operations that work
with non-internal (e.g topics created and managed internally by Kafka Streams) topics and
I think we should leave it as it is in that case.

Taking all this things into account, I think we should distinguish between DSL operations,
where Kafka Streams should create and manage internal topics (KStream#groupBy) vs topics that
should be created in advance (e.g KStream#to).

To sum it up, I think adding numPartitions configuration in Produced will result in mixing
topic and producer level configuration in one class and it’s gonna break existing API semantics.

Regarding making topic name optional in KStream#through - I think underline idea is very useful
and giving users possibility to specify num of partitions there is even more useful :) Considering
arguments against adding num of partitions in Produced class, I see two options here:
1) Add following method overloads
	* through() - topic will be auto-generated and num of partitions will be taken from source
topic
 	* through(final int numOfPartitions) - topic will be auto generated with specified num of
partitions
	* through(final int numOfPartitions, final Produced<K, V> produced) - topic will be
with generated with specified num of partitions and configuration taken from produced parameter.
2) Leave KStream#through as it is and introduce new method - KStream#repartition (I think
Matthias suggested this in one of the threads)

Considering all mentioned above I propose the following plan:

Option A:
1) Leave Produced as it is
2) Add num of partitions configuration to Grouped class (as mentioned in the KIP)
3) Add following method overloads to KStream#through
	* through() - topic will be auto-generated and num of partitions will be taken from source
topic
 	* through(final int numOfPartitions) - topic will be auto generated with specified num of
partitions
	* through(final int numOfPartitions, final Produced<K, V> produced) - topic will be
with generated with specified num of partitions and configuration taken from produced parameter.

Option B:
1) Leave Produced as it is
2) Add num of partitions configuration to Grouped class (as mentioned in the KIP)
3) Add new operator KStream#repartition for creating and managing internal repartition topics

P.S. I’m sorry if all of this was already discussed in the mailing list, but I kinda got
with all the threads that were about this KIP :(

Kind regards,
Levani

> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <levani.codes@gmail.com> wrote:
> 
> Hello,
> 
> I would like to resurrect discussion around KIP-221. Going through the discussion thread,
there’s seems to agreement around usefulness of this feature. 
> Regarding the implementation, as far as I understood, the most optimal solution for me
seems the following:
> 
> 1) Add two method overloads to KStream#through method (essentially making topic name
optional)
> 2) Enhance Produced class with numOfPartitions configuration field.
> 
> Those two changes will allow DSL users to control parallelism and trigger re-partition
without doing stateful operations.
> 
> I will update KIP with interface changes around KStream#through if this changes sound
sensible.
> 
> Kind regards,
> Levani


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message