kafka-dev mailing list archives

From "McCaig, Rhys" <Rhys_McC...@comcast.com>
Subject Re: [EXTERNAL] [DISCUSS] KIP-310: Add a Kafka Source Connector to Kafka Connect
Date Wed, 26 Sep 2018 21:01:57 GMT
Hi Konstantine,

Thank you for your thoughtful comments!

> However, I don't think the apache/kafka repository is the right place to
> host such a Connector. 

<snip>
> I find this approach very appealing. AK focuses on providing the core
> infrastructure for Connect, that is required in every Kafka Connect
> deployment, as well as offering the means to generically install, deploy
> and operate connectors.

I personally flip-flopped on this, with similar thoughts, when I initially considered
raising a KIP for this functionality.

When I initially developed a Kafka source connector, it was out of necessity: MirrorMaker
requires ZooKeeper connection strings, which I didn't have access to for the source cluster,
and Confluent’s proprietary connector also required ZooKeeper connections, though it has now
been updated to remove this limitation.

While I understand the point of view that MirrorMaker dates back to the early days of Apache
Kafka, it has become a critical tool for replicating data across Kafka clusters for a large
portion of the community who are managing Kafka at scale. As such, I suspect that there is a
lot of interest in the Kafka project supporting topic replication across clusters. While one
approach (which I don’t have the knowledge or time to address) could be to include it as a
core component of Kafka itself (such as Apache Pulsar’s global topics), my view is that at
this point in time, Kafka Connect is considered *the* way to ship data in and out of a
specific Kafka cluster, regardless of the external system.

I’d welcome further discussion on whether the community thinks this is the right approach
for the Kafka project to take with regard to handling Kafka topic mirroring. I *think* that
it's important and common enough that there should be support in the project, and MirrorMaker
is, as you mention, showing its age.

Cheers,
Rhys




> On Sep 26, 2018, at 10:42 AM, Konstantine Karantasis <konstantine@confluent.io> wrote:
> 
> Hi Rhys,
> 
> thanks for the proposal and apologies for the late feedback. Utilizing
> Connect to mirror Kafka topics is definitely a plausible proposal for a
> very useful use case.
> 
> However, I don't think the apache/kafka repository is the right place to
> host such a Connector. Currently, no full-featured, production-ready
> connectors are hosted in AK. The only two connectors shipped with AK
> (FileStreamSourceConnector and FileStreamSinkConnector) are there to
> demonstrate implementations only as examples.
> 
> I find this approach very appealing. AK focuses on providing the core
> infrastructure for Connect, that is required in every Kafka Connect
> deployment, as well as offering the means to generically install, deploy
> and operate connectors. But all the connectors reside outside AK and
> comprise a vibrant ecosystem of open source and proprietary components
> that, essentially - even for the most useful and ubiquitous of the
> connectors - are optional for users to install and use. This seems simple
> and flexible, both in terms of releasing and using/deploying software
> related to Kafka Connect. I might even say that I'd be in favor of
> extending this approach to all the Connect components, including
> Transformations and Converters.
> 
> I'm aware that MirrorMaker is part of AK, but to me this refers to the
> early days of Apache Kafka, when the size of the project and the ecosystem
> was smaller, Connect and Streams had not been implemented yet, and
> mirroring topics between Kafka clusters was already a basic need. With a
> much more rich ecosystem now and more sizable and well defined packages in
> AK, I think the approach that decouples connectors from the Connect
> framework itself is a good one.
> 
> In my opinion, the fact that this connector targets Kafka itself as a
> source is not an adequate reason to include it in apache/kafka within the
> Connect framework. It seems it can evolve naturally, as every other
> connector, in its own repository.
> 
> Regards,
> Konstantine
> 
> 
> On Sat, Aug 4, 2018 at 7:20 PM McCaig, Rhys <Rhys_McCaig@comcast.com> wrote:
> 
>> Hi All,
>> 
>> If there are no further comments on this KIP I’ll start a vote early this
>> week.
>> 
>> Rhys
>> 
>> On Aug 1, 2018, at 12:32 AM, McCaig, Rhys <Rhys_McCaig@cable.comcast.com
>> <mailto:Rhys_McCaig@cable.comcast.com>> wrote:
>> 
>> Hi All,
>> 
>> I’ve updated the proposal to include the improvements suggested by
>> Stephane.
>> 
>> I have also submitted a PR to implement this functionality into Kafka.
>> https://github.com/apache/kafka/pull/5438
>> 
>> I don’t have a benchmark against MirrorMaker yet, as I only currently have
>> a local docker stack available to me, though I have seen very good
>> performance in that test stack (200k messages/sec@100bytes on limited
>> compute resource containers). Further benchmarking might take a few days.
>> 
>> Review and comments would be appreciated.
>> 
>> Cheers,
>> Rhys
>> 
>> 
>> On Jun 18, 2018, at 9:00 AM, McCaig, Rhys <Rhys_McCaig@cable.comcast.com
>> <mailto:Rhys_McCaig@cable.comcast.com>> wrote:
>> 
>> Hi Stephane,
>> 
>> Thanks for your feedback and apologies for the delay in my response.
>> 
>> Are there any performance benchmarks against Mirror Maker available? I'm
>> interested to know if this is more performant / scalable.
>> Regarding the implementation, here's some feedback:
>> 
>> 
>> Currently I don’t have any performance benchmarks, but I think this is a
>> great idea. I'll see if I can set something up in the next week or so.
>> 
>> - I think it's worth mentioning that this solution does not rely on
>> consumer groups, and therefore tracking progress may be tricky. Can you
>> think of a way to expose that?
>> 
>> This is a reasonable concern. I’m not sure how to track this other than
>> by looking at the Kafka Connect offsets. Once a message is passed to the
>> framework, I'm unaware of a way to get at the commit offsets on the
>> producer side. Any thoughts?
>> 
>> - Some code can be in config Validator I believe:
>> 
>> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
>> 
>> - I think your kip mentions `source.admin.` and `source.consumer.` but I
>> don't see it reflected yet in the code
>> 
>> - Is there a way to be flexible and merge list and regex, or offer the two
>> simultaneously ? source_topics=my_static_topic,prefix.* ?
>> 
>> Agreed on all of the above - I will incorporate these into the code later
>> this week, as I'll get some time back to work on this.
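
As an illustration of the mixed list/regex question above (source_topics=my_static_topic,prefix.*), here is a minimal sketch of one way such a value could be resolved against the topics on the source cluster. It is not taken from the KIP or the PR, and it is written in Python for brevity although the connector itself is Java. The function names are hypothetical, and the rule for telling a literal from a regex is an assumption: Kafka topic names may legally contain ".", which is also a regex metacharacter, so a real implementation would need to settle that ambiguity explicitly.

```python
import re

# Assumption: an entry made up only of these characters is a literal topic
# name. Entries containing anything else (including ".") are treated as regexes.
_LITERAL = re.compile(r"[A-Za-z0-9_-]+")


def split_topic_spec(spec):
    """Split a comma-separated spec into literal names and compiled regexes."""
    literals, patterns = set(), []
    for entry in (e.strip() for e in spec.split(",")):
        if not entry:
            continue  # tolerate stray commas and empty entries
        if _LITERAL.fullmatch(entry):
            literals.add(entry)
        else:
            patterns.append(re.compile(entry))
    return literals, patterns


def select_topics(spec, available):
    """Return the available topics named literally or fully matched by a regex."""
    literals, patterns = split_topic_spec(spec)
    return sorted(
        topic
        for topic in available
        if topic in literals or any(p.fullmatch(topic) for p in patterns)
    )


if __name__ == "__main__":
    cluster_topics = ["my_static_topic", "prefix.a", "prefixed", "other"]
    print(select_topics("my_static_topic,prefix.*", cluster_topics))
```

Note that under this rule "prefix.*" also matches "prefixed", because the dot is interpreted as a regex wildcard rather than a literal separator.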
>> 
>> Cheers,
>> Rhys
>> 
>> 
>> 
>> On Jun 6, 2018, at 7:16 PM, Stephane Maarek <
>> stephane@simplemachines.com.au<mailto:stephane@simplemachines.com.au>>
>> wrote:
>> 
>> Hi Rhys,
>> 
>> I think this will be a great addition.
>> 
>> Are there any performance benchmarks against Mirror Maker available? I'm
>> interested to know if this is more performant / scalable.
>> Regarding the implementation, here's some feedback:
>> 
>> - I think it's worth mentioning that this solution does not rely on
>> consumer groups, and therefore tracking progress may be tricky. Can you
>> think of a way to expose that?
>> 
>> 
>> - Some code can be in config Validator I believe:
>> 
>> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
>> 
>> - I think your kip mentions `source.admin.` and `source.consumer.` but I
>> don't see it reflected yet in the code
>> 
>> - Is there a way to be flexible and merge list and regex, or offer the two
>> simultaneously ? source_topics=my_static_topic,prefix.* ?
>> 
>> Hope that helps
>> Stephane
>> 
>> Kind regards,
>> Stephane
>> 
>> Stephane Maarek | Developer
>> 
>> +61 416 575 980
>> stephane@simplemachines.com.au<mailto:stephane@simplemachines.com.au>
>> simplemachines.com.au<http://simplemachines.com.au>
>> Level 2, 145 William Street, Sydney NSW 2010
>> 
>> On 5 June 2018 at 09:04, McCaig, Rhys <Rhys_McCaig@comcast.com<mailto:
>> Rhys_McCaig@comcast.com>> wrote:
>> 
>> Hi All,
>> 
>> As I didn’t get any comments on this KIP, and two additional KIPs
>> numbered 308 have been created since, I'm bumping this and renaming the
>> KIP to 310 to remove the duplication:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-310%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>> 
>> Let me know if you have any comments or feedback, would love to hear them.
>> 
>> Cheers,
>> Rhys
>> 
>> On May 28, 2018, at 10:23 PM, McCaig, Rhys <rhys_mccaig@comcast.com
>> <mailto:rhys_mccaig@comcast.com>>
>> wrote:
>> 
>> Sorry for the bad link to the KIP, here it is:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>> 
>> On May 28, 2018, at 10:19 PM, McCaig, Rhys <Rhys_McCaig@comcast.com
>> <mailto:Rhys_McCaig@comcast.com>>
>> wrote:
>> 
>> Hi All,
>> 
>> I added a KIP to include a Kafka Source Connector with Kafka Connect.
>> Here is the KIP:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>> 
>> Looking forward to your feedback and suggestions.
>> 
>> Cheers,
>> Rhys
>> 