From dev-return-110530-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Wed Jan 15 22:41:13 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id CA4BB18065E for ; Wed, 15 Jan 2020 23:41:12 +0100 (CET) Received: (qmail 60000 invoked by uid 500); 15 Jan 2020 22:41:11 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 59988 invoked by uid 99); 15 Jan 2020 22:41:11 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 15 Jan 2020 22:41:11 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id D24D51805F1 for ; Wed, 15 Jan 2020 22:41:09 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0 X-Spam-Level: X-Spam-Status: No, score=0 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=0.2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-ec2-va.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id 3iYwItRiv6Ps for ; Wed, 15 Jan 2020 22:41:07 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=209.85.222.169; helo=mail-qk1-f169.google.com; envelope-from=rhauch@gmail.com; receiver= Received: from mail-qk1-f169.google.com (mail-qk1-f169.google.com [209.85.222.169]) by mx1-ec2-va.apache.org (ASF Mail Server at mx1-ec2-va.apache.org) with ESMTPS id 70587BC5C3 for ; Wed, 15 Jan 2020 22:41:07 +0000 (UTC) Received: by mail-qk1-f169.google.com with SMTP id j9so17388829qkk.1 for ; Wed, 15 Jan 2020 14:41:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=M62QzHzNoCWtvAXnS3A7Nc2zmcNicKxqfOYXN0D6asQ=; b=DEG0D1dyvX3msXbt8rKdNOKHw+3nej7HA4xM3JUvtzPYraFZWb9EYp6tVNfrQEFJrI 01dYANMLoIukJUhWio7QiHghWKjG/bif3VWA7CNc9qyjlS7w/+ZnXngW4RzZVf5g5/9G Ze6D73HdGp7VabaUKZXqt2ccEWuuyMp/82PKHRfbrPfyN+jxZq6k+5DZo/59BtZW0y50 Acbgzp3gqEDcbzAlKuGDFfumIvTFwahdxW+TTPmKmFhtKCRimV6yLtTFjwjDXhiN4pVO xHy82ZvKCFblAGkluQ3CiVSnytR2Rpc0PCw2zX//B6MOKEFgL48W0zrEpXFM5PD0ciZ0 OCqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=M62QzHzNoCWtvAXnS3A7Nc2zmcNicKxqfOYXN0D6asQ=; b=ExTTdXGuMg/dfUZ1dP81LAHFjfJvNWcRBpWHX4j3HOWgUyMuDaI8ycmkFqwGvmrlaG aUncQ83xW/lG7W4IPBuYJ2kIBopUXipAqpWiMsd3/x3+O5RoSqSmU5XTqpyaqJKX6yp+ h6yY422b+4DDrvkPDoYHkXk41jltlHf4Sc7vkF48D5sQ8Z+XDlS9Y3AQZ617OJkfbJmy y/vIJ0fGwRlp3gTdqUNPDPqxwFR/ysem+1ssvwAuC70r3Qt0H8+w58Y/M6IYFZNFVRgf kGqy03lW2BiCuX33qCwP45qCyR3a1nHUhD3/DZYdQoSTDRg7vZKweXHOroGWumZEiksA Km8w== X-Gm-Message-State: APjAAAVeLHDCzsKtaL5hTBvV2sgFHY51aZTgfxvwBg5W0aQoxaogPUKZ Qcz4ImO3lZTbrVwmRhSZtCpK9Qg4wrx/VlJ6J0UPqdm5 X-Google-Smtp-Source: APXvYqz9hIJQFeoLrqbrjVXLmoJePJeV88qGevaRGF4szz53zc+ODxh941Drd4UMWtDmrm9JzPUYWUTsZYVWZzkhY+k= X-Received: by 2002:a05:620a:1467:: with SMTP id j7mr28117388qkl.76.1579128066849; Wed, 15 Jan 2020 14:41:06 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Randall Hauch Date: Wed, 15 Jan 2020 16:40:55 -0600 Message-ID: Subject: Re: [DISCUSS] KIP-558: Track the set of actively used topics by connectors in Kafka Connect To: dev Content-Type: multipart/alternative; boundary="0000000000007ac077059c3567cc" --0000000000007ac077059c3567cc Content-Type: text/plain; charset="UTF-8" On Wed, Jan 15, 2020 at 4:36 PM Randall Hauch wrote: > On Wed, Jan 15, 2020 at 2:05 PM Konstantine Karantasis < > konstantine@confluent.io> wrote: > >> >> 9. I assumed that partitioning is implied by default, because there's no >> requirement for complete ordering of topic status records. But I'll add >> this fact as a separate bullet. The status.storage.topic is already a >> partitioned topic. >> > > Agreed. I think it'd be sufficient to simply mention that partition will > be chosen based upon the active topic records' keys, ensuring that all > active topic records for the same connector will be written to the same > partition and will be totally ordered. > Well, my previous statement is not quite right. All topic records for the same *connector and topic* will be written to the same partition and will be totally ordered. But as you pointed out, it doesn't really matter, other than that this feature will work with any # of partitions. The new bullet you described would be sufficient. :-D > >> >> I'm following up with the rest of the comments, shortly. >> Thanks, >> Konstantine >> >> >> On Wed, Jan 15, 2020 at 9:19 AM Almog Gavra wrote: >> >> > Hi Konstantine, >> > >> > Thanks for the KIP! This is going to make automatic integration with >> > Connect much more powerful. >> > >> > My thoughts are mostly around freshness of the data and being able to >> > expose that to users. Riffing on Randall's timestamp question - have we >> > considered adding some interval at which point a connector will >> republish >> > any topics that it encounters and update the timestamp? That way we have >> > some refreshing mechanism that isn't as powerful as the complete reset >> > (which may not be practical in many scenarios). >> > >> > I also agree with Randall's other point (Would it be better to not >> > automatically reset connector's active topics when a sink connector is >> > restarted?). I think keeping the behavior as symmetrical between sink >> and >> > source connectors is a good idea. >> > >> > Lastly, with regards to the API, I can imagine it is also pretty useful >> to >> > answer the inverse question: "which connectors write to topic X". >> Perhaps >> > we can achieve this by letting the users compute it and just expose an >> API >> > that returns the entire mapping at once (instead of needing to call the >> > /connectors/{name}/topics endpoint for each connector). >> > >> > Otherwise, looks good to me! Hits the requirements that I had in mind on >> > the nose. >> > - Almog >> > >> > On Wed, Jan 15, 2020 at 1:14 AM Tom Bentley >> wrote: >> > >> > > Hi Konstantine, >> > > >> > > Thanks for the KIP, I can see how it could be useful. >> > > >> > > a) Did you consider using a metric for this? I don't think it would >> > satisfy >> > > all the use cases you have in mind, but you could mention it in the >> > > rejected alternatives. >> > > >> > > b) If the topic name contains the string "-connector" then the key >> format >> > > is ambiguous. This isn't necessarily fatal because the value will >> > > disambiguate, but it could be misleading. Any reason not to just use a >> > JSON >> > > key, and simplify the value? >> > > >> > > c) I didn't understand this part: "As soon as a worker detects the >> > addition >> > > of a topic to a connector's set of active topics, the worker will >> cease >> > to >> > > post update messages to the status.storage.topic for that connector. >> ". >> > I'm >> > > sure I've overlooking something but why is this necessary? Is this >> were >> > the >> > > task id in the value is used? >> > > >> > > Thanks again, >> > > >> > > Tom >> > > >> > > On Wed, Jan 15, 2020 at 12:15 AM Randall Hauch >> wrote: >> > > >> > > > Oh, one more thing: >> > > > >> > > > 9. There's no mention of how the status topic is partitioned, or how >> > > > partitioning will be used by the new topic records. The KIP should >> > > probably >> > > > outline this for clarity and completeness. >> > > > >> > > > Best regards, >> > > > >> > > > Randall >> > > > >> > > > On Tue, Jan 14, 2020 at 5:25 PM Randall Hauch >> > wrote: >> > > > >> > > > > Thanks, Konstantine. Overall, this KIP looks interesting and >> really >> > > > > useful, and for the most part is spot on. I do have a number of >> > > > > questions/comments about specifics: >> > > > > >> > > > > 1. The topic records have a value that includes the connector >> > name, >> > > > > task number that last reported the topic is used, and the topic >> > > name. >> > > > > There's no mention of record timestamps, but I wonder if it'd >> be >> > > > useful to >> > > > > record this. One challenge might be that a connector does not >> > write >> > > > to a >> > > > > topic for a while or the task remains running for long periods >> of >> > > > time and >> > > > > therefore the worker doesn't record that this topic has been >> newly >> > > > written >> > > > > to since it the task was restarted. IOW, the semantics of the >> > > > timestamp may >> > > > > be a bit murky. Have you thought about recording the timestamp, >> > and >> > > > if so >> > > > > what are the pros and cons? >> > > > > - The "Recording active topics" section says the following: >> > > > > "As soon as a worker detects the addition of a topic to a >> > > > > connector's set of active topics, all the connector's tasks >> > that >> > > > inspect >> > > > > source or sink records will cease to post update messages to >> > the >> > > > > status.storage.topic." >> > > > > This probably means the timestamp won't be very useful. >> > > > > 2. The KIP says "the Kafka record value stores the ID of the >> task >> > > that >> > > > > succeeded to store a topic status record last." However, this >> is a >> > > bit >> > > > > unclear: is it really storing the last task that successfully >> > wrote >> > > > to that >> > > > > topic (as this would require very frequent writes to this >> topic), >> > or >> > > > is it >> > > > > more that this is the task that was last *recorded* as having >> > > written >> > > > > to the topic? (Here, "recorded" could be a bit of a gray area, >> > since >> > > > this >> > > > > would depend on the how the worker periodically records this >> > > > information.) >> > > > > Any kind of clarity here might be helpful. >> > > > > 3. In the "Recording active topics" section (and the >> surrounding >> > > > > sections), the "task" is used ambiguously. For example, "when >> its >> > > > tasks >> > > > > start processing their first records ... these tasks will start >> > > > inspecting >> > > > > which is the Kafka topic of each of these records". IIUC, the >> > first >> > > > "task" >> > > > > mentioned is the connector's task, and the second is the >> worker's >> > > > task. Do >> > > > > we need to distinguish this more clearly? >> > > > > 4. Maybe I missed it, but does this KIP explicitly say that the >> > > > > Connector API is unchanged? It's probably worth pointing out to >> > help >> > > > > assuage any concerns that connector implementations have to >> change >> > > to >> > > > make >> > > > > use of this feature. >> > > > > 5. In the "Resetting a connector's set of active topics" >> section >> > the >> > > > > behavior is not exactly clear. Consider a user running >> connector >> > > "A", >> > > > the >> > > > > connector has been fully started and is processing records, and >> > the >> > > > worker >> > > > > has recorded topic usage records. Then the user resets the >> active >> > > > topics >> > > > > for connector A while the connector is still running? If the >> > > connector >> > > > > writes to no new topics, before the tasks are rebalanced then >> is >> > it >> > > > correct >> > > > > that Connect would report no active topics? And after the tasks >> > are >> > > > > rebalance, will the worker record any topics used by connector >> A? >> > > > > 6. In the "Restaring" (misspelled) section: "Reconfiguring a >> > source >> > > > > connector has also no altering effect for a source connector. >> > > > However, when >> > > > > reconfiguring a sink connector if the new configuration no >> longer >> > > > includes >> > > > > any of the previously tracked topics, these topics will be >> removed >> > > > from the >> > > > > set of active topics for this sink connector by appending >> > tombstone >> > > > > messages appropriately after the reconfiguration of the >> > connector." >> > > > Would >> > > > > it be better to not automatically reset connector's active >> topics >> > > > when a >> > > > > sink connector is restarted? Isn't that more consistent with >> the >> > > > > "Resetting" behavior and the goals at the top of the KIP: >> "it'd be >> > > > useful >> > > > > for users, operators and applications to know which are the >> topics >> > > > that a >> > > > > connector has used since it was first created"? >> > > > > 7. The `PUT /connectors/{name}/topics/reset` endpoint "this >> > request >> > > > > can be reapplied after the deletion of the connector". IOW, >> even >> > > > though >> > > > > connector with that name doesn't exist, we can still make this >> > > > request? How >> > > > > does this compare with other methods such as "status"? >> > > > > 8. What are the security implications of this proposal? >> > > > > >> > > > > As you can see, most of these can probably be addressed without >> much >> > > > work. >> > > > > >> > > > > Best regards, >> > > > > >> > > > > Randall >> > > > > >> > > > > On Mon, Jan 13, 2020 at 11:05 PM Konstantine Karantasis < >> > > > > konstantine@confluent.io> wrote: >> > > > > >> > > > >> Hi all. >> > > > >> >> > > > >> I just posted KIP-558: Track the set of actively used topics by >> > > > connectors >> > > > >> in Kafka Connect >> > > > >> >> > > > >> Wiki link here: >> > > > >> >> > > > >> >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-558%3A+Track+the+set+of+actively+used+topics+by+connectors+in+Kafka+Connect >> > > > >> >> > > > >> I think it's a nice extension to follow up on KIP-158 and a >> useful >> > > > feature >> > > > >> to the ever increasing number of applications that are built >> around >> > > > Kafka >> > > > >> Connect. >> > > > >> Would love to hear what you think. >> > > > >> >> > > > >> Best, >> > > > >> Konstantine >> > > > >> >> > > > > >> > > > >> > > >> > >> > --0000000000007ac077059c3567cc--