From: Colin McCabe
To: dev@kafka.apache.org
Subject: Re: KIP-327: Add describe all topics API to AdminClient
Date:
Fri, 13 Jul 2018 18:02:16 -0700

As Jason wrote, this won't scale as the number of partitions increases. We already have users who have tens of thousands of topics, or more. If you multiply that by 100x over the next few years, you end up with this API returning full information about millions of topics, which clearly doesn't work.

We discussed this a lot in the original KIP-117 DISCUSS thread, which covered adding the Java AdminClient. ListTopics and DescribeTopics were deliberately kept separate because we understood that eventually a single RPC would not be able to return information about all the topics in the cluster. So I have to vote -1 for this proposal as it stands.

I do agree that adding a way to describe topics by a regular expression on the server side would be very useful. This would also fix a major scalability problem we have now, which is that when subscribing via a regular expression, clients need to fetch the full list of all topics in the cluster and filter locally.

I think a regular expression library like re2 would be ideal for this purpose. re2 is standardized and language-agnostic (it's not tied only to Java). In contrast, Java regular expressions change with different releases of the JDK (there were some changes in Java 8, for example). Also, re2 regular expressions are linear time, never exponential time. See https://github.com/google/re2j

regards,
Colin

On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote:
> The KIP looks good to me.
> However, if there is willingness in the community to work on metadata
> request with patterns, the feature proposed here and filtering by '*' or
> '.*' would be redundant.
>
> Andras
>
>
>
> On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson wrote:
>
> > Hey Manikumar,
> >
> > As Kafka begins to scale to larger and larger numbers of topics/partitions,
> > I'm a little concerned about the scalability of APIs such as this. The API
> > looks benign, but imagine you have a few million partitions.
> > We already expose similar APIs in the producer and consumer, so probably not
> > much additional harm to expose it in the AdminClient, but it would be nice
> > to put a little thought into some longer term options. We should be giving
> > users an efficient way to select a smaller set of the topics they are
> > interested in. We have always discussed adding some filtering support to
> > the Metadata API. Perhaps now is a good time to reconsider this? We now
> > have a convention for wildcard ACLs, so perhaps we can do something
> > similar. Full regex support might be ideal given the consumer's
> > subscription API, but that is more challenging. What do you think?
> >
> > Thanks,
> > Jason
> >
> > On Thu, Jul 12, 2018 at 2:35 PM, Harsha wrote:
> >
> > > Very useful. LGTM.
> > >
> > > Thanks,
> > > Harsha
> > >
> > > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> > > > Hi all,
> > > >
> > > > I have created a KIP to add describe all topics API to AdminClient.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> > > >
> > > > Please take a look.
> > > >
> > > > Thanks,
> > > >
>
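
For readers of the archive, here is a minimal sketch of the client-side filtering this thread describes: the client obtains every topic name in the cluster and then matches them locally against a regular expression. It is illustrative only and does not call any Kafka API. The class name TopicFilterSketch, the helper filterTopics, and the sample topic names are all hypothetical, and the in-memory list stands in for what a list-topics call would return. It uses java.util.regex rather than re2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TopicFilterSketch {

    // Hypothetical helper: keeps only the topic names that fully match the pattern.
    // The cost is proportional to the total number of topics, because every name
    // has to be fetched and tested on the client.
    public static List<String> filterTopics(List<String> allTopics, Pattern pattern) {
        return allTopics.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the full topic list a client would have to download first.
        List<String> allTopics = Arrays.asList("orders-eu", "orders-us", "payments", "audit-log");

        List<String> selected = filterTopics(allTopics, Pattern.compile("orders-.*"));

        // Only the selected topics would then need to be described.
        System.out.println(selected); // prints [orders-eu, orders-us]
    }
}
```

With server-side pattern matching, as suggested above, only the matching names would need to cross the wire, instead of the whole list.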