Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 67D39200C26 for ; Sat, 11 Feb 2017 01:18:56 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 663AC160B69; Sat, 11 Feb 2017 00:18:56 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 3C4E0160B5C for ; Sat, 11 Feb 2017 01:18:55 +0100 (CET) Received: (qmail 80117 invoked by uid 500); 11 Feb 2017 00:18:54 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 80104 invoked by uid 99); 11 Feb 2017 00:18:53 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 11 Feb 2017 00:18:53 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 605A61A0224 for ; Sat, 11 Feb 2017 00:18:53 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.779 X-Spam-Level: * X-Spam-Status: No, score=1.779 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=confluent-io.20150623.gappssmtp.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id cNKClleus1Gm for ; Sat, 11 Feb 2017 00:18:46 +0000 (UTC) Received: from mail-qt0-f179.google.com (mail-qt0-f179.google.com [209.85.216.179]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 290AF5FE07 for ; Sat, 11 Feb 2017 00:18:46 +0000 (UTC) Received: by mail-qt0-f179.google.com with SMTP id k15so50226281qtg.3 for ; Fri, 10 Feb 2017 16:18:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=confluent-io.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=L+ab3uiaSDOPpXeliU0siAGC19utFHosmIY3vyMj2DQ=; b=0MUXBtwugQ+oeeyE5EsHHQklLiwi4ucnLjJ+FwYwljZtVtSQd00ulwA7lwHQGdk+dH LsboZgZN//wAm9ATvjcbE+See2Gj180jU1zLoHLTJnmm3OaXkpzVotqt6VjMNay9q0d6 G0+GmMIi/KDZ7UoFov36bKiVLbHu2e1WTq1m78m2iwboVU65/ewWMYnQfsibDs2gZdGL +eAK+i3eVzJUtuff/wVXqpgzfQADatdaFPJG/NeqGTLzeip7jTRN4uYrZ4gC44Qdf9FQ TWcfwJDPZCCkvutIopjf2jtmYQXN7mhvtUEssqPgrDLPRWzg0z0SpbvdQnAz1x6fvuY6 42LQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=L+ab3uiaSDOPpXeliU0siAGC19utFHosmIY3vyMj2DQ=; b=ShLcKNzRtP6r1pinAx1Pcsd326G46tI8+KgJXe7IN+RUlprdK7BIPBLMY1ci/Pzt3p oynGRizRArqnX8WnW1FUm9joeHFMe1HUQE1lpxU8yyYLWE3OdwFtTsLhdwNjTzZUs/rj AJ/di3J3JpY8KMvO3jzQbTENX4LG4qomwOXlshEYJgfS+Ic4YA6FZm1Y4x9g8eWGoinp Ap+Z693DUkS4nlxwWZNrubPKmGw2ncyUtxJmHKMnmePS3cxE6OYiby+ZT+mOCnPgNRRS ocPU+dwTOssoZOqXQk9PAJEfOB7ImBIHUxVlNw3O4kW4SN5jwHzdhJ7Q8mZcrtB5c4P/ rHLA== X-Gm-Message-State: AMke39kX5NkJ1N6zQqmXb85yTfdo75dVbAYITkWBrRj1aZerwblIPIsD2g0G5srCZWGC1ARGM/yXZ7WHoV8s9Dh6 X-Received: by 10.200.37.125 with SMTP id 58mr11347818qtn.232.1486772322730; Fri, 10 Feb 2017 16:18:42 -0800 (PST) MIME-Version: 1.0 Received: by 10.12.169.216 with HTTP; Fri, 10 Feb 2017 16:18:42 -0800 (PST) In-Reply-To: References: <1486061689.380058.868358712.6DD69B3A@webmail.messagingengine.com> <1486146278.669375.869523496.719E5C29@webmail.messagingengine.com> <1486491592.2728580.873422848.7B58049A@webmail.messagingengine.com> <1486602009.3112357.875032616.155DB21D@webmail.messagingengine.com> <1486666262.159335.875955056.2D228299@webmail.messagingengine.com> From: Jun Rao Date: Fri, 10 Feb 2017 16:18:42 -0800 Message-ID: Subject: Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations To: "dev@kafka.apache.org" Content-Type: multipart/alternative; boundary=001a1141079a28ad64054836288c archived-at: Sat, 11 Feb 2017 00:18:56 -0000 --001a1141079a28ad64054836288c Content-Type: text/plain; charset=UTF-8 Hi, Dong, For KIP-107, the purgeDataBefore() api will eventually be added to the AdminClient too, right? It would be useful to make the apis consistent. Currently, in KIP-107, we do batching in purgeDataBefore(). In Colin's current proposal, there is no batching. Thanks, Jun On Thu, Feb 9, 2017 at 10:54 AM, Dong Lin wrote: > Thanks for the explanation. This makes sense. > > Best, > Dong > > On Thu, Feb 9, 2017 at 10:51 AM, Colin McCabe wrote: > > > On Wed, Feb 8, 2017, at 19:02, Dong Lin wrote: > > > I am not aware of any semantics that will be caused by sharing > > > NetworkClient between producer/consumer and AdminClient. But I agree > that > > > there is currently no good way to share such an internal class between > > > them. And yes, goal is to reduce number of connections. For example, > say > > > we > > > want to enable auto data purge based on committed offset using > > > AdminClient.purgeDataBefore(...) in a stream processing application, > > then > > > in addition to producer or consumer, we will now have AdminClient in > > > every > > > job. It means that the the number of connection between server and > client > > > will double. > > > > > > I have another comment on the KIP. Is AdminClient API supposed to be > > > thread > > > safe? > > > > Yes. The wiki page states: "The client will be multi-threaded; multiple > > threads will be able to safely make calls using the same AdminClient > > object." > > > > > If so, should we mark private variables such as clientTimeoutMs to > > > be @volatile? Would it be a concern if two threads call > > > TopicsContext.setServerTimeout(...) concurrently to use different > > timeout > > > for their own use-case? > > > > The context objects are extremely lightweight and are not intended to be > > shared between multiple threads. So each thread would just do > > client.topics().setClientTimeout(...).create(), and operate on its own > > TopicsContext. I will add that to the wiki. > > > > best, > > Colin > > > > > > > > Thanks, > > > Dong > > > > > > On Wed, Feb 8, 2017 at 6:50 PM, Jason Gustafson > > > wrote: > > > > > > > I'm not too sure sharing NetworkClient is a good idea. The consumer > > and the > > > > producer both have request semantics which would be more difficult to > > > > reason about if the connections are shared with another client. Also, > > the > > > > NetworkClient is an internal class which is not really meant for > > users. Do > > > > we really want to open that up? Is the only benefit saving the number > > of > > > > connections? Seems not worth it in my opinion. > > > > > > > > -Jason > > > > > > > > On Wed, Feb 8, 2017 at 6:43 PM, Dong Lin > wrote: > > > > > > > > > BTW, the idea to share NetworkClient is suggested by Radai and I > like > > > > this > > > > > idea. > > > > > > > > > > On Wed, Feb 8, 2017 at 6:39 PM, Dong Lin > > wrote: > > > > > > > > > > > Hey Colin, > > > > > > > > > > > > Thanks for updating the KIP. I have two followup questions: > > > > > > > > > > > > - It seems that setCreationConfig(...) is a bit redundant given > > that > > > > most > > > > > > arguments (e.g. topic name, partition num) are already passed to > > > > > > TopicsContext.create(...) when user creates topic. Should we pass > > > > > > the creationConfig as a parameter to TopicsContext.create(..)? > > > > > > > > > > > > - I am wondering if we should also specify the constructor of the > > > > > > AdminClient in the KIP. Previously we agreed that AdminClient > > should > > > > have > > > > > > its own thread to poll NetworkClient to send/receive messages. > > Should > > > > we > > > > > > also allow AdminClient to use an existing NetworkClient that is > > > > provided > > > > > to > > > > > > the constructor? This would allow AdminClient to share > > NetworkClient > > > > with > > > > > > producer or consumer in order to reduce the total number of open > > > > sockets > > > > > on > > > > > > both client and server. > > > > > > > > > > > > Thanks, > > > > > > Dong > > > > > > > > > > > > On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe > > > > > wrote: > > > > > > > > > > > >> Hi all, > > > > > >> > > > > > >> I made some major revisions to the proposal on the wiki, so > please > > > > check > > > > > >> it out. > > > > > >> > > > > > >> The new API is based on Ismael's suggestion of grouping related > > APIs. > > > > > >> There is only one layer of grouping. I think that it's actually > > > > pretty > > > > > >> intuitive. It's also based on the idea of using Futures, which > > > > several > > > > > >> people commented that they'd like to see. > > > > > >> > > > > > >> Here's a simple example: > > > > > >> > > > > > >> > AdminClient client = new AdminClientImpl(myConfig); > > > > > >> > try { > > > > > >> > client.topics().create("foo", 3, (short) 2, false).get(); > > > > > >> > Collection topicNames = > client.topics().list(false). > > > > get(); > > > > > >> > log.info("Found topics: {}", Utils.mkString(topicNames, ", > > ")); > > > > > >> > Collection nodes = client.nodes().list().get(); > > > > > >> > log.info("Found cluster nodes: {}", Utils.mkString(nodes, > ", > > > > ")); > > > > > >> > } finally { > > > > > >> > client.close(); > > > > > >> > } > > > > > >> > > > > > >> The good thing is, there is no Try, no 'get' prefixes, no > messing > > with > > > > > >> batch APIs. If there is an error, then Future#get() throws an > > > > > >> ExecutionException which wraps the relevant exception in the > > standard > > > > > >> Java way. > > > > > >> > > > > > >> Here's a slightly less simple example: > > > > > >> > > > > > >> > AdminClient client = new AdminClientImpl(myConfig); > > > > > >> > try { > > > > > >> > List> futures = new LinkedList<>(); > > > > > >> > for (String topicName: myNewTopicNames) { > > > > > >> > creations.add(client.topics(). > > > > > >> > setClientTimeout(30000).setCreationConfig( > > myTopicConfig). > > > > > >> > create(topicName, 3, (short) 2, false)); > > > > > >> > } > > > > > >> > Futures.waitForAll(futures); > > > > > >> > } finally { > > > > > >> > client.close(); > > > > > >> > } > > > > > >> > > > > > >> I went with Futures because I feel like ought to have some > option > > for > > > > > >> doing async. It's a style of programming that has become a lot > > more > > > > > >> popular with the rise of Node.js, Twisted python, etc. etc. > > Also, as > > > > > >> Ismael commented, Java 8 CompletableFuture is going to make > Java's > > > > > >> support for fluent async programming a lot stronger by allowing > > call > > > > > >> chaining and much more. > > > > > >> > > > > > >> If we are going to support async, the simplest thing is just to > > make > > > > > >> everything return a future and let people call get() if they > want > > to > > > > run > > > > > >> synchronously. Having a mix of async and sync APIs is just > going > > to > > > > be > > > > > >> confusing and redundant. > > > > > >> > > > > > >> I think we should try to avoid creating single functions that > > start > > > > > >> multiple requests if we can. It makes things much uglier. It > > means > > > > > >> that you have to have some kind of request class that wraps up > the > > > > > >> request the user is trying to create, so that you can handle an > > array > > > > of > > > > > >> those requests. The return value has to be something like > > Map > > > > >> Try> to represent which nodes failed and succeeded. This > > is > > > > the > > > > > >> kind of stuff that, in my opinion, makes people scratch their > > heads. > > > > > >> > > > > > >> If we need to, we can still get some of the efficiency benefits > of > > > > batch > > > > > >> APIs by waiting for a millisecond or two before sending out a > > topic > > > > > >> create() request to see if other create() requests arrive. If > > so, we > > > > > >> can coalesce them. It might be better to figure out if this is > an > > > > > >> actual performance issue before implementing it, though. > > > > > >> > > > > > >> I think it would be good to get something out there, annotate it > > as > > > > > >> @Unstable, and get feedback from people building against trunk > and > > > > using > > > > > >> it. We have removed or changed @Unstable APIs in streams > before, > > so I > > > > > >> don't think we should worry that it will get set in stone > > prematurely. > > > > > >> The AdminClient API should get much less developer use than > > anything > > > > in > > > > > >> streams, so changing an unstable API should be much easier. > > > > > >> > > > > > >> best, > > > > > >> Colin > > > > > >> > > > > > >> > > > > > >> On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote: > > > > > >> > Thanks for elaborating Jay. I totally agree that we have to be > > very > > > > > >> > careful > > > > > >> > in how we use our complexity budget. Easier said than done > when > > > > people > > > > > >> > don't agree on what is complex and what is simple. :) For > > example, I > > > > > >> > think > > > > > >> > batch APIs are a significant source of complexity as you have > > to do > > > > a > > > > > >> > bunch > > > > > >> > of ceremony to group things before sending the request and > error > > > > > >> handling > > > > > >> > becomes more complex due to partial failures (things like > `Try` > > or > > > > > other > > > > > >> > mechanisms that serve a similar role are then needed). > > > > > >> > > > > > > >> > Maybe a way forward is to write API usage examples to help > > validate > > > > > that > > > > > >> > the suggested API is indeed easy to use. > > > > > >> > > > > > > >> > Ismael > > > > > >> > > > > > > >> > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps > > wrote: > > > > > >> > > > > > > >> > > Totally agree on CompletableFuture. Also agree with some of > > the > > > > > rough > > > > > >> edges > > > > > >> > > on the Consumer. > > > > > >> > > > > > > > >> > > I don't have much of a leg to stand on with the splitting vs > > not > > > > > >> splitting > > > > > >> > > thing, really hard to argue one or the other is better. I > > guess > > > > the > > > > > >> one > > > > > >> > > observation in watching us try to make good public apis over > > the > > > > > >> years is I > > > > > >> > > am kind of in favor of a particular kind of simple. In > > particular > > > > I > > > > > >> think > > > > > >> > > since the bar is sooo high in support and docs and the > > community > > > > of > > > > > >> users > > > > > >> > > so broad in the range of their capabilities, it makes it so > > there > > > > is > > > > > >> a lot > > > > > >> > > of value in dead simple interfaces that don't have a lot of > > > > > conceptual > > > > > >> > > weight, don't introduce a lot of new classes or concepts or > > > > general > > > > > >> > > patterns that must be understood to use them correctly. So > > things > > > > > like > > > > > >> > > nesting, or the Try class, or async apis, or even just a > > complex > > > > set > > > > > >> of > > > > > >> > > classes representing arguments or return values kind of all > > stack > > > > > >> > > conceptual burdens on the user to figure out correct usage. > So > > > > like, > > > > > >> for > > > > > >> > > example, the Try class is very elegant and represents a > whole > > > > > >> generalized > > > > > >> > > class of possibly completed actions, but the flip side is > > maybe > > > > I'm > > > > > >> just a > > > > > >> > > working guy who needs to list his kafka topics but doesn't > > know > > > > > Rust, > > > > > >> take > > > > > >> > > pity on me! :-) > > > > > >> > > > > > > > >> > > Nit picking aside, super excited to see us progress on this. > > > > > >> > > > > > > > >> > > -Jay > > > > > >> > > > > > > > >> > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma < > ismael@juma.me.uk > > > > > > > > wrote: > > > > > >> > > > > > > > >> > > > Hi Jay, > > > > > >> > > > > > > > > >> > > > Thanks for the feedback. Comments inline. > > > > > >> > > > > > > > > >> > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps < > jay@confluent.io > > > > > > > > wrote: > > > > > >> > > > > > > > > > >> > > > > - I think it would be good to not use "get" as the > > prefix > > > > for > > > > > >> things > > > > > >> > > > > making remote calls. We've tried to avoid the java > > getter > > > > > >> convention > > > > > >> > > > > entirely (see code style guide), but for remote calls > > in > > > > > >> particular > > > > > >> > > it > > > > > >> > > > > kind > > > > > >> > > > > of blurs the line between field access and remote RPC > > in a > > > > > way > > > > > >> that > > > > > >> > > > > leads > > > > > >> > > > > people to trouble. What about, e.g., fetchAllGroups() > > vs > > > > > >> > > > getAllGroups(). > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > Agreed that we should avoid the `get` prefix for remote > > calls. > > > > > >> There are > > > > > >> > > a > > > > > >> > > > few possible prefixes for the read operations: list, > fetch, > > > > > >> describe. > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > - I think futures and callbacks are a bit of a pain > to > > use. > > > > > I'd > > > > > >> > > second > > > > > >> > > > > Becket's comment: let's ensure there a common use > case > > > > > >> motivating > > > > > >> > > > these > > > > > >> > > > > that wouldn't be just as easily satisfied with batch > > > > > operations > > > > > >> > > (which > > > > > >> > > > > we > > > > > >> > > > > seem to have at least for some things). In terms of > > > > > flexibility > > > > > >> > > > > Callbacks > > > > > > >> > > > > Futures > Batch Ops but I think in terms of usability > > it is > > > > > the > > > > > >> > > exact > > > > > >> > > > > opposite so let's make sure we have worked out how > the > > API > > > > > >> will be > > > > > >> > > > used > > > > > >> > > > > before deciding. In particular I think java Futures > are > > > > often > > > > > >> an > > > > > >> > > > > uncomfortable half-way point since calling get() and > > > > blocking > > > > > >> the > > > > > >> > > > > thread is > > > > > >> > > > > often not what you want for chaining sequences of > > > > operations > > > > > >> in a > > > > > >> > > > truly > > > > > >> > > > > async way, so 99% of people just use the future as a > > way to > > > > > >> batch > > > > > >> > > > calls. > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > We should definitely figure out how the APIs are going to > be > > > > used > > > > > >> before > > > > > >> > > > deciding. I agree that callbacks are definitely painful > and > > > > > there's > > > > > >> > > little > > > > > >> > > > reason to expose them in a modern API unless it's meant to > > be > > > > very > > > > > >> low > > > > > >> > > > level. When it comes to Futures, I think it's important to > > > > > >> distinguish > > > > > >> > > what > > > > > >> > > > is available in Java 7 and below versus what is available > > from > > > > > Java > > > > > >> 8 > > > > > >> > > > onwards. CompletableFuture makes it much easier to > > compose/chain > > > > > >> > > operations > > > > > >> > > > (in a similar vein to java.util.Stream, our own Streams > API, > > > > etc.) > > > > > >> and it > > > > > >> > > > gives you the ability to register callbacks if you really > > want > > > > to > > > > > >> > > (avoiding > > > > > >> > > > the somewhat odd situation in the Producer where we > return a > > > > > Future > > > > > >> _and_ > > > > > >> > > > allow you to pass a callback). > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > - Personally I don't think splitting the admin > methods > > up > > > > > >> actually > > > > > >> > > > makes > > > > > >> > > > > things more usable. It just makes you have to dig > > through > > > > our > > > > > >> > > > > hierarchy. I > > > > > >> > > > > think a flat class with a bunch of operations (like > the > > > > > >> consumer > > > > > >> > > api) > > > > > >> > > > is > > > > > >> > > > > probably the easiest for people to grok and find > things > > > > on. I > > > > > >> am > > > > > >> > > kind > > > > > >> > > > > of a > > > > > >> > > > > dumb PHP programmer at heart, though. > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > I am not sure it's fair to compare the AdminClient with > the > > > > > >> Consumer. The > > > > > >> > > > former has APIs for a bunch of unrelated APIs (topics, > ACLs, > > > > > >> configs, > > > > > >> > > > consumer groups, delegation tokens, preferred leader > > election, > > > > > >> partition > > > > > >> > > > reassignment, etc.) where the latter is pretty > specialised. > > For > > > > > >> each of > > > > > >> > > the > > > > > >> > > > resources, you may have 3-4 operations, it will get > > confusing > > > > > fast. > > > > > >> Also, > > > > > >> > > > do you really think an API that has one level of grouping > > will > > > > > mean > > > > > >> that > > > > > >> > > > users have to "dig through our hierarchy"? Or are you > > concerned > > > > > >> that once > > > > > >> > > > we go in that direction, there is a danger of making the > > > > hierarchy > > > > > >> more > > > > > >> > > > complicated? > > > > > >> > > > > > > > > >> > > > Finally, I am not sure I would use the consumer as an > > example of > > > > > >> > > something > > > > > >> > > > that is easy to grok. :) The fact that methods behave > pretty > > > > > >> differently > > > > > >> > > > (some are blocking while others only have an effect after > > poll) > > > > > >> with no > > > > > >> > > > indication from the type signature or naming convention > > makes it > > > > > >> harder, > > > > > >> > > > not easier, to understand. > > > > > >> > > > > > > > > >> > > > Ismael > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > --001a1141079a28ad64054836288c--