kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jun Rao <...@confluent.io>
Subject Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations
Date Sat, 11 Feb 2017 00:18:42 GMT
Hi, Dong,

For KIP-107, the purgeDataBefore() api will eventually be added to the
AdminClient too, right? It would be useful to make the apis consistent.
Currently, in KIP-107, we do batching in purgeDataBefore(). In Colin's
current proposal, there is no batching.

Thanks,

Jun

On Thu, Feb 9, 2017 at 10:54 AM, Dong Lin <lindong28@gmail.com> wrote:

> Thanks for the explanation. This makes sense.
>
> Best,
> Dong
>
> On Thu, Feb 9, 2017 at 10:51 AM, Colin McCabe <cmccabe@apache.org> wrote:
>
> > On Wed, Feb 8, 2017, at 19:02, Dong Lin wrote:
> > > I am not aware of any semantics that will be caused by sharing
> > > NetworkClient between producer/consumer and AdminClient. But I agree
> that
> > > there is currently no good way to share such an internal class between
> > > them. And yes, goal is to reduce number of connections. For example,
> say
> > > we
> > > want to enable auto data purge based on committed offset using
> > > AdminClient.purgeDataBefore(...) in a stream processing application,
> > then
> > > in addition to producer or consumer, we will now have AdminClient in
> > > every
> > > job. It means that the the number of connection between server and
> client
> > > will double.
> > >
> > > I have another comment on the KIP. Is AdminClient API supposed to be
> > > thread
> > > safe?
> >
> > Yes.  The wiki page states: "The client will be multi-threaded; multiple
> > threads will be able to safely make calls using the same AdminClient
> > object."
> >
> > > If so, should we mark private variables such as clientTimeoutMs to
> > > be @volatile? Would it be a concern if two threads call
> > > TopicsContext.setServerTimeout(...) concurrently to use different
> > timeout
> > > for their own use-case?
> >
> > The context objects are extremely lightweight and are not intended to be
> > shared between multiple threads.  So each thread would just do
> > client.topics().setClientTimeout(...).create(), and operate on its own
> > TopicsContext.  I will add that to the wiki.
> >
> > best,
> > Colin
> >
> > >
> > > Thanks,
> > > Dong
> > >
> > > On Wed, Feb 8, 2017 at 6:50 PM, Jason Gustafson <jason@confluent.io>
> > > wrote:
> > >
> > > > I'm not too sure sharing NetworkClient is a good idea. The consumer
> > and the
> > > > producer both have request semantics which would be more difficult to
> > > > reason about if the connections are shared with another client. Also,
> > the
> > > > NetworkClient is an internal class which is not really meant for
> > users. Do
> > > > we really want to open that up? Is the only benefit saving the number
> > of
> > > > connections? Seems not worth it in my opinion.
> > > >
> > > > -Jason
> > > >
> > > > On Wed, Feb 8, 2017 at 6:43 PM, Dong Lin <lindong28@gmail.com>
> wrote:
> > > >
> > > > > BTW, the idea to share NetworkClient is suggested by Radai and I
> like
> > > > this
> > > > > idea.
> > > > >
> > > > > On Wed, Feb 8, 2017 at 6:39 PM, Dong Lin <lindong28@gmail.com>
> > wrote:
> > > > >
> > > > > > Hey Colin,
> > > > > >
> > > > > > Thanks for updating the KIP. I have two followup questions:
> > > > > >
> > > > > > - It seems that setCreationConfig(...) is a bit redundant given
> > that
> > > > most
> > > > > > arguments (e.g. topic name, partition num) are already passed
to
> > > > > > TopicsContext.create(...) when user creates topic. Should we
pass
> > > > > > the creationConfig as a parameter to TopicsContext.create(..)?
> > > > > >
> > > > > > - I am wondering if we should also specify the constructor of
the
> > > > > > AdminClient in the KIP. Previously we agreed that AdminClient
> > should
> > > > have
> > > > > > its own thread to poll NetworkClient to send/receive messages.
> > Should
> > > > we
> > > > > > also allow AdminClient to use an existing NetworkClient that
is
> > > > provided
> > > > > to
> > > > > > the constructor? This would allow AdminClient to share
> > NetworkClient
> > > > with
> > > > > > producer or consumer in order to reduce the total number of
open
> > > > sockets
> > > > > on
> > > > > > both client and server.
> > > > > >
> > > > > > Thanks,
> > > > > > Dong
> > > > > >
> > > > > > On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe <cmccabe@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> I made some major revisions to the proposal on the wiki,
so
> please
> > > > check
> > > > > >> it out.
> > > > > >>
> > > > > >> The new API is based on Ismael's suggestion of grouping
related
> > APIs.
> > > > > >> There is only one layer of grouping.  I think that it's
actually
> > > > pretty
> > > > > >> intuitive.  It's also based on the idea of using Futures,
which
> > > > several
> > > > > >> people commented that they'd like to see.
> > > > > >>
> > > > > >> Here's a simple example:
> > > > > >>
> > > > > >>  > AdminClient client = new AdminClientImpl(myConfig);
> > > > > >>  > try {
> > > > > >>  >   client.topics().create("foo", 3, (short) 2, false).get();
> > > > > >>  >   Collection<String> topicNames =
> client.topics().list(false).
> > > > get();
> > > > > >>  >   log.info("Found topics: {}", Utils.mkString(topicNames,
",
> > "));
> > > > > >>  >   Collection<Node> nodes = client.nodes().list().get();
> > > > > >>  >   log.info("Found cluster nodes: {}", Utils.mkString(nodes,
> ",
> > > > "));
> > > > > >>  > } finally {
> > > > > >>  >   client.close();
> > > > > >>  > }
> > > > > >>
> > > > > >> The good thing is, there is no Try, no 'get' prefixes, no
> messing
> > with
> > > > > >> batch APIs.  If there is an error, then Future#get() throws
an
> > > > > >> ExecutionException which wraps the relevant exception in
the
> > standard
> > > > > >> Java way.
> > > > > >>
> > > > > >> Here's a slightly less simple example:
> > > > > >>
> > > > > >> > AdminClient client = new AdminClientImpl(myConfig);
> > > > > >> > try {
> > > > > >> >   List<Future<Void>> futures = new LinkedList<>();
> > > > > >> >   for (String topicName: myNewTopicNames) {
> > > > > >> >     creations.add(client.topics().
> > > > > >> >         setClientTimeout(30000).setCreationConfig(
> > myTopicConfig).
> > > > > >> >           create(topicName, 3, (short) 2, false));
> > > > > >> >   }
> > > > > >> >   Futures.waitForAll(futures);
> > > > > >> > } finally {
> > > > > >> >   client.close();
> > > > > >> > }
> > > > > >>
> > > > > >> I went with Futures because I feel like ought to have some
> option
> > for
> > > > > >> doing async.  It's a style of programming that has become
a lot
> > more
> > > > > >> popular with the rise of Node.js, Twisted python, etc. etc.
> > Also, as
> > > > > >> Ismael commented, Java 8 CompletableFuture is going to make
> Java's
> > > > > >> support for fluent async programming a lot stronger by allowing
> > call
> > > > > >> chaining and much more.
> > > > > >>
> > > > > >> If we are going to support async, the simplest thing is
just to
> > make
> > > > > >> everything return a future and let people call get() if
they
> want
> > to
> > > > run
> > > > > >> synchronously.  Having a mix of async and sync APIs is just
> going
> > to
> > > > be
> > > > > >> confusing and redundant.
> > > > > >>
> > > > > >> I think we should try to avoid creating single functions
that
> > start
> > > > > >> multiple requests if we can.  It makes things much uglier.
 It
> > means
> > > > > >> that you have to have some kind of request class that wraps
up
> the
> > > > > >> request the user is trying to create, so that you can handle
an
> > array
> > > > of
> > > > > >> those requests.  The return value has to be something like
> > Map<Node,
> > > > > >> Try<Value>> to represent which nodes failed and
succeeded.  This
> > is
> > > > the
> > > > > >> kind of stuff that, in my opinion, makes people scratch
their
> > heads.
> > > > > >>
> > > > > >> If we need to, we can still get some of the efficiency benefits
> of
> > > > batch
> > > > > >> APIs by waiting for a millisecond or two before sending
out a
> > topic
> > > > > >> create() request to see if other create() requests arrive.
 If
> > so, we
> > > > > >> can coalesce them.  It might be better to figure out if
this is
> an
> > > > > >> actual performance issue before implementing it, though.
> > > > > >>
> > > > > >> I think it would be good to get something out there, annotate
it
> > as
> > > > > >> @Unstable, and get feedback from people building against
trunk
> and
> > > > using
> > > > > >> it.  We have removed or changed @Unstable APIs in streams
> before,
> > so I
> > > > > >> don't think we should worry that it will get set in stone
> > prematurely.
> > > > > >> The AdminClient API should get much less developer use than
> > anything
> > > > in
> > > > > >> streams, so changing an unstable API should be much easier.
> > > > > >>
> > > > > >> best,
> > > > > >> Colin
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote:
> > > > > >> > Thanks for elaborating Jay. I totally agree that we
have to be
> > very
> > > > > >> > careful
> > > > > >> > in how we use our complexity budget. Easier said than
done
> when
> > > > people
> > > > > >> > don't agree on what is complex and what is simple.
:) For
> > example, I
> > > > > >> > think
> > > > > >> > batch APIs are a significant source of complexity as
you have
> > to do
> > > > a
> > > > > >> > bunch
> > > > > >> > of ceremony to group things before sending the request
and
> error
> > > > > >> handling
> > > > > >> > becomes more complex due to partial failures (things
like
> `Try`
> > or
> > > > > other
> > > > > >> > mechanisms that serve a similar role are then needed).
> > > > > >> >
> > > > > >> > Maybe a way forward is to write API usage examples
to help
> > validate
> > > > > that
> > > > > >> > the suggested API is indeed easy to use.
> > > > > >> >
> > > > > >> > Ismael
> > > > > >> >
> > > > > >> > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps <jay@confluent.io>
> > wrote:
> > > > > >> >
> > > > > >> > > Totally agree on CompletableFuture. Also agree
with some of
> > the
> > > > > rough
> > > > > >> edges
> > > > > >> > > on the Consumer.
> > > > > >> > >
> > > > > >> > > I don't have much of a leg to stand on with the
splitting vs
> > not
> > > > > >> splitting
> > > > > >> > > thing, really hard to argue one or the other is
better. I
> > guess
> > > > the
> > > > > >> one
> > > > > >> > > observation in watching us try to make good public
apis over
> > the
> > > > > >> years is I
> > > > > >> > > am kind of in favor of a particular kind of simple.
In
> > particular
> > > > I
> > > > > >> think
> > > > > >> > > since the bar is sooo high in support and docs
and the
> > community
> > > > of
> > > > > >> users
> > > > > >> > > so broad in the range of their capabilities, it
makes it so
> > there
> > > > is
> > > > > >> a lot
> > > > > >> > > of value in dead simple interfaces that don't
have a lot of
> > > > > conceptual
> > > > > >> > > weight, don't introduce a lot of new classes or
concepts or
> > > > general
> > > > > >> > > patterns that must be understood to use them correctly.
So
> > things
> > > > > like
> > > > > >> > > nesting, or the Try class, or async apis, or even
just a
> > complex
> > > > set
> > > > > >> of
> > > > > >> > > classes representing arguments or return values
kind of all
> > stack
> > > > > >> > > conceptual burdens on the user to figure out correct
usage.
> So
> > > > like,
> > > > > >> for
> > > > > >> > > example, the Try class is very elegant and represents
a
> whole
> > > > > >> generalized
> > > > > >> > > class of possibly completed actions, but the flip
side is
> > maybe
> > > > I'm
> > > > > >> just a
> > > > > >> > > working guy who needs to list his kafka topics
but doesn't
> > know
> > > > > Rust,
> > > > > >> take
> > > > > >> > > pity on me! :-)
> > > > > >> > >
> > > > > >> > > Nit picking aside, super excited to see us progress
on this.
> > > > > >> > >
> > > > > >> > > -Jay
> > > > > >> > >
> > > > > >> > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma <
> ismael@juma.me.uk
> > >
> > > > > wrote:
> > > > > >> > >
> > > > > >> > > > Hi Jay,
> > > > > >> > > >
> > > > > >> > > > Thanks for the feedback. Comments inline.
> > > > > >> > > >
> > > > > >> > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps
<
> jay@confluent.io
> > >
> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > >    - I think it would be good to not
use "get" as the
> > prefix
> > > > for
> > > > > >> things
> > > > > >> > > > >    making remote calls. We've tried
to avoid the java
> > getter
> > > > > >> convention
> > > > > >> > > > >    entirely (see code style guide),
but for remote calls
> > in
> > > > > >> particular
> > > > > >> > > it
> > > > > >> > > > > kind
> > > > > >> > > > >    of blurs the line between field access
and remote RPC
> > in a
> > > > > way
> > > > > >> that
> > > > > >> > > > > leads
> > > > > >> > > > >    people to trouble. What about, e.g.,
fetchAllGroups()
> > vs
> > > > > >> > > > getAllGroups().
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > > Agreed that we should avoid the `get` prefix
for remote
> > calls.
> > > > > >> There are
> > > > > >> > > a
> > > > > >> > > > few possible prefixes for the read operations:
list,
> fetch,
> > > > > >> describe.
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > >    - I think futures and callbacks are
a bit of a pain
> to
> > use.
> > > > > I'd
> > > > > >> > > second
> > > > > >> > > > >    Becket's comment: let's ensure there
a common use
> case
> > > > > >> motivating
> > > > > >> > > > these
> > > > > >> > > > >    that wouldn't be just as easily satisfied
with batch
> > > > > operations
> > > > > >> > > (which
> > > > > >> > > > > we
> > > > > >> > > > >    seem to have at least for some things).
In terms of
> > > > > flexibility
> > > > > >> > > > > Callbacks >
> > > > > >> > > > >    Futures > Batch Ops but I think
in terms of usability
> > it is
> > > > > the
> > > > > >> > > exact
> > > > > >> > > > >    opposite so let's make sure we have
worked out how
> the
> > API
> > > > > >> will be
> > > > > >> > > > used
> > > > > >> > > > >    before deciding. In particular I
think java Futures
> are
> > > > often
> > > > > >> an
> > > > > >> > > > >    uncomfortable half-way point since
calling get() and
> > > > blocking
> > > > > >> the
> > > > > >> > > > > thread is
> > > > > >> > > > >    often not what you want for chaining
sequences of
> > > > operations
> > > > > >> in a
> > > > > >> > > > truly
> > > > > >> > > > >    async way, so 99% of people just
use the future as a
> > way to
> > > > > >> batch
> > > > > >> > > > calls.
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > > We should definitely figure out how the APIs
are going to
> be
> > > > used
> > > > > >> before
> > > > > >> > > > deciding. I agree that callbacks are definitely
painful
> and
> > > > > there's
> > > > > >> > > little
> > > > > >> > > > reason to expose them in a modern API unless
it's meant to
> > be
> > > > very
> > > > > >> low
> > > > > >> > > > level. When it comes to Futures, I think
it's important to
> > > > > >> distinguish
> > > > > >> > > what
> > > > > >> > > > is available in Java 7 and below versus what
is available
> > from
> > > > > Java
> > > > > >> 8
> > > > > >> > > > onwards. CompletableFuture makes it much
easier to
> > compose/chain
> > > > > >> > > operations
> > > > > >> > > > (in a similar vein to java.util.Stream, our
own Streams
> API,
> > > > etc.)
> > > > > >> and it
> > > > > >> > > > gives you the ability to register callbacks
if you really
> > want
> > > > to
> > > > > >> > > (avoiding
> > > > > >> > > > the somewhat odd situation in the Producer
where we
> return a
> > > > > Future
> > > > > >> _and_
> > > > > >> > > > allow you to pass a callback).
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > >    - Personally I don't think splitting
the admin
> methods
> > up
> > > > > >> actually
> > > > > >> > > > makes
> > > > > >> > > > >    things more usable. It just makes
you have to dig
> > through
> > > > our
> > > > > >> > > > > hierarchy. I
> > > > > >> > > > >    think a flat class with a bunch of
operations (like
> the
> > > > > >> consumer
> > > > > >> > > api)
> > > > > >> > > > is
> > > > > >> > > > >    probably the easiest for people to
grok and find
> things
> > > > on. I
> > > > > >> am
> > > > > >> > > kind
> > > > > >> > > > > of a
> > > > > >> > > > >    dumb PHP programmer at heart, though.
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > > I am not sure it's fair to compare the AdminClient
with
> the
> > > > > >> Consumer. The
> > > > > >> > > > former has APIs for a bunch of unrelated
APIs (topics,
> ACLs,
> > > > > >> configs,
> > > > > >> > > > consumer groups, delegation tokens, preferred
leader
> > election,
> > > > > >> partition
> > > > > >> > > > reassignment, etc.) where the latter is pretty
> specialised.
> > For
> > > > > >> each of
> > > > > >> > > the
> > > > > >> > > > resources, you may have 3-4 operations, it
will get
> > confusing
> > > > > fast.
> > > > > >> Also,
> > > > > >> > > > do you really think an API that has one level
of grouping
> > will
> > > > > mean
> > > > > >> that
> > > > > >> > > > users have to "dig through our hierarchy"?
Or are you
> > concerned
> > > > > >> that once
> > > > > >> > > > we go in that direction, there is a danger
of making the
> > > > hierarchy
> > > > > >> more
> > > > > >> > > > complicated?
> > > > > >> > > >
> > > > > >> > > > Finally, I am not sure I would use the consumer
as an
> > example of
> > > > > >> > > something
> > > > > >> > > > that is easy to grok. :) The fact that methods
behave
> pretty
> > > > > >> differently
> > > > > >> > > > (some are blocking while others only have
an effect after
> > poll)
> > > > > >> with no
> > > > > >> > > > indication from the type signature or naming
convention
> > makes it
> > > > > >> harder,
> > > > > >> > > > not easier, to understand.
> > > > > >> > > >
> > > > > >> > > > Ismael
> > > > > >> > > >
> > > > > >> > >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message