kafka-dev mailing list archives

From Apurva Mehta <apu...@confluent.io>
Subject Re: [DISCUSS] KIP-185: Make exactly once in order delivery per partition the default producer setting
Date Thu, 17 Aug 2017 03:58:01 GMT
Thanks for the followup, Becket. It sounds like we are in agreement on the
scope of this KIP, and the discussion has definitely clarified a lot of the
subtle points.

Apurva

On Tue, Aug 15, 2017 at 10:49 PM, Becket Qin <becket.qin@gmail.com> wrote:

> Hi Apurva,
>
> Thanks for the clarification of the definition. The definitions are clear
> and helpful.
>
> It seems the scope of this KIP is just about the producer side
> configuration change, but not attempting to achieve the exactly once
> semantic with all default settings out of the box. The broker still needs
> to be configured appropriately to achieve the exactly once semantic. If so,
> the current proposal sounds reasonable to me. Apologies if I misunderstood
> the goal of this KIP.
>
> Regarding max.in.flight.requests.per.connection, I don't think we have
> to support an infinite number of in-flight requests. But admittedly there
> are use cases where people would want a reasonably high number of in-flight
> requests. Given that we need to make code changes to support idempotence
> with more than one in-flight request, it would be nice to see if we can
> cover those use cases now instead of doing that later. We can discuss this
> in a separate thread.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Tue, Aug 15, 2017 at 1:46 PM, Guozhang Wang <wangguoz@gmail.com> wrote:
>
> > Hi Jay,
> >
> > I chatted with Apurva offline, and we think the key question in this
> > discussion, as summarized in the updated KIP wiki, is whether we should
> > consider replication a necessary condition for at-least-once, and of
> > course also for exactly-once. Originally I thought replication was not a
> > necessary condition for at-least-once, since the scope of failures that we
> > should be covering is different in my definition. If we claim that "even
> > for at-least-once, you should have replication factor larger than 2, let
> > alone exactly-once", then I agree that having acks=all on the client side
> > should also be a necessary condition for at-least-once, and for
> > exactly-once as well. This KIP would then be providing just the necessary
> > but not sufficient conditions, from client-side configs, to achieve EOS,
> > while you also need the broker-side configs to really support it.
> >
> > Guozhang
> >
> >
> > On Tue, Aug 15, 2017 at 1:15 PM, Jay Kreps <jay@confluent.io> wrote:
> >
> > > Hey Guozhang,
> > >
> > > I think the argument is that with acks=1 the message could be lost and
> > > hence you aren't guaranteeing exactly once delivery.
> > >
> > > -Jay
> > >
> > > On Mon, Aug 14, 2017 at 1:36 PM, Guozhang Wang <wangguoz@gmail.com> wrote:
> > >
> > > > Just want to clarify that regarding 1), I'm fine with changing it to
> > > > `all`, but I wanted to argue that it does not necessarily correlate
> > > > with exactly-once semantics, but rather with persistence vs.
> > > > availability trade-offs, so I'd like to discuss them separately.
> > > >
> > > > Regarding 2), one minor concern I had is that the enforcement is on the
> > > > client side while the parts it affects are on the broker side. I.e. the
> > > > broker code would assume at most 5 in-flight requests when idempotence
> > > > is turned on, but this is not enforced at the broker; it relies on the
> > > > client side's sanity. So other client implementations that do not obey
> > > > this limit may well break the broker code. If we do enforce this, we'd
> > > > better enforce it on the broker side. Also, I'm wondering if we have
> > > > considered having brokers read the logs to get the starting offset when
> > > > the broker does not know about it in its snapshot, and whether that is
> > > > worthwhile if we assume that such issues are very rare?
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > >
> > > > On Mon, Aug 14, 2017 at 11:01 AM, Apurva Mehta <apurva@confluent.io> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > > I just want to summarize where we are in this discussion.
> > > > > >
> > > > > > There are two major points of contention: should we have acks=1 or
> > > > > > acks=all by default? And how should we cap
> > > > > > max.in.flight.requests.per.connection?
> > > > > >
> > > > > > 1) acks=1 vs acks=all
> > > > >
> > > > > Here are the tradeoffs of each:
> > > > >
> > > > > > If you have replication-factor=N, your data is resilient to N-1
> > > > > > disk failures. For N>1, here is the tradeoff between acks=1 and
> > > > > > acks=all.
> > > > >
> > > > > > With the proposed defaults and acks=all, the stock Kafka producer
> > > > > > and the default broker settings would guarantee that ack'd messages
> > > > > > would be in the log exactly once.
> > > > >
> > > > > > With the proposed defaults and acks=1, the stock Kafka producer and
> > > > > > the default broker settings would guarantee that 'retained ack'd
> > > > > > messages would be in the log exactly once. But not all ack'd
> > > > > > messages may be retained'.
> > > > >
> > > > > > If you leave replication-factor=1, acks=1 and acks=all have
> > > > > > identical semantics and performance, but you are resilient to 0
> > > > > > disk failures.
> > > > >
> > > > > > I think the measured cost (again the performance details are in the
> > > > > > wiki) of acks=all is well worth the much clearer semantics. What
> > > > > > does the rest of the community think?
> > > > >
> > > > > 2) capping max.in.flight at 5 when idempotence is enabled.
> > > > >
> > > > > > We need to limit max.in.flight for the broker to de-duplicate
> > > > > > messages properly. The limitation would only apply when idempotence
> > > > > > is enabled. The shared numbers show that when the client-broker
> > > > > > latency is low, there is no performance gain for max.in.flight > 2.
> > > > >
> > > > > > Further, it is highly debatable that max.in.flight=500 is
> > > > > > significantly better than max.in.flight=5 for a really high latency
> > > > > > client-broker link, and so far there are no hard numbers one way or
> > > > > > another. However, assuming that max.in.flight=500 is significantly
> > > > > > better than max.in.flight=5 in some niche use case, the user would
> > > > > > have to sacrifice idempotence for throughput. In this extreme
> > > > > > corner case, I think it is an acceptable tradeoff.
> > > > >
> > > > > What does the community think?
> > > > >
> > > > > Thanks,
> > > > > Apurva
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
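
For reference, the producer-side settings debated in the thread above can be written down as a small sketch. The config keys (enable.idempotence, acks, max.in.flight.requests.per.connection) are the Kafka producer names used in the discussion; the helper check_producer_config and the decision to flag any acks value other than "all" are assumptions made for illustration, following the preference stated in point 1), not the producer's actual validation code.

```python
# Producer settings discussed in this thread, keyed by their Kafka config names.
PROPOSED_DEFAULTS = {
    "enable.idempotence": True,
    "acks": "all",
    "max.in.flight.requests.per.connection": 5,
}

# The cap proposed in point 2) of the summary.
MAX_IN_FLIGHT_WITH_IDEMPOTENCE = 5


def check_producer_config(config):
    """Hypothetical helper: list conflicts with the settings favored in the thread."""
    problems = []
    if not config.get("enable.idempotence", False):
        # The in-flight cap only applies when idempotence is enabled.
        return problems
    if str(config.get("acks")) not in ("all", "-1"):
        # Assumption: treat anything weaker than acks=all as a conflict.
        problems.append("acks should be 'all' when idempotence is enabled")
    in_flight = config.get("max.in.flight.requests.per.connection", 5)
    if in_flight > MAX_IN_FLIGHT_WITH_IDEMPOTENCE:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    return problems
```

A user who wants max.in.flight=500 would, under this sketch, have to set enable.idempotence to False, which is the throughput-for-idempotence tradeoff described in point 2).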

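The reason the broker needs a bound on in-flight requests can be illustrated with a toy model. This is a minimal sketch, assuming the broker remembers only the last 5 batches per producer and that each batch carries a single sequence number; the class ProducerWindow and its return values are hypothetical and much simpler than the real broker code.

```python
from collections import deque

WINDOW = 5  # assumed number of recent batches remembered per producer


class ProducerWindow:
    """Hypothetical per-producer, per-partition de-duplication state."""

    def __init__(self):
        # Sequence numbers of the most recent WINDOW batches.
        self.recent = deque(maxlen=WINDOW)
        self.next_seq = 0

    def append(self, seq):
        if seq in self.recent:
            return "duplicate"  # a retry of a batch that is still remembered
        if seq == self.next_seq:
            self.recent.append(seq)  # the oldest entry is evicted automatically
            self.next_seq += 1
            return "appended"
        return "out_of_order"  # a gap, or a retry too old to recognise


window = ProducerWindow()
results = [window.append(seq) for seq in range(6)]  # batches 0..5 arrive in order
retry_recent = window.append(5)  # still inside the window
retry_old = window.append(0)     # evicted once batch 5 was appended
```

In this model a retry of batch 5 is recognised as a duplicate, while a retry of batch 0 can no longer be matched against the window. A client with more than 5 requests in flight could produce exactly that kind of stale retry, which is why the thread raises the question of enforcing the cap on the broker side as well.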