kafka-dev mailing list archives

From Patrick Williams <patrick.willi...@storageos.com>
Subject Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth
Date Tue, 04 Dec 2018 14:43:47 GMT
Please take me off this Discuss list

Best,
 
Patrick Williams
 
Sales Manager, UK & Ireland, Nordics & Israel
StorageOS
+44 (0)7549 676279
patrick.williams@storageos.com
 
20 Midtown
20 Proctor Street
Holborn
London WC1V 6NX
 
Twitter: @patch37
LinkedIn: linkedin.com/in/patrickwilliams4 <http://linkedin.com/in/patrickwilliams4>
 
https://slack.storageos.com/
 
 

On 03/12/2018, 12:35, "Stanislav Kozlovski" <stanislav@confluent.io> wrote:

    Hi Jason,
    
    > 2. Do you think we should make this a dynamic config?
    I'm not sure. Looking at the config from the perspective of a prescriptive
    config, we may get away with not updating it dynamically.
    But in my opinion, it always makes sense to have a config be dynamically
    configurable. As long as we limit it to being a cluster-wide config, we
    should be fine.
    
    > 1. I think it would be helpful to clarify the details on how the
    coordinator will shrink the group. It will need to choose which members to
    remove. Are we going to give current members an opportunity to commit
    offsets before kicking them from the group?
    
    This turns out to be somewhat tricky. I think that we may not be able to
    guarantee that consumers don't process a message twice.
    My initial approach was to do as much as we could to let consumers commit
    offsets.
    
    I was thinking that once we mark a group to be shrunk, we could keep a map of
    consumer_id->boolean indicating whether each consumer has committed offsets. I
    then thought we could delay the rebalance until every consumer commits (or
    some time passes).
    In the meantime, we would block all incoming fetch calls (by either
    returning empty records or a retriable error) and we would continue to
    accept offset commits (even twice for a single consumer).
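
A minimal sketch of the bookkeeping described above, assuming a hypothetical `ShrinkTracker` helper (the class name, method names, and the injectable clock are illustrative and not part of Kafka's coordinator code). It keeps the consumer_id->boolean map and reports when the rebalance could proceed: either every member has committed, or the timeout has elapsed.

```python
import time


class ShrinkTracker:
    """Hypothetical helper: tracks which members of a group marked for
    shrinking have committed offsets. Not actual Kafka coordinator code."""

    def __init__(self, member_ids, timeout_s, clock=time.monotonic):
        # consumer_id -> has this consumer committed since the group was marked?
        self._committed = {member_id: False for member_id in member_ids}
        self._clock = clock
        self._deadline = clock() + timeout_s

    def on_offset_commit(self, member_id):
        # Commits are accepted any number of times; the flag only flips once.
        if member_id in self._committed:
            self._committed[member_id] = True

    def should_rebalance(self):
        # Proceed once everyone has committed, or once the timeout has passed.
        all_committed = all(self._committed.values())
        return all_committed or self._clock() >= self._deadline
```

The clock is injected only so the timeout path can be exercised without sleeping. The sketch deliberately ignores request ordering, which is where the problems with this approach come from.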
    
    I see two problems with this approach:
    * We have async offset commits, which implies that we can receive fetch
    requests before the offset commit request has been handled, i.e. a consumer
    sends fetchReq A, offsetCommit B, fetchReq C, and we may receive them as
    A, C, B in the broker. This means we could have saved the offsets for B but
    rebalance before the offsetCommit for the offsets processed in C comes in.
    * KIP-392 Allow consumers to fetch from closest replica
    <https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica>
    would make it significantly harder to block poll() calls on consumers whose
    groups are being shrunk. Even if we implemented a solution, the same race
    condition noted above seems to apply, and probably others.
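
The reordering in the first problem can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a single partition, a fetch represented as the half-open offset range it returns, and a hypothetical function `offsets_at_risk` that has no counterpart in Kafka. It lists the offsets that were handed to the consumer but not yet covered by a commit at the moment the broker decides to rebalance.

```python
def offsets_at_risk(arrivals):
    """Return offsets delivered to the consumer but not yet committed,
    given the requests the broker has handled so far, in arrival order.

    Each arrival is ("fetch", start, end) for records [start, end),
    or ("commit", offset) meaning everything below `offset` is committed.
    """
    delivered = 0  # highest offset (exclusive) returned by any fetch
    committed = 0  # highest committed offset seen so far
    for event in arrivals:
        if event[0] == "fetch":
            _, _start, end = event
            delivered = max(delivered, end)
        elif event[0] == "commit":
            committed = max(committed, event[1])
    return list(range(committed, delivered))


# Consumer sends fetch A, commit B, fetch C; the broker handles A, C, B.
seen_by_broker = [("fetch", 0, 10), ("fetch", 10, 20), ("commit", 10)]
at_risk = offsets_at_risk(seen_by_broker)
```

Here the commit for B has been saved, yet the records returned by C (offsets 10 through 19) are still uncommitted. If the rebalance happens now, whichever consumer takes over the partition would read them again.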
    
    
    Given those constraints, I think that we can simply mark the group as
    `PreparingRebalance` with a rebalanceTimeout of the server setting
    `group.max.session.timeout.ms`. That's a bit long by default (5 minutes), but
    I can't seem to come up with a better alternative.
    
    I'm interested in hearing your thoughts.
    
    Thanks,
    Stanislav
    
    On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <jason@confluent.io> wrote:
    
    > Hey Stanislav,
    >
    > > What do you think about the use case I mentioned in my previous reply about
    > > a more resilient self-service Kafka? I believe the benefit there is bigger.
    >
    >
    > I see this config as analogous to the open file limit. Probably this limit
    > was intended to be prescriptive at some point about what was deemed a
    > reasonable number of open files for an application. But mostly people treat
    > it as an annoyance which they have to work around. If it happens to be hit,
    > usually you just increase it because it is not tied to an actual resource
    > constraint. However, occasionally hitting the limit does indicate an
    > application bug such as a leak, so I wouldn't say it is useless. Similarly,
    > the issue in KAFKA-7610 was a consumer leak and having this limit would
    > have allowed the problem to be detected before it impacted the cluster. To
    > me, that's the main benefit. It's possible that it could be used
    > prescriptively to prevent poor usage of groups, but like the open file
    > limit, I suspect administrators will just set it large enough that users
    > are unlikely to complain.
    >
    > Anyway, just a couple additional questions:
    >
    > 1. I think it would be helpful to clarify the details on how the
    > coordinator will shrink the group. It will need to choose which members to
    > remove. Are we going to give current members an opportunity to commit
    > offsets before kicking them from the group?
    >
    > 2. Do you think we should make this a dynamic config?
    >
    > Thanks,
    > Jason
    >
    >
    >
    >
    > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
    > stanislav@confluent.io>
    > wrote:
    >
    > > Hi Jason,
    > >
    > > You raise some very valid points.
    > >
    > > > The benefit of this KIP is probably limited to preventing "runaway"
    > > > consumer groups due to leaks or some other application bug
    > > What do you think about the use case I mentioned in my previous reply about
    > > a more resilient self-service Kafka? I believe the benefit there is bigger.
    > >
    > > * Default value
    > > You're right, we probably do need to be conservative. Big consumer groups
    > > are considered an anti-pattern and my goal was to also hint at this through
    > > the config's default. Regardless, it is better to not have the potential to
    > > break applications with an upgrade.
    > > Choosing between the default of something big like 5000 or an opt-in
    > > option, I think we should go with the *disabled default option* (-1).
    > > The only benefit we would get from a big default of 5000 is default
    > > protection against buggy/malicious applications that hit the KAFKA-7610
    > > issue.
    > > While this KIP was spawned from that issue, I believe its value is enabling
    > > the possibility of protection and helping move towards a more self-service
    > > Kafka. I also think that a default value of 5000 might be misleading to
    > > users and lead them to think that big consumer groups (> 250) are a good
    > > thing.
    > >
    > > The good news is that KAFKA-7610 should be fully resolved and the rebalance
    > > protocol should, in general, be more solid after the planned improvements
    > > in KIP-345 and KIP-394.
    > >
    > > * Handling bigger groups during upgrade
    > > I now see that we store the state of consumer groups in the log and why a
    > > rebalance isn't expected during a rolling upgrade.
    > > Since we're going with the default value of the max.size being disabled, I
    > > believe we can afford to be more strict here.
    > > During state reloading of a new Coordinator with a defined group.max.size
    > > config, I believe we should *force* rebalances for groups that exceed the
    > > configured size. Then, only some consumers will be able to join and the max
    > > size invariant will be satisfied.
    > >
    > > I updated the KIP with a migration plan, rejected alternatives and the new
    > > default value.
    > >
    > > Thanks,
    > > Stanislav
    > >
    > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <jason@confluent.io>
    > > wrote:
    > >
    > > > Hey Stanislav,
    > > >
    > > > > Clients will then find that coordinator
    > > > > and send `joinGroup` on it, effectively rebuilding the group, since the
    > > > > cache of active consumers is not stored outside the Coordinator's memory.
    > > > > (please do say if that is incorrect)
    > > >
    > > >
    > > > Groups do not typically rebalance after a coordinator change. You could
    > > > potentially force a rebalance if the group is too big and kick out the
    > > > slowest members or something. A more graceful solution is probably to just
    > > > accept the current size and prevent it from getting bigger. We could log a
    > > > warning potentially.
    > > >
    > > > > My thinking is that we should abstract away from conserving resources and
    > > > > focus on giving control to the broker. The issue that spawned this KIP was
    > > > > a memory problem but I feel this change is useful in a more general way.
    > > >
    > > >
    > > > So you probably already know why I'm asking about this. For consumer groups
    > > > anyway, resource usage would typically be proportional to the number of
    > > > partitions that a group is reading from and not the number of members. For
    > > > example, consider the memory use in the offsets cache. The benefit of this
    > > > KIP is probably limited to preventing "runaway" consumer groups due to
    > > > leaks or some other application bug. That still seems useful though.
    > > >
    > > > > I completely agree with this and I *ask everybody to chime in with opinions
    > > > > on a sensible default value*.
    > > >
    > > >
    > > > I think we would have to be very conservative. The group protocol is
    > > > generic in some sense, so there may be use cases we don't know of where
    > > > larger groups are reasonable. Probably we should make this an opt-in
    > > > feature so that we do not risk breaking anyone's application after an
    > > > upgrade. Either that, or use a very high default like 5,000.
    > > >
    > > > Thanks,
    > > > Jason
    > > >
    > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
    > > > stanislav@confluent.io>
    > > > wrote:
    > > >
    > > > > Hey Jason and Boyang, those were important comments
    > > > >
    > > > > > One suggestion I have is that it would be helpful to put your reasoning
    > > > > > on deciding the current default value. For example, in certain use cases at
    > > > > > Pinterest we are very likely to have more consumers than 250 when we
    > > > > > configure 8 stream instances with 32 threads.
    > > > > > For the effectiveness of this KIP, we should encourage people to discuss
    > > > > > their opinions on the default setting and ideally reach a consensus.
    > > > >
    > > > > I completely agree with this and I *ask everybody to chime in with opinions
    > > > > on a sensible default value*.
    > > > > My thought process was that in the current model rebalances in large groups
    > > > > are more costly. I imagine most use cases for most Kafka users do not
    > > > > require more than 250 consumers.
    > > > > Boyang, you say that you are "likely to have... when we..." - do you have
    > > > > systems running with so many consumers in a group or are you planning to? I
    > > > > guess what I'm asking is whether this has been tested in production with
    > > > > the current rebalance model (ignoring KIP-345).
    > > > >
    > > > > > Can you clarify the compatibility impact here? What
    > > > > > will happen to groups that are already larger than the max size?
    > > > > This is a very important question.
    > > > > From my current understanding, when a coordinator broker gets shut
    > > > > down during a cluster rolling upgrade, a replica will take leadership of
    > > > > the `__consumer_offsets` partition. Clients will then find that coordinator
    > > > > and send `joinGroup` on it, effectively rebuilding the group, since the
    > > > > cache of active consumers is not stored outside the Coordinator's memory.
    > > > > (please do say if that is incorrect)
    > > > > Then, I believe that working as if this is a new group is a reasonable
    > > > > approach. Namely, fail joinGroups when the max.size is exceeded.
    > > > > What do you guys think about this? (I'll update the KIP after we settle on
    > > > > a solution)
    > > > >
    > > > > > Also, just to be clear, the resource we are trying to conserve here is
    > > > > > what? Memory?
    > > > > My thinking is that we should abstract away from conserving resources and
    > > > > focus on giving control to the broker. The issue that spawned this KIP was
    > > > > a memory problem but I feel this change is useful in a more general way. It
    > > > > limits the control clients have on the cluster and helps Kafka become a
    > > > > more self-service system. Admin/Ops teams can better control the impact
    > > > > application developers can have on a Kafka cluster with this change.
    > > > >
    > > > > Best,
    > > > > Stanislav
    > > > >
    > > > >
    > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <jason@confluent.io>
    > > > > wrote:
    > > > >
    > > > > > Hi Stanislav,
    > > > > >
    > > > > > Thanks for the KIP. Can you clarify the compatibility impact here? What
    > > > > > will happen to groups that are already larger than the max size? Also, just
    > > > > > to be clear, the resource we are trying to conserve here is what? Memory?
    > > > > >
    > > > > > -Jason
    > > > > >
    > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bchen11@outlook.com>
    > > > wrote:
    > > > > >
    > > > > > > Thanks Stanislav for the update! One suggestion I have is that it would be
    > > > > > > helpful to put your reasoning on deciding the current default value. For
    > > > > > > example, in certain use cases at Pinterest we are very likely to have more
    > > > > > > consumers than 250 when we configure 8 stream instances with 32 threads.
    > > > > > >
    > > > > > > For the effectiveness of this KIP, we should encourage people to discuss
    > > > > > > their opinions on the default setting and ideally reach a consensus.
    > > > > > >
    > > > > > >
    > > > > > > Best,
    > > > > > >
    > > > > > > Boyang
    > > > > > >
    > > > > > > ________________________________
    > > > > > > From: Stanislav Kozlovski <stanislav@confluent.io>
    > > > > > > Sent: Monday, November 26, 2018 6:14 PM
    > > > > > > To: dev@kafka.apache.org
    > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
    > > > > > > metadata growth
    > > > > > >
    > > > > > > Hey everybody,
    > > > > > >
    > > > > > > It's been a week since this KIP was posted and there has not been much
    > > > > > > discussion. I assume that this is a straightforward change and I will open
    > > > > > > a voting thread in the next couple of days if nobody has anything to suggest.
    > > > > > >
    > > > > > > Best,
    > > > > > > Stanislav
    > > > > > >
    > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
    > > > > > > stanislav@confluent.io>
    > > > > > > wrote:
    > > > > > >
    > > > > > > > Greetings everybody,
    > > > > > > >
    > > > > > > > I have enriched the KIP a bit with a bigger Motivation section and also
    > > > > > > > renamed it.
    > > > > > > > KIP:
    > > > > > > >
    > > > > > >
    > > > > >
    > > > >
    > > >
    > >
    > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-389%253A%2BIntroduce%2Ba%2Bconfigurable%2Bconsumer%2Bgroup%2Bsize%2Blimit&amp;data=02%7C01%7C%7C085ed04564f2472e50f308d65387f4fd%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636788240721218938&amp;sdata=C6aXV4T6JWcNPtJhVSNxPrHSm2oTP%2BtGN4XvD4jSUOU%3D&amp;reserved=0
    > > > > > > >
    > > > > > > > I'm looking forward to discussions around it.
    > > > > > > >
    > > > > > > > Best,
    > > > > > > > Stanislav
    > > > > > > >
    > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
    > > > > > > > stanislav@confluent.io> wrote:
    > > > > > > >
    > > > > > > >> Hey there everybody,
    > > > > > > >>
    > > > > > > >> Thanks for the introduction Boyang. I appreciate the effort you are
    > > > > > > >> putting into improving consumer behavior in Kafka.
    > > > > > > >>
    > > > > > > >> @Matt
    > > > > > > >> I also believe the default value is high. In my opinion, we should aim
    > > > > > > >> for a default cap around 250. This is because in the current model any
    > > > > > > >> consumer rebalance is disruptive to every consumer. The bigger the group,
    > > > > > > >> the longer this period of disruption.
    > > > > > > >>
    > > > > > > >> If you have such a large consumer group, chances are that your
    > > > > > > >> client-side logic could be structured better and that you are not using
    > > > > > > >> the high number of consumers to achieve high throughput.
    > > > > > > >> 250 can still be considered a high upper bound; I believe in practice
    > > > > > > >> users should aim to not go over 100 consumers per consumer group.
    > > > > > > >>
    > > > > > > >> In regards to the cap being global/per-broker, I think that we should
    > > > > > > >> consider whether we want it to be global or *per-topic*. For the time
    > > > > > > >> being, I believe that having it per-topic with a global default might be
    > > > > > > >> the best situation. Having it global only seems a bit restricting to me,
    > > > > > > >> and it never hurts to support more fine-grained configurability (given
    > > > > > > >> it's the same config, not a new one being introduced).
    > > > > > > >>
    > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bchen11@outlook.com> wrote:
    > > > > > > >>
    > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any suggestion to
    > > > > > > >>> change the default value. Meanwhile I just want to point out that this
    > > > > > > >>> value is just a last line of defense, not a real scenario we would expect.
    > > > > > > >>>
    > > > > > > >>>
    > > > > > > >>> In the meantime, I discussed with Stanislav and he will be driving the
    > > > > > > >>> 389 effort from now on. Stanislav proposed the idea in the first place and
    > > > > > > >>> had already come up with a draft design, while I will keep focusing on the
    > > > > > > >>> KIP-345 effort to ensure solving the edge case described in the JIRA<
    > > > > > > >>>
    > > > > > >
    > > > > >
    > > > >
    > > >
    > >
    > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FKAFKA-7610&amp;data=02%7C01%7C%7C085ed04564f2472e50f308d65387f4fd%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636788240721218938&amp;sdata=PyOSGb6FhjcIS0XL2vcv2YEUSaYk9lL593ioHS4rRHk%3D&amp;reserved=0
    > > > > > > >.
    > > > > > > >>>
    > > > > > > >>>
    > > > > > > >>> Thank you Stanislav for making this happen!
    > > > > > > >>>
    > > > > > > >>>
    > > > > > > >>> Boyang
    > > > > > > >>>
    > > > > > > >>> ________________________________
    > > > > > > >>> From: Matt Farmer <matt@frmr.me>
    > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
    > > > > > > >>> To: dev@kafka.apache.org
    > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
    > > > > > > >>> metadata growth
    > > > > > > >>>
    > > > > > > >>> Thanks for the KIP.
    > > > > > > >>>
    > > > > > > >>> Will this cap be a global cap across the entire cluster or per broker?
    > > > > > > >>>
    > > > > > > >>> Either way the default value seems a bit high to me, but that could just
    > > > > > > >>> be from my own usage patterns. I’d have probably started with 500 or 1k
    > > > > > > >>> but could be easily convinced that’s wrong.
    > > > > > > >>>
    > > > > > > >>> Thanks,
    > > > > > > >>> Matt
    > > > > > > >>>
    > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bchen11@outlook.com> wrote:
    > > > > > > >>>
    > > > > > > >>> > Hey folks,
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>> > I would like to start a discussion on KIP-389:
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>>
    > > > > > >
    > > > > >
    > > > >
    > > >
    > >
    > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-389%253A%2BEnforce%2Bgroup.max.size%2Bto%2Bcap%2Bmember%2Bmetadata%2Bgrowth&amp;data=02%7C01%7C%7C085ed04564f2472e50f308d65387f4fd%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636788240721218938&amp;sdata=DXlRY6ydvXSjMU0CaTvoEj65DOC4d0p02hzu6IdGyk8%3D&amp;reserved=0
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>> > This is a pretty simple change to cap the consumer group size for
    > > > > > > >>> > broker stability. Give me your valuable feedback when you get time.
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>> > Thank you!
    > > > > > > >>> >
    > > > > > > >>>
    > > > > > > >>
    > > > > > > >>
    > > > > > > >> --
    > > > > > > >> Best,
    > > > > > > >> Stanislav
    > > > > > > >>
    > > > > > > >
    > > > > > > >
    > > > > > > > --
    > > > > > > > Best,
    > > > > > > > Stanislav
    > > > > > > >
    > > > > > >
    > > > > > >
    > > > > > > --
    > > > > > > Best,
    > > > > > > Stanislav
    > > > > > >
    > > > > >
    > > > >
    > > > >
    > > > > --
    > > > > Best,
    > > > > Stanislav
    > > > >
    > > >
    > >
    > >
    > > --
    > > Best,
    > > Stanislav
    > >
    >
    
    
    -- 
    Best,
    Stanislav
    
