kafka-dev mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: [DISCUSS] KIP-429 : Smooth Auto-Scaling for Kafka Streams
Date Thu, 09 May 2019 01:37:23 GMT
Hello all,

Thanks to everyone who has shared feedback on this KIP! If there are
no further comments I'll start the voting thread by the end of tomorrow.


Guozhang.

On Wed, May 8, 2019 at 6:36 PM Guozhang Wang <wangguoz@gmail.com> wrote:

> Hello Boyang,
>
> On Wed, May 1, 2019 at 4:51 PM Boyang Chen <bchen11@outlook.com> wrote:
>
>> Hey Guozhang,
>>
>> thank you for the great write-up. Overall the motivation and changes
>> LGTM, just some minor comments:
>>
>>
>>   1.  In "Consumer Coordinator Algorithm", we could reorder alphabet
>> points for 3d~3f from ["ready-to-migrate-partitions",
>> "unknown-but-owned-partitions",  "maybe-revoking-partitions"] to
>> ["maybe-revoking-partitions", "ready-to-migrate-partitions",
>> "unknown-but-owned-partitions"] in order to be consistent with 3c1~3.
>>
>
> Ack. Updated.
>
>
>>   2.  In "Consumer Coordinator Algorithm", 1c suggests revoking all
>> partitions upon heartbeat/commit failure. What's the gain here? Do we want
>> to keep all partitions running at this moment, to be optimistic for the
>> case when no partitions get reassigned?
>>
>
> That's a good catch. When REBALANCE_IN_PROGRESS is received, we can just
> re-join the group with all the currently owned partitions encoded. Updated.
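A minimal sketch of the behavior described above (class and method names are illustrative, not the actual client internals): on REBALANCE_IN_PROGRESS nothing is revoked, and the re-join simply advertises the currently owned partitions.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the coordinator decision discussed above; the real
// ConsumerCoordinator internals differ, this only illustrates the rule.
final class RejoinPlanner {
    enum CoordinatorError { NONE, REBALANCE_IN_PROGRESS, UNKNOWN_MEMBER_ID }

    static Set<String> partitionsToRevoke(CoordinatorError error, Set<String> owned) {
        switch (error) {
            case REBALANCE_IN_PROGRESS:
                return Collections.emptySet();      // keep everything, just re-join
            case UNKNOWN_MEMBER_ID:
                return new LinkedHashSet<>(owned);  // membership lost: revoke all
            default:
                return Collections.emptySet();
        }
    }

    // The re-join subscription advertises what we still own, so the
    // leader's assignor can keep the assignment sticky.
    static Set<String> ownedPartitionsForRejoin(Set<String> owned) {
        return new LinkedHashSet<>(owned);
    }
}
```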
>
>
>>   3.  In "Recommended Upgrade Procedure", remove extra 'those': " The
>> 'sticky' assignor works even those there are "
>>
>
> Ack, should be `even when`.
>
>
>>   4.  Put the two "looking into the future" items into a separate category
>> from the migration section. It seems inconsistent for readers to see this
>> before we have finished discussing everything.
>>
>
> Ack.
>
>
>>   5.  Have we discussed the concern on the serialization? Could the new
>> metadata we are adding grow larger than the message size cap?
>>
>
> We're completing https://issues.apache.org/jira/browse/KAFKA-7149 which
> should largely reduce the message size (will update the wiki accordingly as
> well).
>
>
>>
>> Boyang
>>
>> ________________________________
>> From: Guozhang Wang <wangguoz@gmail.com>
>> Sent: Monday, April 15, 2019 9:20 AM
>> To: dev
>> Subject: Re: [DISCUSS] KIP-429 : Smooth Auto-Scaling for Kafka Streams
>>
>> Hello Jason,
>>
>> I agree with you that for range / round-robin it makes less sense to be
>> compatible with cooperative rebalance protocol.
>>
>> As for the StickyAssignor, however, I think it would still be possible to
>> make the current implementation compatible with cooperative rebalance. So
>> after pondering the different options at hand I'm now proposing the
>> approach listed in the upgrade section:
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol#KIP-429:KafkaConsumerIncrementalRebalanceProtocol-CompatibilityandUpgradePath
>>
>> The idea is to let assignors specify which protocols they would work with,
>> associating each with a different name; the upgrade path would then involve
>> a "compatible" protocol which actually still uses the eager behavior while
>> encoding two assignors if possible. In the "Rejected" section (just to
>> clarify, I'm not finalizing it as rejected, just putting it there for now,
>> and if we like that one instead we can always switch them) I listed the
>> other approach we once discussed, arguing that its con of a duplicated
>> class seems to outweigh the pro of saving the "rebalance.protocol" config.
>>
>> Let me know WDYT.
>>
>> Guozhang
>>
>> On Fri, Apr 12, 2019 at 6:08 PM Jason Gustafson <jason@confluent.io>
>> wrote:
>>
>> > Hi Guozhang,
>> >
>> > Responses below:
>> >
>> > > 2. The interface's default implementation will just be
>> > > `onPartitionRevoked`, so for users' instantiations, if they do not make
>> > > any code changes they should be able to recompile the code and continue.
>> >
>> >
>> > Ack, makes sense.
>> >
>> > > 4. Hmm.. not sure if it will work. The main issue is that the
>> > > consumer-coordinator behavior (whether to revoke all or none at
>> > > onRebalancePrepare) is independent of the selected protocol's assignor
>> > > (eager or cooperative), so even if the assignor selected is the
>> > > old-versioned one, we will still not revoke at the consumer-coordinator
>> > > layer and hence have the same risk of migrating still-owned partitions,
>> > > right?
>> >
>> >
>> > Yeah, basically we would have to push the eager/cooperative logic into
>> > the PartitionAssignor itself and make the consumer aware of the rebalance
>> > protocol it is compatible with. As long as an eager protocol _could_ be
>> > selected, the consumer would have to be pessimistic and do eager
>> > revocation. But if all the assignors configured in the consumer support
>> > cooperative reassignment, then either 1) a cooperative protocol will be
>> > selected and cooperative revocation can be safely used, or 2) if the rest
>> > of the group does not support it, then the consumer will simply fail.
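The selection rule sketched above could look roughly like the following (names are illustrative, not the final public API): eager revocation must be used unless every configured assignor supports the cooperative protocol.

```java
import java.util.EnumSet;
import java.util.List;

// Hypothetical sketch of the consumer-side protocol choice Jason describes;
// the actual interface names and shapes are assumptions.
final class ProtocolSelector {
    enum RebalanceProtocol { EAGER, COOPERATIVE }

    interface Assignor {
        EnumSet<RebalanceProtocol> supportedProtocols();
    }

    // Pick the protocol the consumer can safely commit to locally.
    static RebalanceProtocol select(List<Assignor> configured) {
        for (Assignor a : configured) {
            if (!a.supportedProtocols().contains(RebalanceProtocol.COOPERATIVE)) {
                // An eager-only assignor could still be chosen by the group,
                // so the consumer must be pessimistic and revoke eagerly.
                return RebalanceProtocol.EAGER;
            }
        }
        return RebalanceProtocol.COOPERATIVE;
    }
}
```

If the rest of the group turns out not to support the cooperative protocol at all, the join simply fails, per point 2) above.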
>> >
>> > Another point which you raised offline and I will repeat here is that
>> > this proposal's benefit is mostly limited to sticky assignment logic.
>> > Arguably the range assignor may have some incidental stickiness,
>> > particularly if the group is rebalancing for a newly created or deleted
>> > topic. For other cases, the proposal is mostly additional overhead since
>> > it takes an additional rebalance and many of the partitions will move.
>> > Perhaps it doesn't make as much sense to use the cooperative protocol for
>> > strategies like range and round-robin. That kind of argues in favor of
>> > pushing some of the control into the assignor itself. Maybe we would not
>> > bother creating CooperativeRange as I suggested above, but it would make
>> > sense to create a cooperative version of the sticky assignment strategy.
>> > I thought we might have to create a new sticky assignor anyway because I
>> > can't see how we would get compatible behavior mixing with the old
>> > version.
>> >
>> > Thanks,
>> > Jason
>> >
>> >
>> > On Thu, Apr 11, 2019 at 5:53 PM Guozhang Wang <wangguoz@gmail.com>
>> wrote:
>> >
>> > > Hello Matthias:
>> > >
>> > > Thanks for your review.
>> > >
>> > > The background section uses the Streams assignor as well as the
>> > > consumer's own sticky assignor as examples illustrating the situation,
>> > > but this KIP is for the consumer coordinator itself, and the rest of
>> > > the paragraph does not talk about Streams any more. If you feel it's a
>> > > bit distracting I can remove those examples.
>> > >
>> > > 10). While working on the PR I realized that the revoked partitions on
>> > > assignment are not needed (this is being discussed on the PR itself:
>> > > https://github.com/apache/kafka/pull/6528#issuecomment-480009890)
>> > >
>> > > 20). 1.a. Good question, I've updated the wiki to let the consumer
>> > > clean up its assignment and re-join, and not let the assignor make any
>> > > proactive changes. The idea is to keep the logic simpler and not do any
>> > > "split brain" stuff.
>> > >
>> > > 20). 2.b. No, we do not need it, since the owned-partitions will be
>> > > part of the Subscription passed in to assign() already.
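A toy sketch of that point, assuming a simplified Subscription (field names are illustrative, not the real wire format): the owned partitions ride along with each member's subscription, so assign() needs no extra "old assignment" parameter.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified Subscription carrying the member's currently
// owned partitions alongside its topic subscription.
final class Subscription {
    private final List<String> topics;
    private final List<String> ownedPartitions; // encoded by the member on (re-)join

    Subscription(List<String> topics, List<String> ownedPartitions) {
        this.topics = Collections.unmodifiableList(topics);
        this.ownedPartitions = Collections.unmodifiableList(ownedPartitions);
    }

    List<String> topics() { return topics; }

    // The leader's assignor reads this to honor stickiness without any
    // additional parameter on assign().
    List<String> ownedPartitions() { return ownedPartitions; }
}
```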
>> > >
>> > > 30). As Boyang mentioned, there are some drawbacks that still cannot
>> > > be addressed by a rebalance delay, hence we still voted for KIP-345
>> > > (some more details can be found on the discussion thread of KIP-345
>> > > itself). One example is that as an instance resumes, its member id will
>> > > be empty, so we are still relying on the assignor to give it the
>> > > assignment from the old member-id while keeping all other members'
>> > > assignments unchanged.
>> > >
>> > > 40). Incomplete sentence, I've updated it.
>> > >
>> > > 50). Here's my idea: suppose we augment the join group schema with
>> > > `protocol version` in 2.3, and then with both brokers and clients being
>> > > on version 2.3+, on the first rolling bounce where the subscription and
>> > > assignment schema and / or user metadata have changed, this protocol
>> > > version will be bumped. On the broker side, when receiving all members'
>> > > join-group requests, it will choose the one that has the highest
>> > > protocol version (this also assumes a higher-versioned protocol is
>> > > always backward compatible, i.e. the coordinator can recognize
>> > > lower-versioned protocols as well) and select it as the leader. Then
>> > > the leader can decide, based on its received and deserialized
>> > > subscription information, how to assign partitions and how to encode
>> > > the assignment accordingly so that everyone can understand it. With
>> > > this, in Streams for example, no version probing would be needed since
>> > > we are guaranteed the leader knows everyone's version -- again it is
>> > > assuming that a higher-versioned protocol is always backward compatible
>> > > -- and hence can successfully do the assignment in that round.
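The broker-side leader choice described above can be sketched as follows (a simplified model; names are illustrative): pick the member advertising the highest protocol version, relying on the stated assumption that higher versions are always backward compatible.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the leader-selection rule; the real broker-side
// group metadata structures differ.
final class LeaderSelector {
    static final class Member {
        final String memberId;
        final int protocolVersion;
        Member(String memberId, int protocolVersion) {
            this.memberId = memberId;
            this.protocolVersion = protocolVersion;
        }
    }

    // The chosen leader understands every member's (lower) version, so it
    // can deserialize all subscriptions and encode an assignment that all
    // members can read.
    static Optional<String> selectLeader(List<Member> members) {
        return members.stream()
                .max(Comparator.comparingInt((Member m) -> m.protocolVersion))
                .map(m -> m.memberId);
    }
}
```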
>> > >
>> > > 60). My bad, this section was not updated while the design evolved;
>> > > I've updated it.
>> > >
>> > >
>> > > On Tue, Apr 9, 2019 at 7:22 PM Boyang Chen <bchen11@outlook.com>
>> wrote:
>> > >
>> > > >
>> > > > Thanks for the review Matthias! My 2 cents on the rebalance delay is
>> > > > that it is a rather fixed trade-off between task availability and
>> > > > resource shuffling. If we eventually trigger a rebalance after a
>> > > > rolling bounce, certain consumer setups are still faced with global
>> > > > shuffles, for example a member.id-ranking-based round-robin strategy,
>> > > > as rejoining dynamic members will be assigned new member.ids, which
>> > > > reorders the assignment. So I think the primary goal of incremental
>> > > > rebalancing is still improving cluster availability during a
>> > > > rebalance, because it doesn't revoke any partitions during this
>> > > > process. Also, the perk is a minimal configuration requirement :)
>> > > >
>> > > >
>> > > > Best,
>> > > >
>> > > > Boyang
>> > > >
>> > > > ________________________________
>> > > > From: Matthias J. Sax <matthias@confluent.io>
>> > > > Sent: Tuesday, April 9, 2019 7:47 AM
>> > > > To: dev
>> > > > Subject: Re: [DISCUSS] KIP-429 : Smooth Auto-Scaling for Kafka Streams
>> > > >
>> > > > Thanks for the KIP, Boyang and Guozhang!
>> > > >
>> > > >
>> > > > I made an initial pass and have some questions/comments. One
>> > > > high-level comment: it seems that the KIP "mixes" the plain consumer
>> > > > and Kafka Streams use cases a little bit (at least in the
>> > > > presentation). It might be helpful to separate both cases clearly, or
>> > > > maybe limit the scope to the plain consumer only.
>> > > >
>> > > >
>> > > >
>> > > > 10) For `PartitionAssignor.Assignment`: It seems we need a new method
>> > > > `List<TopicPartition> revokedPartitions()`?
>> > > >
>> > > >
>> > > >
>> > > > 20) In Section "Consumer Coordinator Algorithm"
>> > > >
>> > > >     Bullet point "1a)": If the subscription changes and a topic is
>> > > > removed from the subscription, why do we not revoke the partitions?
>> > > >
>> > > >     Bullet point "1a)": What happens if a topic is deleted (or a
>> > > > partition is removed/deleted from a topic)? Should we call the new
>> > > > `onPartitionsEmigrated()` callback for this case?
>> > > >
>> > > >     Bullet point "2b)": Should we update the `PartitionAssignor`
>> > > > interface to pass in the "old assignment" as a third parameter into
>> > > > `assign()`?
>> > > >
>> > > >
>> > > >
>> > > > 30) Rebalance delay (as used in KIP-415): Could a rebalance delay
>> > > > subsume KIP-345? Configuring static members is rather complicated, and
>> > > > I am wondering if a rebalance delay would be sufficient?
>> > > >
>> > > >
>> > > >
>> > > > 40) Quote: "otherwise the we would fall into the case 3.b) forever."
>> > > >
>> > > > What is "case 3.b" ?
>> > > >
>> > > >
>> > > >
>> > > > 50) Section "Looking into the Future"
>> > > >
>> > > > Nit: the new "ProtocolVersion" field is missing in the first line
>> > > > describing "JoinGroupRequest"
>> > > >
>> > > > > This can also help saving "version probing" cost on Streams as well.
>> > > >
>> > > > How does this relate to the Kafka Streams "version probing"
>> > > > implementation? How can we exploit the new `ProtocolVersion` in
>> > > > Streams to improve "version probing"? I have a rough idea, but would
>> > > > like to hear more details.
>> > > >
>> > > >
>> > > >
>> > > > 60) Section "Recommended Upgrade Procedure"
>> > > >
>> > > > > Set the `stream.rebalancing.mode` to `upgrading`, which will force
>> > > > > the stream application to stay with protocol type "consumer".
>> > > >
>> > > > This config is not discussed in the KIP and appears in this section
>> > > > without context. Can you elaborate about it?
>> > > >
>> > > >
>> > > >
>> > > > -Matthias
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On 3/29/19 6:20 PM, Guozhang Wang wrote:
>> > > > > Bump up on this discussion thread. I've added a few new drawings for
>> > > > > better illustration; I would really appreciate your feedback.
>> > > > >
>> > > > >
>> > > > > Guozhang
>> > > > >
>> > > > > On Wed, Mar 20, 2019 at 6:17 PM Guozhang Wang <wangguoz@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > >> Hello Boyang,
>> > > > >>
>> > > > >> I've made another thorough pass over this KIP and I'd like to split
>> > > > >> it into two parts: the first part, covered in KIP-429, would touch
>> > > > >> on the Consumer Coordinator only, to put the incremental rebalance
>> > > > >> protocol in place. The second part (for now I've reserved KIP
>> > > > >> number 444 for it) would contain all the changes on
>> > > > >> StreamsPartitionAssignor to allow warming up new members.
>> > > > >>
>> > > > >> I think the first part, a.k.a. the current updated KIP-429, is
>> > > > >> ready for review and discussion again. Would love to hear people's
>> > > > >> feedback and ideas.
>> > > > >>
>> > > > >> Guozhang
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> On Mon, Mar 4, 2019 at 10:09 AM Boyang Chen <bchen11@outlook.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >>> Thanks Guozhang for the great questions. Answers are inlined:
>> > > > >>>
>> > > > >>> 1. I'm still not sure if it's worthwhile to add a new type of
>> > > > >>> "learner task" in addition to "standby task": if the only
>> > > > >>> difference is that for the latter, we would consider workload
>> > > > >>> balance while for the former we would not, I think we can just
>> > > > >>> adjust the logic of StickyTaskAssignor a bit to break that
>> > > > >>> difference. Adding a new type of task would be adding a lot of
>> > > > >>> code complexity, so if we can still piggy-back the logic on a
>> > > > >>> standby task I would prefer to do so.
>> > > > >>> In the proposal we stated that we are not adding a new type of
>> > > > >>> task implementation. The learner task shall share the same
>> > > > >>> implementation with a normal standby task, only that we shall tag
>> > > > >>> the standby task as a learner and prioritize the learner tasks'
>> > > > >>> replay effort.
>> > > > >>>
>> > > > >>> 2. One thing that's still not clear from the KIP wiki itself is
>> > > > >>> which layer the logic would be implemented at. Although for most
>> > > > >>> KIPs we would not require internal implementation details but only
>> > > > >>> public-facing API updates, for a KIP like this I think it still
>> > > > >>> requires fleshing out details on the implementation design. More
>> > > > >>> specifically: today Streams embeds a full-fledged Consumer client,
>> > > > >>> which hard-codes a ConsumerCoordinator inside; Streams then
>> > > > >>> injects a StreamsPartitionAssignor into its pluggable
>> > > > >>> PartitionAssignor interface, and inside the
>> > > > >>> StreamsPartitionAssignor we also have a TaskAssignor interface
>> > > > >>> whose default implementation is StickyTaskAssignor. Streams'
>> > > > >>> partition assignment logic today sits in the latter two classes.
>> > > > >>> Hence the hierarchy today is:
>> > > > >>>
>> > > > >>> KafkaConsumer -> ConsumerCoordinator -> StreamsPartitionAssignor
>> > > > >>> -> StickyTaskAssignor.
>> > > > >>>
>> > > > >>> We need to think about where the proposed implementation would
>> > > > >>> take place, and personally I think it is not the best option to
>> > > > >>> inject all of it into the StreamsPartitionAssignor /
>> > > > >>> StickyTaskAssignor, since the logic of "triggering another
>> > > > >>> rebalance" etc. would require some coordinator logic which is hard
>> > > > >>> to mimic at the PartitionAssignor level. On the other hand, since
>> > > > >>> we are embedding a KafkaConsumer client as a whole, we cannot just
>> > > > >>> replace ConsumerCoordinator with a specialized StreamsCoordinator
>> > > > >>> like Connect does in KIP-415. So I'd like to maybe split the
>> > > > >>> current proposal into both the consumer layer and the
>> > > > >>> streams-assignor layer, like we did in KIP-98/KIP-129. And then
>> > > > >>> the key thing to consider is how to cut off the boundary so that
>> > > > >>> the modifications we push to ConsumerCoordinator would be
>> > > > >>> beneficial universally for any consumer, while keeping the
>> > > > >>> Streams-specific logic at the assignor level.
>> > > > >>> Yes, that's also my ideal plan. The details for the implementation
>> > > > >>> are depicted in this doc:
>> > > > >>>
>> > > > >>> https://docs.google.com/document/d/1me2a5wvxAZT1QE6HkwyDl7C2TiBQlKN3Dpw_I1ro91U/edit#heading=h.qix74qdmekae
>> > > > >>>
>> > > > >>> and I have explained the reasoning on why we want to push a global
>> > > > >>> change of replacing ConsumerCoordinator with StreamCoordinator.
>> > > > >>> The motivation is that KIP space is usually used for public &
>> > > > >>> algorithm-level changes, not for internal implementation details.
>> > > > >>>
>> > > > >>> 3. Depending on which design direction we choose, our migration
>> > > > >>> plan would also be quite different. For example, if we stay with
>> > > > >>> ConsumerCoordinator, whose protocol type is still "consumer", and
>> > > > >>> we can manage to make all changes agnostic to brokers as well as
>> > > > >>> to old-versioned consumers, then our migration plan could be much
>> > > > >>> easier.
>> > > > >>> Yes, the upgrade plan was designed to take the new
>> > > > >>> StreamCoordinator approach, which means we shall define a new
>> > > > >>> protocol type. For an existing application, the reason we could
>> > > > >>> only maintain the same `consumer` protocol type is that the
>> > > > >>> current broker only allows a change of protocol type when the
>> > > > >>> consumer group is empty. It is of course user-unfriendly to force
>> > > > >>> a wipe-out for the entire application, and I don't think
>> > > > >>> maintaining the old protocol type would greatly impact ongoing
>> > > > >>> services using the new stream coordinator. WDYT?
>> > > > >>>
>> > > > >>> 4. I think one major issue related to this KIP is that today, in
>> > > > >>> the StickyTaskAssignor, we always try to honor stickiness over
>> > > > >>> workload balance, and hence a "learner task" is needed to break
>> > > > >>> this priority; but I'm wondering if we can have a better solution
>> > > > >>> within the sticky task assignor that accommodates this?
>> > > > >>> Great question! That's what I explained in the proposal, which is
>> > > > >>> that we should break down our delivery into different stages. At
>> > > > >>> the very beginning, our goal is to trigger learner task assignment
>> > > > >>> only on `new` hosts, where we shall leverage the leader's
>> > > > >>> knowledge of the previous round of rebalance to figure that out.
>> > > > >>> After stage one, our goal is to have a smooth scaling-up
>> > > > >>> experience, but the task balance problem is kind of orthogonal.
>> > > > >>> The load balance problem is a much broader topic than
>> > > > >>> auto-scaling, which I figure is worth discussing within this KIP's
>> > > > >>> context since it's a natural next step, but it wouldn't be the
>> > > > >>> main topic. Learner task or auto-scaling support should be treated
>> > > > >>> as `a helpful mechanism to reach load balance`, not `an algorithm
>> > > > >>> defining load balance`. It would be great if you could share some
>> > > > >>> insights on stream task balance, which would eventually help us
>> > > > >>> break out of KIP-429's scope and even define a separate KIP to
>> > > > >>> focus on task weight & assignment logic improvement.
>> > > > >>>
>> > > > >>> Also thank you for making improvements on the KIP context and
>> > > > >>> organization!
>> > > > >>>
>> > > > >>> Best,
>> > > > >>> Boyang
>> > > > >>> ________________________________
>> > > > >>> From: Guozhang Wang <wangguoz@gmail.com>
>> > > > >>> Sent: Saturday, March 2, 2019 6:00 AM
>> > > > >>> To: dev
>> > > > >>> Subject: Re: [DISCUSS] KIP-429 : Smooth Auto-Scaling
for Kafka
>> > > Streams
>> > > > >>>
>> > > > >>> Hello Boyang,
>> > > > >>>
>> > > > >>> I've just made a quick pass on the KIP and here are some
>> > > > >>> thoughts.
>> > > > >>>
>> > > > >>> Meta:
>> > > > >>>
>> > > > >>> 1. I'm still not sure if it's worthwhile to add a new type of
>> > > > >>> "learner task" in addition to "standby task": if the only
>> > > > >>> difference is that for the latter, we would consider workload
>> > > > >>> balance while for the former we would not, I think we can just
>> > > > >>> adjust the logic of StickyTaskAssignor a bit to break that
>> > > > >>> difference. Adding a new type of task would be adding a lot of
>> > > > >>> code complexity, so if we can still piggy-back the logic on a
>> > > > >>> standby task I would prefer to do so.
>> > > > >>>
>> > > > >>> 2. One thing that's still not clear from the KIP wiki itself is
>> > > > >>> which layer the logic would be implemented at. Although for most
>> > > > >>> KIPs we would not require internal implementation details but only
>> > > > >>> public-facing API updates, for a KIP like this I think it still
>> > > > >>> requires fleshing out details on the implementation design. More
>> > > > >>> specifically: today Streams embeds a full-fledged Consumer client,
>> > > > >>> which hard-codes a ConsumerCoordinator inside; Streams then
>> > > > >>> injects a StreamsPartitionAssignor into its pluggable
>> > > > >>> PartitionAssignor interface, and inside the
>> > > > >>> StreamsPartitionAssignor we also have a TaskAssignor interface
>> > > > >>> whose default implementation is StickyTaskAssignor. Streams'
>> > > > >>> partition assignment logic today sits in the latter two classes.
>> > > > >>> Hence the hierarchy today is:
>> > > > >>>
>> > > > >>> KafkaConsumer -> ConsumerCoordinator -> StreamsPartitionAssignor
>> > > > >>> -> StickyTaskAssignor.
>> > > > >>>
>> > > > >>> We need to think about where the proposed implementation would
>> > > > >>> take place, and personally I think it is not the best option to
>> > > > >>> inject all of it into the StreamsPartitionAssignor /
>> > > > >>> StickyTaskAssignor, since the logic of "triggering another
>> > > > >>> rebalance" etc. would require some coordinator logic which is hard
>> > > > >>> to mimic at the PartitionAssignor level. On the other hand, since
>> > > > >>> we are embedding a KafkaConsumer client as a whole, we cannot just
>> > > > >>> replace ConsumerCoordinator with a specialized StreamsCoordinator
>> > > > >>> like Connect does in KIP-415. So I'd like to maybe split the
>> > > > >>> current proposal into both the consumer layer and the
>> > > > >>> streams-assignor layer, like we did in KIP-98/KIP-129. And then
>> > > > >>> the key thing to consider is how to cut off the boundary so that
>> > > > >>> the modifications we push to ConsumerCoordinator would be
>> > > > >>> beneficial universally for any consumer, while keeping the
>> > > > >>> Streams-specific logic at the assignor level.
>> > > > >>>
>> > > > >>> 3. Depending on which design direction we choose, our migration
>> > > > >>> plan would also be quite different. For example, if we stay with
>> > > > >>> ConsumerCoordinator, whose protocol type is still "consumer", and
>> > > > >>> we can manage to make all changes agnostic to brokers as well as
>> > > > >>> to old-versioned consumers, then our migration plan could be much
>> > > > >>> easier.
>> > > > >>>
>> > > > >>> 4. I think one major issue related to this KIP is that today, in
>> > > > >>> the StickyTaskAssignor, we always try to honor stickiness over
>> > > > >>> workload balance, and hence a "learner task" is needed to break
>> > > > >>> this priority; but I'm wondering if we can have a better solution
>> > > > >>> within the sticky task assignor that accommodates this?
>> > > > >>>
>> > > > >>> Minor:
>> > > > >>>
>> > > > >>> 1. The idea of two rebalances has also been discussed in
>> > > > >>> https://issues.apache.org/jira/browse/KAFKA-6145. So we should
>> > > > >>> add the reference on the wiki page as well.
>> > > > >>> 2. Could you also add a section describing how the subscription /
>> > > > >>> assignment metadata will be re-formatted? Without this
>> > > > >>> information it is hard to get to the bottom of your idea. For
>> > > > >>> example, in the "Leader Transfer Before Scaling" section, I'm not
>> > > > >>> sure why "S2 doesn't know S4 is a new member" and hence would
>> > > > >>> blindly obey stickiness over the workload balance requirement.
>> > > > >>>
>> > > > >>> Guozhang
>> > > > >>>
>> > > > >>>
>> > > > >>> On Thu, Feb 28, 2019 at 11:05 AM Boyang Chen <bchen11@outlook.com>
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>>> Hey community friends,
>> > > > >>>>
>> > > > >>>> I'm gladly inviting you to have a look at the proposal to add
>> > > > >>>> incremental rebalancing to Kafka Streams, a.k.a. auto-scaling
>> > > > >>>> support.
>> > > > >>>>
>> > > > >>>>
>> > > > >>>>
>> > > > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Smooth+Auto-Scaling+for+Kafka+Streams
>> > > > >>>>
>> > > > >>>> Special thanks to Guozhang for giving great guidance and
>> > > > >>>> important feedback while making this KIP!
>> > > > >>>>
>> > > > >>>> Best,
>> > > > >>>> Boyang
>> > > > >>>>
>> > > > >>>
>> > > > >>>
>> > > > >>> --
>> > > > >>> -- Guozhang
>> > > > >>>
>> > > > >>
>> > > > >>
>> > > > >> --
>> > > > >> -- Guozhang
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > >
>> > > --
>> > > -- Guozhang
>> > >
>> >
>>
>>
>> --
>> -- Guozhang
>>
>
>
> --
> -- Guozhang
>


-- 
-- Guozhang
