kafka-dev mailing list archives

From Jason Gustafson <ja...@confluent.io>
Subject Re: [DISCUSS] KIP-125: ZookeeperConsumerConnector to KafkaConsumer Migration and Rollback
Date Mon, 13 Mar 2017 21:32:01 GMT
Hey Onur,


> Regarding 1: I've been considering something like this for a while now.
> KIP-122 has a similar issue and I brought up some hacks in that discussion
> to work around it (http://markmail.org/message/kk4ng74riejidify). While
> solving this problem would help loosen the requirements for migration, it
> seems beyond the scope of this KIP. It's hard to say whether we should be
> trying to solve that issue here.


I won't press if you don't want to do it here, but the point for this KIP
would be to avoid the awkward requirement to first disable offset commits
in Kafka, which feels like a step backwards. I can imagine it causing some
confusion (and annoyance for any users tracking progress through offset
commits in Kafka), but it's probably fine as long as the documentation is
clear.

> Regarding 2: I agree that we should offer a tool somewhere to help with the
> migration and do the toggle. It's not clear to me if we should put it in
> kafka-consumer-groups.sh or in some new migration script.


Either way works for me. Eventually we'll deprecate and remove the
capability, so having a separate tool may make that easier. Probably makes
sense for this tool to be part of the KIP.

> As an example, we can get rid of the notion of "coordination.migration.enabled"
> and just have a config called "coordination.migration.mode" whose values
> can be {"off", "manual", "auto"} where:
>

The "auto" option seems useful. I'm tempted to suggest that be the default
setting, but I guess that would be dangerous since the old group may still
be committing offsets to Kafka. Still, it seems useful not to always require
the manual step, especially once you've validated the workflow.

Thanks,
Jason


On Fri, Mar 10, 2017 at 12:42 PM, Onur Karaman <onurkaraman.apache@gmail.com
> wrote:

> I forgot to mention that in that above idea, the
> "coordination.migration.mode" config would default to "auto".
>
> On Fri, Mar 10, 2017 at 1:08 AM, Onur Karaman <
> onurkaraman.apache@gmail.com>
> wrote:
>
> > Hey Jason.
> >
> > Thanks for the comments!
> >
> > Regarding 1: I've been considering something like this for a while now.
> > KIP-122 has a similar issue and I brought up some hacks in that
> > discussion to work around it (http://markmail.org/message/kk4ng74riejidify).
> > While solving this problem would help loosen the requirements for
> > migration, it seems beyond the scope of this KIP. It's hard to say
> > whether we should be trying to solve that issue here.
> >
> > Regarding 2: I agree that we should offer a tool somewhere to help with
> > the migration and do the toggle. It's not clear to me if we should put it
> > in kafka-consumer-groups.sh or in some new migration script.
> >
> > Regarding general migration complexity: something else Joel and I had
> > considered was the ability to optionally create the toggle on startup to
> > skip the step of having to manually set the toggle. There are many ways
> > we can do this.
> >
> > As an example, we can get rid of the notion of
> > "coordination.migration.enabled" and just have a config called
> > "coordination.migration.mode" whose values can be {"off", "manual",
> > "auto"} where:
> >
> >    - "off" would act like "coordination.migration.enabled" set to false.
> >    We do not participate in coordination migration.
> >    - "manual" would act like "coordination.migration.enabled" set to true
> >    in the current KIP proposal. Do not attempt to create the toggle on
> >    startup, but spin up an EKC and be ready to react to the toggle. This
> >    mode helps an org gradually migrate to or roll back from kafka-based
> >    coordination.
> >    - "auto" would act like "coordination.migration.enabled" set to true
> >    in the current KIP proposal but additionally attempt to create the
> >    toggle with "kafka" on startup if the znode doesn't already exist. The
> >    same rules from the KIP apply where if an OZKCC or MDZKCC exists, the
> >    value is ignored and we just use zookeeper-based coordination. This
> >    mode lets us skip the step of having to manually set the toggle.
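The three modes above amount to a small piece of startup logic. A minimal sketch (Python used purely for illustration; the function, return values, and callback names are hypothetical, not from the KIP):

```python
def startup_coordination(mode, znode_exists, create_toggle_znode):
    """Illustrative startup behavior for the proposed
    "coordination.migration.mode" config. All names here are made up
    for this sketch.

    mode: one of "off", "manual", "auto"
    znode_exists: whether /consumers/<group>/migration/mode already exists
    create_toggle_znode: callback that seeds the toggle znode with a value
    """
    if mode == "off":
        # Like coordination.migration.enabled=false: no migration support.
        return "zookeeper-coordination-only"
    if mode == "manual":
        # Like coordination.migration.enabled=true in the current proposal:
        # spin up an EKC and wait for an operator to flip the toggle znode.
        return "await-toggle"
    if mode == "auto":
        # Same as "manual", but additionally seed the toggle with "kafka"
        # if it doesn't exist yet, skipping the manual toggle step.
        if not znode_exists:
            create_toggle_znode("kafka")
        return "await-toggle"
    raise ValueError("unknown coordination.migration.mode: %s" % mode)
```

Note that in the "auto" branch the toggle is only created when absent, which is what makes it safe for later consumers in the same group to start up without clobbering an operator's rollback.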
> >
> > Let me know what you think!
> >
> > On Thu, Mar 9, 2017 at 10:30 AM, Jason Gustafson <jason@confluent.io>
> > wrote:
> >
> >> Hey Onur,
> >>
> >> Sorry for the late reply. Thanks for the well-written KIP! I think the
> >> proposal makes sense. The only thing I was wondering is whether the
> >> process
> >> is a bit complex for most users. You'd probably have no trouble at LI
> >> (especially given you're implementing it!), but I'm not so sure about
> >> the users who aren't as close to the Kafka internals. That said, I
> >> don't see any great options to simplify the process, and having this
> >> approach is better than having none, so maybe it's fine. Here are a
> >> couple minor suggestions:
> >>
> >> 1. One thought that came to mind is whether it would be worthwhile to
> >> add a broker config to disable the group membership check for offset
> >> commits. This would simplify the process by eliminating the initial
> >> step of turning off offset commits in Kafka for the group to be
> >> migrated prior to turning on group coordination through Kafka. I'm not
> >> thrilled about this option since it removes the protection that that
> >> check provides (I guess this is no worse than using Kafka for offsets
> >> storage with the old consumer anyway). Also it's a config we'd
> >> ultimately have to deprecate and remove.
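The escape hatch described here could be sketched as a gate around the broker's membership check (hypothetical config name and simplified check; this is not actual Kafka broker code):

```python
def authorize_offset_commit(group_generation, request_generation,
                            membership_check_enabled=True):
    """Sketch of the group membership check on OffsetCommit, gated by a
    hypothetical broker config (something like
    "offsets.commit.require.membership"). Names are illustrative.

    With the check disabled, commits from clients outside the group's
    current generation are accepted -- which is what would let the old
    group keep committing during migration, at the cost of the usual
    protection the check provides.
    """
    if not membership_check_enabled:
        # Accept commits from non-members, as with ZK-era offset storage.
        return True
    return request_generation == group_generation
```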
> >>
> >> 2. It seems like the toggle on the group's coordination mode is done
> >> manually. Should we add that to consumer-groups.sh?
> >>
> >> Thanks,
> >> Jason
> >>
> >> On Thu, Feb 23, 2017 at 1:22 PM, Dong Lin <lindong28@gmail.com> wrote:
> >>
> >> > Yeah, I agree it is a bit complex to do that approach for a one-time
> >> > migration. Probably not worth it. Here is another idea to reduce, but
> >> > not eliminate, the amount of message duplication during migration. I
> >> > am fine with not doing it. Just want to see the opinion from the open
> >> > source community.
> >> >
> >> > The problem with the current solution is that, when we toggle the
> >> > zookeeper path in order to migrate from MEZKCC, with 50% probability
> >> > the old owner of the partition may receive the notification later
> >> > than the new partition owner. Thus the new partition owner may read
> >> > an offset committed by the old owner 5 sec ago, assuming the
> >> > auto-commit interval is 10 sec. The messages produced in this 5 sec
> >> > window may be consumed multiple times. This amount is even greater if
> >> > the consumer is bootstrapping.
> >> >
> >> > One way to mitigate this problem is for the MEZKCC to sleep for a
> >> > configurable amount of time after it receives the zookeeper
> >> > notification but before it starts to fetch offsets and consume
> >> > messages. This seems like an easy change that allows the user to
> >> > trade off between message duplication and consumer downtime.
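The mitigation boils down to a grace period between the toggle notification and the offset fetch. A minimal sketch (the knob and function names are hypothetical, not from the KIP):

```python
def handle_toggle_notification(grace_seconds, sleep, fetch_offsets,
                               start_consuming):
    """On the migration-toggle notification, wait a configurable grace
    period (a hypothetical "migration.toggle.grace.ms"-style knob) so the
    old owner's final auto-commit has a chance to land, then fetch the
    committed offsets and begin consuming.

    A larger grace period means fewer duplicate messages but longer
    consumer downtime -- exactly the trade-off described above.
    """
    sleep(grace_seconds)
    start_consuming(fetch_offsets())
```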
> >> >
> >> >
> >> >
> >> > On Thu, Feb 23, 2017 at 11:20 AM, Joel Koshy <jjkoshy.w@gmail.com>
> >> wrote:
> >> >
> >> > > Regarding (2) - yes that's a good point. @Onur - I think the KIP
> >> > > should explicitly call this out.
> >> > > It is something that we did consider and decided against optimizing
> >> > > for. i.e., we just wrote that off as a minor caveat of the upgrade
> >> > > path in that there will be a few duplicates, but not too many given
> >> > > that we expect the period of duplicate ownership to be minimal.
> >> > > Although it could be addressed as you described, it does add
> >> > > complexity to an already-rather-complex migration path. Given that
> >> > > it is a transition state (i.e., migration) we felt it would be
> >> > > better and sufficient to keep it only as complex as it needs to be.
> >> > >
> >> > > On Mon, Feb 20, 2017 at 4:45 PM, Onur Karaman <
> >> > > onurkaraman.apache@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Regarding 1: We won't lose the offset from zookeeper upon
> >> > > > partition transfer from OZKCC/MDZKCC to MEZKCC because MEZKCC has
> >> > > > "dual.commit.enabled" set to true as well as "offsets.storage"
> >> > > > set to kafka. The combination of these configs results in the
> >> > > > consumer fetching offsets from both kafka and zookeeper and just
> >> > > > picking the greater of the two.
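The dual-fetch behavior described here amounts to taking the max of the two committed offsets. A small sketch (treating "no commit found" as None; the function name is illustrative):

```python
def resolve_start_offset(zk_offset, kafka_offset):
    """With dual.commit.enabled=true and offsets.storage=kafka, the MEZKCC
    fetches the committed offset from both zookeeper and kafka and resumes
    from the greater of the two. None stands in for "no commit found" in
    either store.
    """
    candidates = [o for o in (zk_offset, kafka_offset) if o is not None]
    return max(candidates) if candidates else None
```

This is why an offset committed only to zookeeper by an MDZKCC survives a transfer to an MEZKCC: the kafka-side commit may be missing or stale, but the zookeeper-side value still wins the max.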
> >> > > >
> >> > > > On Mon, Feb 20, 2017 at 4:33 PM, Dong Lin <lindong28@gmail.com>
> >> wrote:
> >> > > >
> >> > > > > Hey Onur,
> >> > > > >
> >> > > > > Thanks for the well-written KIP! I have two questions below.
> >> > > > >
> >> > > > > 1) In the process of migrating from OZKCCs and MDZKCCs to
> >> > > > > MEZKCCs, we may have a mix of OZKCCs, MDZKCCs and MEZKCCs.
> >> > > > > OZKCC and MDZKCC will only commit to zookeeper while MEZKCC
> >> > > > > will use kafka-based offset storage. Would we lose an offset
> >> > > > > committed to zookeeper by an MDZKCC if partition ownership is
> >> > > > > transferred from an MDZKCC to an MEZKCC?
> >> > > > >
> >> > > > > 2) Suppose every process in the group is running MEZKCC. Each
> >> > > > > MEZKCC has a zookeeper-based partition assignment and a
> >> > > > > kafka-based partition assignment. Is it guaranteed that these
> >> > > > > two assignments are exactly the same across processes? If not,
> >> > > > > say the zookeeper-based assignment assigns p1, p2 to process 1,
> >> > > > > and p3 to process 2, while the kafka-based assignment assigns
> >> > > > > p1, p3 to process 1, and p2 to process 2. If process 1 receives
> >> > > > > the notification to switch to kafka-based coordination before
> >> > > > > process 2, is it possible that during a short period of time p3
> >> > > > > will be consumed by both processes?
> >> > > > >
> >> > > > > This period is probably short and I am not sure how many
> >> > > > > messages may be duplicated as a result. But it seems possible
> >> > > > > to avoid this completely according to an idea that Becket
> >> > > > > suggested in a previous discussion. The znode
> >> > > > > /consumers/<group id>/migration/mode can contain a sequence
> >> > > > > number that increments for each switch. Say the znode is
> >> > > > > toggled to kafka with sequence number 2: each MEZKCC will
> >> > > > > commit offsets with number 2 in the metadata for the partitions
> >> > > > > that it currently owns according to the zk-based partition
> >> > > > > assignment, and then periodically fetch the committed offsets
> >> > > > > and metadata for the partitions that it should own according to
> >> > > > > the kafka-based partition assignment. Each MEZKCC only starts
> >> > > > > consumption when the metadata has incremented to the number 2.
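The sequence-number scheme splits into two halves: the old owner stamping the epoch into its final commit, and the new owner fencing on it. A sketch under the assumptions above (function names are illustrative, not from any KIP):

```python
def commit_with_epoch(owned_partitions, offsets, epoch, commit):
    """Old owner (per its zk-based assignment): stamp the toggle's sequence
    number into the commit metadata of its final offset commit before
    releasing the partitions."""
    for p in owned_partitions:
        commit(p, offsets[p], str(epoch))

def ready_to_consume(fetched_metadata, epoch):
    """New owner (per its kafka-based assignment): only start consuming a
    partition once the fetched commit metadata has caught up to the
    toggle's sequence number, which guarantees the old owner has already
    committed and stopped -- avoiding the dual-ownership window entirely."""
    return fetched_metadata is not None and int(fetched_metadata) >= epoch
```

In Dong's p1/p2/p3 example, process 1 would hold off consuming p3 until process 2's epoch-2 commit for p3 appears, no matter which process saw the toggle first.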
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Dong
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Feb 20, 2017 at 12:04 PM, Onur Karaman <
> >> > > > > onurkaraman.apache@gmail.com
> >> > > > > > wrote:
> >> > > > >
> >> > > > > > Hey everyone.
> >> > > > > >
> >> > > > > > I made a KIP that provides a mechanism for migrating
from
> >> > > > > > ZookeeperConsumerConnector to KafkaConsumer as well
as a
> >> mechanism
> >> > > for
> >> > > > > > rolling back from KafkaConsumer to ZookeeperConsumerConnector:
> >> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-125%3A+ZookeeperConsumerConnector+to+KafkaConsumer+Migration+and+Rollback
> >> > > > > >
> >> > > > > > Comments are welcome.
> >> > > > > >
> >> > > > > > - Onur
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
