kafka-dev mailing list archives

From John Roesler <j...@confluent.io>
Subject Re: Kafka stream - Internal topic name and schema avro compatibility
Date Wed, 08 Aug 2018 20:33:55 GMT
Hi Cédric,

The suffix is generated when we build the topology, in a way that
guarantees each node/internal-topic/state-store gets a unique name.

Generally speaking, it is unsafe to modify the topology and restart it. We
recommend using the app reset tool whenever you update your topology.

That said, some changes to the topology might be safe, so your mileage may
vary; just be aware that changing the topology in place will potentially
produce corrupt data.

As an example, suppose you restructure your topology and some other node
type, like a "KSTREAM-TRANSFORM-", winds up with number 99. Then you won't
have a problem: the new node will create whatever internal state/topics it
needs under a non-colliding name. But if you restructure the topology and a
*different* key select happens to get number 99, then you'd have a big
problem. Streams would have no idea that the existing repartition topic
belonged to a different key select; it would just start using the existing
topic. The repartition topic would then hold a mix of records from the old
key select and records from the new one. Clearly, this is not good.
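
To make the numbering concrete, here is a rough sketch of order-based
naming. It is a simplified model, not the actual Streams code: the class
NamingSketch and its methods are made up for illustration, and the
application id "internal-MYAPPS" is only guessed from the topic name in
your mail. The point is that the number depends on the position of the
node in the build order, not on what the node does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified, hypothetical model of order-based naming (not Kafka Streams source).
class NamingSketch {

    // Names nodes in build order: each node takes the next index from one
    // shared counter, zero-padded to 10 digits and appended to its prefix.
    static List<String> nameNodes(String... prefixes) {
        List<String> names = new ArrayList<>();
        int index = 0;
        for (String prefix : prefixes) {
            names.add(prefix + String.format(Locale.ROOT, "%010d", index++));
        }
        return names;
    }

    // Repartition topics follow <application.id>-<node name>-repartition.
    static String repartitionTopic(String applicationId, String nodeName) {
        return applicationId + "-" + nodeName + "-repartition";
    }

    public static void main(String[] args) {
        String appId = "internal-MYAPPS"; // assumed application.id

        // Original topology: source, then a key select.
        List<String> v1 = nameNodes("KSTREAM-SOURCE-", "KSTREAM-KEY-SELECT-");

        // A filter is inserted before the key select: the key select shifts
        // to a new index, so it gets a fresh, non-colliding topic name.
        List<String> v2 = nameNodes("KSTREAM-SOURCE-", "KSTREAM-FILTER-", "KSTREAM-KEY-SELECT-");

        // The key select is swapped for one that selects a different key:
        // same position, same index, so the old topic name is reused.
        List<String> v3 = nameNodes("KSTREAM-SOURCE-", "KSTREAM-KEY-SELECT-");

        System.out.println(repartitionTopic(appId, v1.get(1)));
        System.out.println(repartitionTopic(appId, v2.get(2)));
        System.out.println(repartitionTopic(appId, v3.get(1)));
    }
}
```

The first and third lines printed are identical even though the third
topology selects a different key, which is the collision described above.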

It sounds to me like this may be what happened to you.

We have been discussing various mechanisms by which we could support
modifying the topology in place. Typically, this would involve giving each
operator a semantic name so that the internal names would be related to
what the nodes are doing, not the order in which the nodes are created.

At the very least, we'd like to have some way of detecting that the
topology has changed during a restart and refusing to start up, to protect
the integrity of your data.
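
Until something like that exists, an application could approximate the
check itself. The sketch below is only one possible approach, not a
Streams feature: the class TopologyFingerprint and its methods are
hypothetical, and where the previous fingerprint is stored is left to
you. The description string could come from Topology#describe(), which
returns a description of the built topology including node names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical application-level guard; not part of Kafka Streams.
class TopologyFingerprint {

    // Returns the SHA-256 hash of the topology description as lowercase hex.
    static String fingerprint(String topologyDescription) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(topologyDescription.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Throws if a fingerprint from a previous run exists and differs from the
    // fingerprint of the current description. A null stored value means this
    // is the first run, so there is nothing to compare against.
    static void checkUnchanged(String storedFingerprint, String currentDescription) {
        if (storedFingerprint != null
                && !storedFingerprint.equals(fingerprint(currentDescription))) {
            throw new IllegalStateException(
                "Topology changed since the last run; reset the application "
                + "or review the change before starting.");
        }
    }
}
```

Calling checkUnchanged before starting the application turns a silent
name collision into a startup failure. It is deliberately strict: it
also rejects changes that would have been safe.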

I hope this helps,
-John

On Wed, Aug 8, 2018 at 12:51 PM Adam Bellemare <adam.bellemare@gmail.com>
wrote:

> Hi Cédric
>
> I do not know how the topology names are chosen, but provided that you
> didn't change any of the topology then new topics will not be created or
> require alteration.
>
> If you modify the topology then the naming can indeed change, but it would
> then create a new internal topic and there would be no compatibility issue.
> It could very well be that your topology was modified in such a way that
> another, different internal topic is attempting to register an incompatible
> schema. In this case though, I would expect the error information
> returned from the schema registry registration process to highlight exactly
> what the failure is. It has been a while since we ran into one of these, so
> I could be wrong on that front.
>
> My recommendation to you is to create a simple "InternalSerde" for your
> Avro classes used in internal topics, such that you do *not* register them
> to the schema registry. I have found that registering internal topics to
> the schema registry turns it into a garbage dump and prevents developers
> from making independent changes to their internal schemas. The rule of
> thumb we use is that we only register schemas to the schema registry when
> the events leave the application's bounded context, i.e. final output
> events only.
>
> Hope this helps,
>
> Adam
>
>
>
>
>
> On Wed, Aug 8, 2018 at 11:14 AM, Cedric BERTRAND <
> bertrandcedric.cbe@gmail.com> wrote:
>
> > Within a Kafka Streams topology, internal topics are created.
> > For these internal topics, Avro schemas for the key and value are
> > registered in the schema registry.
> >
> > For the topic internal-MYAPPS-KSTREAM-KEY-SELECT-0000000099-repartition,
> > I have 2 subjects in the schema registry:
> > - internal-MYAPPS-KSTREAM-KEY-SELECT-0000000099-repartition-key
> > - internal-MYAPPS-KSTREAM-KEY-SELECT-0000000099-repartition-value
> >
> > My questions are:
> >
> > How does Kafka create the internal topic name (how is the suffix number
> > assigned)?
> >
> > What happens if I change the processing in the topology (a change in the DAG)?
> > - If I have a name with 0000000099, do I get the same number after a
> > modification of the topology?
> > - If not, is Kafka Streams allowed to reuse an already used number?
> >
> >
> > I ask because I have an incompatible schema on an internal
> > topic and, from my point of view, no changes have been made to the schema.
> > The only change is a modification of the topology, which changes the DAG
> > and maybe the name of the internal topic.
> >
> >
> > Thanks for your time,
> >
> > Cédric
> >
>
