kafka-dev mailing list archives

From Konstantine Karantasis <konstant...@confluent.io>
Subject Re: [DISCUSS] KIP-415: Incremental Cooperative Rebalancing in Kafka Connect
Date Tue, 29 Jan 2019 01:22:55 GMT
Hi Ismael,
thanks for bringing up serialization in the discussion!

Indeed, JSON was considered given it's the prevalent text-based
serialization option.

Comparing JSON with flatbuffers, most of the generic pros and cons apply in
this context too. On the flatbuffers side: higher performance during serde,
smaller size, optional fields, strong typing and others.

Specifically, for Connect's use case, flatbuffers serialization appears more
appealing, even though it introduces a single new dependency, for the
following reasons:

* The protocol stays binary: it is evolving from one binary format to another.
* Although new fields, nested or not, are expected to be introduced (as in
KIP-415) or old fields may get deprecated, the protocol schemas are
expected to be simple, mostly flat and manageable. We won't need to process
arbitrarily nested structures during runtime, for which JSON would be a
better fit. The current proposal aims to make the current append-only
format a bit more flexible.
* It's good to keep performance tight because the loop that includes
subprotocol serde will need to accommodate resource release and assignment.
Also, rebalancing in its incremental cooperative form, which is expected to
be lighter, has the potential to start happening more frequently. Parsing
JSON with Jackson has been a hotspot on certain occasions in the past, if I
remember correctly.
* Evolution will be facilitated by handling or ignoring optional fields
easily. The protocol may evolve with fewer hard version bumps like the one
proposed here from V0 to V1.
* Optional fields are omitted, not just compressed.
* Unpacking of fields does not require deserialization of the whole
message, making transition between versions or flavors of the protocol easy
and performant.
* Flatbuffers' specification is simple and can be implemented, even in the
absence of appropriate clients.
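
To make the points about optional fields and partial unpacking concrete, here
is a rough sketch in plain Java. It is NOT the flatbuffers wire format or API,
only an illustration of the same idea: a small offset table in front of the
data, where an offset of 0 means "field absent". The class name, the slot
names and the layout are all made up for this example.

```java
import java.nio.ByteBuffer;

// Illustrative only: a toy offset-table layout, not the real flatbuffers format.
// Layout: [slotCount: short][offset per slot: short...][int32 values of present fields]
final class OptionalFieldSketch {
    // Hypothetical field slots; REVOKED_COUNT is imagined as added in a later version.
    static final int ERROR = 0;
    static final int CONFIG_OFFSET = 1;
    static final int REVOKED_COUNT = 2;

    // Encodes the given fields; a null field is omitted and takes no space in the data area.
    static ByteBuffer encode(Integer... fields) {
        int present = 0;
        for (Integer f : fields) {
            if (f != null) present++;
        }
        int headerSize = 2 + 2 * fields.length;
        ByteBuffer buf = ByteBuffer.allocate(headerSize + 4 * present);
        buf.putShort((short) fields.length);
        int dataPos = headerSize;
        for (Integer f : fields) {
            if (f == null) {
                buf.putShort((short) 0);        // offset 0 marks an absent field
            } else {
                buf.putShort((short) dataPos);  // absolute offset of the value
                dataPos += 4;
            }
        }
        for (Integer f : fields) {
            if (f != null) buf.putInt(f);
        }
        buf.flip();
        return buf;
    }

    // Reads one field by slot without decoding the rest of the message.
    // Falls back to defaultValue if the writer did not know the slot or omitted it.
    static int readInt(ByteBuffer buf, int slot, int defaultValue) {
        int slotCount = buf.getShort(0);
        if (slot >= slotCount) return defaultValue;
        int offset = buf.getShort(2 + 2 * slot);
        return offset == 0 ? defaultValue : buf.getInt(offset);
    }

    public static void main(String[] args) {
        ByteBuffer older = encode(0, 42);       // writer that only knows two slots
        ByteBuffer newer = encode(0, null, 3);  // newer writer, omits CONFIG_OFFSET

        System.out.println(readInt(older, REVOKED_COUNT, -1)); // slot unknown to writer
        System.out.println(readInt(newer, CONFIG_OFFSET, -1)); // slot omitted by writer
        System.out.println(readInt(newer, REVOKED_COUNT, -1)); // slot present
        System.out.println(newer.remaining());                 // omitted field uses no data bytes
    }
}
```

In this sketch a reader that knows about a newer slot can still consume a
message from an older writer, and vice versa, without a hard version bump.
The real flatbuffers encoding differs in its details.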

I hope the above highlights why flatbuffers is a good candidate for this use
case and, thus, worth adding as a dependency.
Strictly speaking, yes, they introduce a new compile-time dependency. But
during runtime, such a dependency seems equivalent to introducing a JSON
parser (such as Jackson that is already being used in AK).

Your question is very valid. It's probably worth adding an item under
rejected alternatives, once we agree how we want to move forward.


On Fri, Jan 25, 2019 at 11:13 PM Ismael Juma <ismaelj@gmail.com> wrote:

> Thanks for the KIP Konstantine. Quick question: introducing a new
> serialization format (ie flatbuffers) has major implications. Have we
> considered json? If so, why did we reject it?
> Ismael
> On Fri, Jan 11, 2019, 3:44 PM Konstantine Karantasis <
> konstantine@confluent.io> wrote:
> > Hi all,
> >
> > I just published KIP-415: Incremental Cooperative Rebalancing in Kafka
> > Connect
> > on the wiki here:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >
> > This is the first KIP to suggest an implementation of incremental and
> > cooperative rebalancing in the context of Kafka Connect. It aims to provide
> > an adequate solution to the stop-the-world effect that occurs in a Connect
> > cluster whenever a new connector configuration is submitted or a Connect
> > Worker is added or removed from the cluster.
> >
> > Looking forward to your insightful feedback!
> >
> > Regards,
> > Konstantine
> >
