kafka-dev mailing list archives

From radai <radai.rosenbl...@gmail.com>
Subject Re: [DISCUSS] KIP-82 - Add Record Headers
Date Tue, 08 Nov 2016 17:48:58 GMT
both 5a and 5c would involve a wire format change, so any arguments about
needing an upgrade path, bumping the protocol version etc. apply equally to
both. so the "cost" (in terms of impact of a wire format change) is the same.

5c, to me, means doing all the work (more exactly incurring all the cost)
but getting very few of the benefits. a universal, agreed-upon structure
for headers (specifically their keys) is, in my opinion, a basic
requirement to reap the full benefits of headers - an active ecosystem of
composable, re-usable, 3rd-party extensions to kafka.

as for what exactly those keys are (int vs string) - since using ints is
such a giant sticking point, and given that kafka usually operates with
batching and compression and does not achieve high enough iops for it to make
a noticeable difference in CPU consumption, I'm willing to go with string keys
just to get that out of the way.

On Mon, Nov 7, 2016 at 11:51 PM, Michael Pearce <Michael.Pearce@ig.com>
wrote:

> +1 on this slimmer version of our proposal
>
> I definitely think we can reduce the Id space from the proposed int32
> (4 bytes) down to int16 (2 bytes). It saves space, and we wouldn't expect
> the number of headers in concurrent use to be that high.
>
> I do wonder whether the value byte array length should still be int32,
> since that is the standard max array length in Java. That said, it is a
> header, and I guess limiting the size is sensible and would work for all the
> use cases we have in mind, so I'm happy with limiting this.
>
> Do people generally concur on Magnus's slimmer version? Anyone see any
> issues if we moved from int32 to int16?
>
> Re configurable ids per plugin rather than a global registry: that would
> also work for us. If this has better consensus than the proposed global
> registry, I'd be happy to change that.
>
> I was already sold on ints over strings for keys ;)
>
> Cheers
> Mike
>
> ________________________________________
> From: Magnus Edenhill <magnus@edenhill.se>
> Sent: Monday, November 7, 2016 10:10:21 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Hi,
>
> I'm +1 for adding generic message headers, but I do share the concerns
> previously aired on this thread and during the KIP meeting.
>
> So let me propose a slimmer alternative that does not require any sort of
> global header registry, does not affect broker performance or operations,
> and adds as little overhead as possible.
>
>
> Message
> ------------
> The protocol Message type is extended with a Headers array consisting of
> Tags, where a Tag is defined as:
>    int16 Id
>    int16 Len              // binary_data length
>    binary_data[Len]  // opaque binary data
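
A minimal sketch of how the Tag layout above could be packed and parsed.
It assumes big-endian (network byte order) fields, as used elsewhere in the
Kafka protocol, and treats the Headers array as a plain concatenation of
Tags. The names encode_tag and decode_tags are hypothetical and not part of
the proposal.

```python
import struct
from typing import List, Tuple

# Assumed layout from the Tag definition: int16 Id, int16 Len, then Len bytes.
# Big-endian is an assumption; the proposal does not state byte order.
TAG_HEADER = struct.Struct(">hh")


def encode_tag(tag_id: int, data: bytes) -> bytes:
    """Serialize one Tag as Id, Len, binary_data."""
    if len(data) > 0x7FFF:
        raise ValueError("data does not fit in a signed int16 length")
    return TAG_HEADER.pack(tag_id, len(data)) + data


def decode_tags(buf: bytes) -> List[Tuple[int, bytes]]:
    """Parse a concatenation of Tags back into (Id, data) pairs."""
    tags = []
    offset = 0
    while offset < len(buf):
        tag_id, length = TAG_HEADER.unpack_from(buf, offset)
        offset += TAG_HEADER.size
        if length < 0 or offset + length > len(buf):
            raise ValueError("truncated or malformed tag")
        tags.append((tag_id, buf[offset:offset + length]))
        offset += length
    return tags
```

Each Tag costs 4 bytes of framing plus its data, so a tag carrying the 3
bytes b"src" occupies 7 bytes on the wire under these assumptions.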
>
>
> Ids
> ---
> The Id space is not centrally managed, so whenever an application needs to
> add headers, or use an eco-system plugin that does, its Id allocation will
> need to be manually configured.
> This moves the allocation concern from the global space down to the
> organization level and avoids the risk of id conflicts.
> Example pseudo-config for some app:
>     sometrackerplugin.tag.sourcev3.id=1000
>     dbthing.tag.tablename.id=1001
>     myschemareg.tag.schemaname.id=1002
>     myschemareg.tag.schemaversion.id=1003
>
>
> Each header-writing or header-reading plugin must provide means (typically
> through configuration) to specify the tag for each header it uses. Defaults
> should be avoided.
> A consumer silently ignores tags it does not have a mapping for (since the
> binary_data can't be parsed without knowing what it is).
>
> Id range 0..999 is reserved for future use by the broker and must not be
> used by plugins.
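
The id rules above could be sketched roughly as follows: ids come from
configuration, ids 0..999 are rejected as reserved, and a consumer drops
tags it has no mapping for. The function names and the choice of a flat dict
for the config are assumptions for illustration only.

```python
RESERVED_MAX = 999   # ids 0..999 are reserved for the broker in the proposal
INT16_MAX = 0x7FFF   # largest id that fits the proposed int16 Id field


def load_tag_ids(config: dict) -> dict:
    """Build an id -> name map from '<plugin>.tag.<name>.id' config entries."""
    ids = {}
    for key, value in config.items():
        if not key.endswith(".id") or ".tag." not in key:
            continue  # not a tag id setting
        tag_id = int(value)
        if not RESERVED_MAX < tag_id <= INT16_MAX:
            raise ValueError(f"{key}: id {tag_id} is outside 1000..32767")
        if tag_id in ids:
            raise ValueError(f"{key}: id {tag_id} already used by {ids[tag_id]}")
        ids[tag_id] = key[: -len(".id")]
    return ids


def known_headers(tags, ids):
    """Keep only tags with a configured mapping; unknown ids are skipped."""
    return {ids[tag_id]: data for tag_id, data in tags if tag_id in ids}
```

Failing fast on a duplicate or reserved id at load time is one possible
design choice here; the proposal itself only says that defaults should be
avoided and that unmapped tags are ignored.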
>
>
>
> Broker
> ---------
> The broker does not process the tags (other than the standard protocol
> syntax verification), it simply stores and forwards them as opaque data.
>
> Standard message translation (removal of Headers) kicks in for older
> clients.
>
>
> Why not string ids?
> -------------------------
> String ids might seem like a good idea, but:
>  * does not really solve uniqueness
>  * consumes a lot of space (2 byte string length + string, per header) to
> be meaningful
>  * doesn't really say anything about how to parse the tag's data, so it is
> in effect useless on its own.
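
To put rough numbers on the space argument, here is a small sketch
comparing the per-header key overhead of an int16 id with a
length-prefixed string key. It assumes UTF-8 keys with a 2-byte length
prefix, as described in the bullet above, and counts uncompressed bytes
only; batching and compression would shrink the real difference. The
helper names are hypothetical.

```python
import struct

INT16_SIZE = struct.calcsize(">h")  # 2 bytes


def int_key_overhead() -> int:
    """Bytes used by an int16 header id."""
    return INT16_SIZE


def string_key_overhead(key: str) -> int:
    """Bytes used by a string key: 2-byte length prefix plus UTF-8 bytes."""
    return INT16_SIZE + len(key.encode("utf-8"))


def overhead_ratio(key: str) -> float:
    """How many times larger the string key is than the int16 id."""
    return string_key_overhead(key) / int_key_overhead()
```

For the key "schemaversion" this gives 15 bytes against 2 bytes for an
int16 id, before any compression is applied.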
>
>
> Regards,
> Magnus
>
>
>
>
> 2016-11-07 18:32 GMT+01:00 Michael Pearce <Michael.Pearce@ig.com>:
>
> > Hi Roger,
> >
> > Thanks for the support.
> >
> > I think the key thing is to have a common key space to make an ecosystem,
> > there does have to be some level of contract for people to play nicely.
> >
> > Having map<String, byte[]> or, as currently proposed in the KIP, a
> > numerical key space of map<int, byte[]> is a level of contract that
> > most people would expect.
> >
> > I think the example someone else linked to in a previous comment, the AWS
> > blog and its implemented API, where originally they didn’t have a header
> > space but now they do, and where keys are uniform but the value can be
> > string, int, anything, is a good example.
> >
> > Having a custom MetadataSerializer is something we had played with, but we
> > discounted the idea: if you want everyone to work the same way in the
> > ecosystem, having this also customizable makes it a bit harder.
> > Think about making the whole message record custom serializable; this
> > would make it fairly tricky (though not impossible) to make work
> > nicely. Having the value customizable is, we thought, a reasonable tradeoff
> > here of flexibility over contract of interaction between different parties.
> >
> > Is there a particular case or benefit of having serialization
> > customizable that you have in mind?
> >
> > Saying this, it is obviously something that could be implemented, if there
> > is a need. If we did go this avenue I think a defaulted serializer
> > implementation should exist so for the 80:20 rule, people can just have
> > the broker and clients get default behavior.
> >
> > Cheers
> > Mike
> >
> > On 11/6/16, 5:25 PM, "radai" <radai.rosenblatt@gmail.com> wrote:
> >
> >     making header _key_ serialization configurable potentially undermines
> >     the broad usefulness of the feature (any point along the path must be
> >     able to read the header keys. the values may be whatever and require
> >     more intimate knowledge of the code that produced specific headers, but
> >     keys should be universally readable).
> >
> >     it would also make it hard to write really portable plugins - say i
> >     wrote a large message splitter/combiner - if i rely on key
> >     "largeMessage" and values of the form "1/20" someone who uses
> >     (contrived example) Map<Byte[], Double> wouldnt be able to re-use my
> >     code.
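
A rough sketch of the kind of splitter/combiner plugin described above,
assuming headers are a Map<String, byte[]> equivalent (a Python dict from
str to bytes). It relies on the key "largeMessage" and values of the form
"1/20" exactly as in the example; the function names and the record shape
(headers, chunk) are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Headers = Dict[str, bytes]
HEADER_KEY = "largeMessage"  # key named in the example above


def split(payload: bytes, chunk_size: int) -> List[Tuple[Headers, bytes]]:
    """Cut payload into chunks, tagging each with an 'index/total' header."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    total = len(chunks)
    return [({HEADER_KEY: f"{n}/{total}".encode("ascii")}, chunk)
            for n, chunk in enumerate(chunks, start=1)]


def combine(records: Iterable[Tuple[Headers, bytes]]) -> bytes:
    """Reassemble chunks in index order, whatever order they arrive in."""
    parts = {}
    total = None
    for headers, chunk in records:
        index, count = (int(x) for x in
                        headers[HEADER_KEY].decode("ascii").split("/"))
        parts[index] = chunk
        total = count
    if total is None or sorted(parts) != list(range(1, total + 1)):
        raise ValueError("missing chunks")
    return b"".join(parts[i] for i in range(1, total + 1))
```

The plugin only works if every hop can read the string key, which is the
portability concern raised here: a client using a different key type could
not look up "largeMessage" at all.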
> >
> >     not the end of the world within an organization, but problematic if
> >     you want to enable an ecosystem
> >
> >     On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roger.hoover@gmail.com>
> >     wrote:
> >
> >     > As others have laid out, I see strong reasons for a common message
> >     > metadata structure for the Kafka ecosystem.  In particular, I've
> >     > seen that even within a single organization, infrastructure teams
> >     > often own the message metadata while application teams own the
> >     > application-level data format.  Allowing metadata and content to
> >     > have different structure and evolve separately is very helpful for
> >     > this.  Also, I think there's a lot of value to having a common
> >     > metadata structure shared across the Kafka ecosystem so that tools
> >     > which leverage metadata can more easily be shared across
> >     > organizations and integrated together.
> >     >
> >     > The question is, where does the metadata structure belong?  Here's
> >     > my take:
> >     >
> >     > We change the Kafka wire and on-disk format from a (key, value)
> >     > model to a (key, metadata, value) model where all three are byte
> >     > arrays from the broker's point of view.  The primary reason for this
> >     > is that it provides a backward compatible migration path forward.
> >     > Producers can start populating metadata fields before all consumers
> >     > understand the metadata structure.  For people who already have
> >     > custom envelope structures, they can populate their existing
> >     > structure and the new structure for a while as they make the
> >     > transition.
> >     >
> >     > We could stop there and let the clients plug in a KeySerializer,
> >     > MetadataSerializer, and ValueSerializer but I think it would also be
> >     > useful to have a default MetadataSerializer that implements a
> >     > key-value model similar to AMQP or HTTP headers.  Or we could go
> >     > even further and prescribe a Map<String, byte[]> or
> >     > Map<String, String> data model for headers in the clients (while
> >     > still allowing custom serialization of the header data model).
> >     >
> >     > I think this would address Radai's concerns:
> >     > 1. All client code would not need to be updated to know about the
> >     > container.
> >     > 2. Middleware friendly clients would have a standard header data
> >     > model to work with.
> >     > 3. KIP is required both b/c of broker changes and because of client
> >     > API changes.
> >     >
> >     > Cheers,
> >     >
> >     > Roger
> >     >
> >     >
> >     > On Wed, Nov 2, 2016 at 4:38 PM, radai <radai.rosenblatt@gmail.com>
> >     > wrote:
> >     >
> >     > > my biggest issues with a "standard" wrapper format:
> >     > >
> >     > > 1. _ALL_ client _CODE_ (as opposed to kafka lib version) must be
> >     > > updated to know about the container, because any old naive code
> >     > > trying to directly deserialize its own payload would keel over and
> >     > > die (it needs to know to deserialize a container, and then dig in
> >     > > there for its payload).
> >     > > 2. in order to write middleware-friendly clients that utilize such
> >     > > a container one would basically have to write their own
> >     > > producer/consumer API on top of the open source kafka one.
> >     > > 3. if you were going to go with a wrapper format you really dont
> >     > > need to bother with a kip (just open source your own client stack
> >     > > from #2 above so others could stop re-inventing it)
> >     > >
> >     > > On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <wushujames@gmail.com>
> >     > > wrote:
> >     > >
> >     > > > How exactly would this work? Or maybe that's out of scope for
> >     > > > this email.
> >     > >
> >     >
> >
> >
> > The information contained in this email is strictly confidential and for
> > the use of the addressee only, unless otherwise indicated. If you are not
> > the intended recipient, please do not read, copy, use or disclose to
> > others this message or any attachment. Please also notify the sender by
> > replying to this email or by telephone (+44(020 7896 0011) and then delete
> > the email and any copies of it. Opinions, conclusion (etc) that do not
> > relate to the official business of this company shall be understood as
> > neither given nor endorsed by it. IG is a trading name of IG Markets
> > Limited (a company registered in England and Wales, company number
> > 04008957) and IG Index Limited (a company registered in England and Wales,
> > company number 01190902). Registered address at Cannon Bridge House, 25
> > Dowgate Hill, London EC4R 2YA. Both IG Markets Limited (register number
> > 195355) and IG Index Limited (register number 114059) are authorised and
> > regulated by the Financial Conduct Authority.
> >
>
