kafka-dev mailing list archives

From Roger Hoover <roger.hoo...@gmail.com>
Subject Re: [DISCUSS] KIP-82 - Add Record Headers
Date Mon, 07 Nov 2016 22:48:15 GMT
Magnus,

Thanks for jumping in.  Do you see a reason that the broker should
understand the header structure you've proposed? I'm wondering if metadata
should just be bytes from the broker's point of view but clients could
implement a common header serde spec on top.

Cheers,

Roger

On Mon, Nov 7, 2016 at 2:10 PM, Magnus Edenhill <magnus@edenhill.se> wrote:

> Hi,
>
> I'm +1 for adding generic message headers, but I do share the concerns
> previously aired on this thread and during the KIP meeting.
>
> So let me propose a slimmer alternative that does not require any sort of
> global header registry, does not affect broker performance or operations,
> and adds as little overhead as possible.
>
>
> Message
> ------------
> The protocol Message type is extended with a Headers array consisting of
> Tags, where a Tag is defined as:
>    int16 Id
>    int16 Len              // binary_data length
>    binary_data[Len]  // opaque binary data
>
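A minimal sketch of the Tag layout described above, for illustration only. It assumes big-endian signed int16 fields (as in the Kafka protocol's primitive types) and simply concatenates Tags; the framing of the surrounding Headers array is not specified in the message and is left out. The function names encode_tag and decode_tags are hypothetical.

```python
import struct

def encode_tag(tag_id, data):
    """Pack one Tag as: int16 Id, int16 Len, then Len bytes of opaque data."""
    # Assumption: big-endian signed int16 for both Id and Len.
    return struct.pack(">hh", tag_id, len(data)) + data

def decode_tags(buf):
    """Split a buffer of concatenated Tags into a list of (id, data) pairs."""
    tags = []
    offset = 0
    while offset < len(buf):
        if offset + 4 > len(buf):
            raise ValueError("truncated tag header")
        tag_id, length = struct.unpack_from(">hh", buf, offset)
        offset += 4
        if length < 0 or offset + length > len(buf):
            raise ValueError("truncated tag data")
        # The data stays opaque bytes; only the owner of the id can parse it.
        tags.append((tag_id, buf[offset:offset + length]))
        offset += length
    return tags
```

Each Tag costs 4 bytes of framing on top of its payload, which is the "as little overhead as possible" point made above.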
>
> Ids
> ---
> The Id space is not centrally managed, so whenever an application needs to
> add headers, or use an eco-system plugin that does, its Id allocation will
> need to be manually configured.
> This moves the allocation concern from the global space down to
> organization level and avoids the risk of id conflicts.
> Example pseudo-config for some app:
>     sometrackerplugin.tag.sourcev3.id=1000
>     dbthing.tag.tablename.id=1001
>     myschemareg.tag.schemaname.id=1002
>     myschemareg.tag.schemaversion.id=1003
>
>
> Each header-writing or header-reading plugin must provide means (typically
> through configuration) to specify the tag for each header it uses. Defaults
> should be avoided.
> A consumer silently ignores tags it does not have a mapping for (since the
> binary_data can't be parsed without knowing what it is).
>
> Id range 0..999 is reserved for future use by the broker and must not be
> used by plugins.
>
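One way a client plugin might apply the rules above, sketched under assumptions: configuration arrives as a flat dict of "<plugin>.tag.<name>.id" entries like the pseudo-config in the message, ids 0..999 are rejected as reserved, and tags without a mapping are skipped silently. The names load_tag_ids and known_headers are hypothetical and not part of any Kafka client.

```python
RESERVED_MAX_ID = 999  # ids 0..999 are reserved for the broker in this proposal

def load_tag_ids(config):
    """Build {tag id: header name} from entries like 'dbthing.tag.tablename.id'."""
    mapping = {}
    for key, value in config.items():
        if ".tag." not in key or not key.endswith(".id"):
            continue  # not a tag id setting
        tag_id = int(value)
        if 0 <= tag_id <= RESERVED_MAX_ID:
            raise ValueError(f"{key}: id {tag_id} is in the reserved range")
        if tag_id in mapping:
            raise ValueError(f"{key}: id {tag_id} is already used by {mapping[tag_id]}")
        mapping[tag_id] = key[:-len(".id")]
    return mapping

def known_headers(tags, mapping):
    """Keep only tags with a configured mapping; unknown ids are ignored."""
    return {mapping[tag_id]: data for tag_id, data in tags if tag_id in mapping}
```

Checking for duplicate ids at load time is an addition of this sketch; the message only says allocation is configured manually per organization.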
>
>
> Broker
> ---------
> The broker does not process the tags (other than the standard protocol
> syntax verification), it simply stores and forwards them as opaque data.
>
> Standard message translation (removal of Headers) kicks in for older
> clients.
>
>
> Why not string ids?
> -------------------------
> String ids might seem like a good idea, but they:
>  * do not really solve uniqueness
>  * consume a lot of space (2 byte string length + string, per header) to
> be meaningful
>  * do not really say anything about how to parse the tag's data, so they
> are in effect useless on their own.
>
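The space argument above can be made concrete with a little arithmetic. This is a rough sketch that counts only the bytes spent identifying a header, not its value; the header names are hypothetical, modelled on the pseudo-config earlier in the message.

```python
def int16_key_overhead():
    """Bytes spent identifying a header when the key is an int16 Id."""
    return 2

def string_key_overhead(name):
    """Bytes spent identifying a header when the key is a string:
    a 2-byte length prefix plus the UTF-8 bytes of the name."""
    return 2 + len(name.encode("utf-8"))

# Hypothetical header names used only for illustration.
names = ["sometracker.sourcev3", "dbthing.tablename", "myschemareg.schemaversion"]
for name in names:
    print(name, string_key_overhead(name), "bytes vs", int16_key_overhead(), "bytes")
```

The difference is paid per header on every message, so it scales with both header count and message volume.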
>
> Regards,
> Magnus
>
>
>
>
> 2016-11-07 18:32 GMT+01:00 Michael Pearce <Michael.Pearce@ig.com>:
>
> > Hi Roger,
> >
> > Thanks for the support.
> >
> > I think the key thing is to have a common key space to make an ecosystem;
> > there does have to be some level of contract for people to play nicely.
> >
> > Having map<String, byte[]> or, as currently proposed in the KIP, a
> > numerical key space of map<int, byte[]> is a level of contract that most
> > people would expect.
> >
> > I think a good example is the one someone linked to in a previous
> > comment: the AWS blog and the implemented API, which originally didn’t
> > have a header space but now does, where keys are uniform but the value can
> > be string, int, anything.
> >
> > Having a custom MetadataSerializer is something we had played with, but we
> > discounted the idea: if you want everyone in the ecosystem to work the same
> > way, making this customizable as well makes it a bit harder. Think about
> > making the whole message record custom serializable; this would be fairly
> > tricky (though not impossible) to make work nicely. Having the value
> > customizable is, we thought, a reasonable tradeoff here between flexibility
> > and a contract of interaction between different parties.
> >
> > Is there a particular case or benefit of having serialization customizable
> > that you have in mind?
> >
> > That said, it is obviously something that could be implemented if there is
> > a need. If we did go down this avenue, I think a default serializer
> > implementation should exist so that, per the 80:20 rule, people can just
> > have the broker and clients get default behavior.
> >
> > Cheers
> > Mike
> >
> > On 11/6/16, 5:25 PM, "radai" <radai.rosenblatt@gmail.com> wrote:
> >
> >     making header _key_ serialization configurable potentially undermines
> >     the broad usefulness of the feature (any point along the path must be
> >     able to read the header keys. the values may be whatever and require
> >     more intimate knowledge of the code that produced specific headers,
> >     but keys should be universally readable).
> >
> >     it would also make it hard to write really portable plugins - say i
> >     wrote a large message splitter/combiner - if i rely on key
> >     "largeMessage" and values of the form "1/20" someone who uses
> >     (contrived example) Map<Byte[], Double> wouldnt be able to re-use my
> >     code.
> >
> >     not the end of the world within an organization, but problematic if
> >     you want to enable an ecosystem
> >
> >     On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roger.hoover@gmail.com>
> >     wrote:
> >
> >     >  As others have laid out, I see strong reasons for a common message
> >     > metadata structure for the Kafka ecosystem.  In particular, I've seen
> >     > that even within a single organization, infrastructure teams often own
> >     > the message metadata while application teams own the application-level
> >     > data format.  Allowing metadata and content to have different structure
> >     > and evolve separately is very helpful for this.  Also, I think there's
> >     > a lot of value to having a common metadata structure shared across the
> >     > Kafka ecosystem so that tools which leverage metadata can more easily
> >     > be shared across organizations and integrated together.
> >     >
> >     > The question is, where does the metadata structure belong?  Here's my
> >     > take:
> >     >
> >     > We change the Kafka wire and on-disk format from a (key, value) model
> >     > to a (key, metadata, value) model where all three are byte arrays from
> >     > the broker's point of view.  The primary reason for this is that it
> >     > provides a backward compatible migration path forward.  Producers can
> >     > start populating metadata fields before all consumers understand the
> >     > metadata structure.  For people who already have custom envelope
> >     > structures, they can populate their existing structure and the new
> >     > structure for a while as they make the transition.
> >     >
> >     > We could stop there and let the clients plug in a KeySerializer,
> >     > MetadataSerializer, and ValueSerializer but I think it would also be
> >     > useful to have a default MetadataSerializer that implements a key-value
> >     > model similar to AMQP or HTTP headers.  Or we could go even further and
> >     > prescribe a Map<String, byte[]> or Map<String, String> data model for
> >     > headers in the clients (while still allowing custom serialization of
> >     > the header data model).
> >     >
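A rough sketch of what a default metadata serializer over a Map<String, byte[]> model could look like, with the broker seeing only the resulting bytes. The wire layout here (int16 entry count, then per entry an int16 key length, UTF-8 key, int32 value length, value) is purely an assumption for illustration; the thread does not define one, and the function names are hypothetical.

```python
import struct

def serialize_headers(headers):
    """Turn {str: bytes} into one opaque byte string (assumed layout)."""
    out = [struct.pack(">h", len(headers))]
    for name, value in headers.items():
        key = name.encode("utf-8")
        out.append(struct.pack(">h", len(key)) + key)
        out.append(struct.pack(">i", len(value)) + value)
    return b"".join(out)

def deserialize_headers(buf):
    """Inverse of serialize_headers; returns {str: bytes}."""
    (count,) = struct.unpack_from(">h", buf, 0)
    offset = 2
    headers = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from(">h", buf, offset)
        offset += 2
        name = buf[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value_len,) = struct.unpack_from(">i", buf, offset)
        offset += 4
        headers[name] = buf[offset:offset + value_len]
        offset += value_len
    return headers

# A record in the (key, metadata, value) model: three byte arrays, all
# opaque to the broker. Only clients call the functions above.
record = (b"user-42", serialize_headers({"trace-id": b"abc123"}), b"payload")
```

A client that predates the metadata field would keep reading only the key and value, which is the migration path described above.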
> >     > I think this would address Radai's concerns:
> >     > 1. Not all client code would need to be updated to know about the
> >     > container.
> >     > 2. Middleware friendly clients would have a standard header data model
> >     > to work with.
> >     > 3. KIP is required both b/c of broker changes and because of client API
> >     > changes.
> >     >
> >     > Cheers,
> >     >
> >     > Roger
> >     >
> >     >
> >     > On Wed, Nov 2, 2016 at 4:38 PM, radai <radai.rosenblatt@gmail.com>
> >     > wrote:
> >     >
> >     > > my biggest issues with a "standard" wrapper format:
> >     > >
> >     > > 1. _ALL_ client _CODE_ (as opposed to kafka lib version) must be
> >     > > updated to know about the container, because any old naive code
> >     > > trying to directly deserialize its own payload would keel over and
> >     > > die (it needs to know to deserialize a container, and then dig in
> >     > > there for its payload).
> >     > > 2. in order to write middleware-friendly clients that utilize such a
> >     > > container one would basically have to write their own
> >     > > producer/consumer API on top of the open source kafka one.
> >     > > 3. if you were going to go with a wrapper format you really dont need
> >     > > to bother with a kip (just open source your own client stack from #2
> >     > > above so others could stop re-inventing it)
> >     > >
> >     > > On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <wushujames@gmail.com>
> >     > > wrote:
> >     > >
> >     > > > How exactly would this work? Or maybe that's out of scope for this
> >     > > > email.
> >     > >
> >     >
> >
> >
>
