kafka-dev mailing list archives

From Sean McCauliff <smccaul...@linkedin.com.INVALID>
Subject Re: [DISCUSS] KIP-82 - Add Record Headers
Date Wed, 09 Nov 2016 03:15:14 GMT
A local namespace mapping from namespace ids to ints would definitely solve
the problem of having a global namespace and would make the int header keys
potentially more readable for logging and debugging purposes.  But this
means another (potentially very large) set of configuration parameters that
need to be present on each component that wants to inspect the headers. I'm
sure it will be a fun day to track down that class of misconfiguration.
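Sean's misconfiguration concern can be made concrete with a minimal sketch. It assumes each component loads its own name-to-id mapping, and the names and ids below are hypothetical. Nothing in the int ids on the wire shows that the producer and consumer configs disagree, so the consumer silently reads each value under the wrong name.

```python
from typing import Dict

# Hypothetical per-component mappings; the consumer's has two ids swapped.
producer_ids: Dict[str, int] = {"sourcev3": 1000, "tablename": 1001}
consumer_ids: Dict[str, int] = {"tablename": 1000, "sourcev3": 1001}

def to_wire(headers: Dict[str, bytes], ids: Dict[str, int]) -> Dict[int, bytes]:
    """Replace header names with this component's configured int ids."""
    return {ids[name]: value for name, value in headers.items()}

def from_wire(tags: Dict[int, bytes], ids: Dict[str, int]) -> Dict[str, bytes]:
    """Map int ids back to names using the reader's own config; unmapped ids are dropped."""
    names = {tag_id: name for name, tag_id in ids.items()}
    return {names[tag_id]: value for tag_id, value in tags.items() if tag_id in names}

sent = {"sourcev3": b"web-frontend", "tablename": b"orders"}
received = from_wire(to_wire(sent, producer_ids), consumer_ids)
print(received)  # {'tablename': b'web-frontend', 'sourcev3': b'orders'}
```

No error is raised anywhere. The mismatch only shows up as wrong data downstream, and that is the hard-to-trace class of bug described above.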

If the brokers inspect the headers, then the brokers need this config too.
If the config changes, the brokers need to be restarted, which seems pretty
expensive; otherwise, there needs to be some new way to push this
information to the brokers.

Java itself rarely has namespace collisions, even though there is no
central registry of namespaces. The set of Kafka infrastructure engineers is
much smaller than the set of Java developers sharing that namespace, so
reasonably chosen names should allow every header user to coexist peacefully.

--
Sean McCauliff
Staff Software Engineer
Kafka

smccauliff@linkedin.com
linkedin.com/in/sean-mccauliff-b563192

On Mon, Nov 7, 2016 at 2:10 PM, Magnus Edenhill <magnus@edenhill.se> wrote:

> Hi,
>
> I'm +1 for adding generic message headers, but I do share the concerns
> previously aired on this thread and during the KIP meeting.
>
> So let me propose a slimmer alternative that does not require any sort of
> global header registry, does not affect broker performance or operations,
> and adds as little overhead as possible.
>
>
> Message
> ------------
> The protocol Message type is extended with a Headers array consisting of
> Tags, where a Tag is defined as:
>    int16 Id
>    int16 Len              // binary_data length
>    binary_data[Len]  // opaque binary data
>
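Here is a minimal Python sketch of the Tag encoding Magnus proposes. The proposal only fixes the three Tag fields. The big-endian byte order and the int32 count prefix for the Headers array are assumptions, borrowed from how other arrays are encoded in the Kafka protocol. The helper names encode_headers and decode_headers are hypothetical. Because Len is an int16, each tag's binary_data is capped at 32,767 bytes.

```python
import struct
from typing import List, Tuple

Tag = Tuple[int, bytes]  # (Id, binary_data)

def encode_headers(tags: List[Tag]) -> bytes:
    """Encode a Headers array: int32 count, then int16 Id, int16 Len, data per tag.

    Assumption: big-endian integers and an int32 count prefix, as used elsewhere
    in the Kafka protocol; the proposal itself does not specify this.
    """
    out = bytearray(struct.pack(">i", len(tags)))
    for tag_id, data in tags:
        if not -32768 <= tag_id <= 32767:
            raise ValueError("Id must fit in an int16")
        if len(data) > 32767:
            raise ValueError("binary_data is too long for an int16 Len")
        out += struct.pack(">hh", tag_id, len(data))
        out += data
    return bytes(out)

def decode_headers(buf: bytes) -> List[Tag]:
    """Inverse of encode_headers; binary_data is returned as opaque bytes."""
    (count,) = struct.unpack_from(">i", buf, 0)
    offset = 4
    tags: List[Tag] = []
    for _ in range(count):
        tag_id, length = struct.unpack_from(">hh", buf, offset)
        offset += 4
        if length < 0 or offset + length > len(buf):
            raise ValueError("truncated or corrupt tag")
        tags.append((tag_id, buf[offset:offset + length]))
        offset += length
    return tags

encoded = encode_headers([(1002, b"orders-value"), (1003, b"\x00\x07")])
print(len(encoded))  # 4 + (4 + 12) + (4 + 2) = 26
print(decode_headers(encoded))  # [(1002, b'orders-value'), (1003, b'\x00\x07')]
```

Each tag costs 4 bytes of framing on top of its data. A broker that only stores and forwards would never need to call decode_headers.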
>
> Ids
> ---
> The Id space is not centrally managed, so whenever an application needs to
> add headers, or use an eco-system plugin that does, its Id allocation will
> need to be manually configured.
> This moves the allocation concern from the global space down to the
> organization level and avoids the risk of id conflicts.
> Example pseudo-config for some app:
>     sometrackerplugin.tag.sourcev3.id=1000
>     dbthing.tag.tablename.id=1001
>     myschemareg.tag.schemaname.id=1002
>     myschemareg.tag.schemaversion.id=1003
>
>
> Each header-writing or header-reading plugin must provide a means (typically
> through configuration) to specify the tag id for each header it uses.
> Defaults should be avoided.
> A consumer silently ignores tags it does not have a mapping for (since the
> binary_data can't be parsed without knowing what it is).
>
> Id range 0..999 is reserved for future use by the broker and must not be
> used by plugins.
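The Id rules above can be sketched as a small loader plus a consumer-side filter. This sketch assumes the "<plugin>.tag.<name>.id=<n>" shape of the pseudo-config. The functions load_tag_ids and resolve are hypothetical. Rejecting duplicate ids is an extra safety check that this sketch adds; it is not part of the proposal.

```python
from typing import Dict, List, Tuple

RESERVED_MAX = 999  # ids 0..999 are reserved for the broker, per the proposal

def load_tag_ids(config_lines: List[str]) -> Dict[int, str]:
    """Parse '<plugin>.tag.<name>.id=<n>' lines into {id: '<plugin>.<name>'}."""
    by_id: Dict[int, str] = {}
    for line in config_lines:
        key, _, value = line.strip().partition("=")
        plugin, marker, rest = key.partition(".tag.")
        if not marker or not rest.endswith(".id"):
            continue  # not a tag-id setting
        header_name = rest[: -len(".id")]
        name = f"{plugin}.{header_name}"
        tag_id = int(value)
        if not -32768 <= tag_id <= 32767:
            raise ValueError(f"{name}: id {tag_id} does not fit in an int16")
        if 0 <= tag_id <= RESERVED_MAX:
            raise ValueError(f"{name}: id {tag_id} is in the reserved range 0..999")
        if tag_id in by_id:
            raise ValueError(f"{name}: id {tag_id} is already used by {by_id[tag_id]}")
        by_id[tag_id] = name
    return by_id

def resolve(tags: List[Tuple[int, bytes]], by_id: Dict[int, str]) -> Dict[str, bytes]:
    """Keep only tags this consumer has a mapping for; silently drop the rest."""
    return {by_id[tag_id]: data for tag_id, data in tags if tag_id in by_id}

config = [
    "sometrackerplugin.tag.sourcev3.id=1000",
    "dbthing.tag.tablename.id=1001",
    "myschemareg.tag.schemaname.id=1002",
    "myschemareg.tag.schemaversion.id=1003",
]
by_id = load_tag_ids(config)
incoming = [(1001, b"orders"), (4242, b"???")]  # 4242 has no mapping here
print(resolve(incoming, by_id))  # {'dbthing.tablename': b'orders'}
```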
>
>
>
> Broker
> ---------
> The broker does not process the tags (other than standard protocol syntax
> verification); it simply stores and forwards them as opaque data.
>
> Standard message translation (removal of Headers) kicks in for older
> clients.
>
>
> Why not string ids?
> -------------------------
> String ids might seem like a good idea, but they:
>  * do not really solve uniqueness
>  * consume a lot of space (a 2-byte string length plus the string itself,
>    per header) if they are to be meaningful
>  * don't really say anything about how to parse the tag's data, so they are
>    in effect useless on their own.
>
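Magnus's space argument comes down to per-header arithmetic. This small sketch assumes a string key would be framed as an int16 length plus UTF-8 bytes, as he describes, while the value keeps its int16 Len. The key names come from his pseudo-config with the plugin prefix dropped.

```python
def int_id_overhead() -> int:
    """Per-header framing in the proposed Tag: int16 Id + int16 Len."""
    return 2 + 2

def string_id_overhead(key: str) -> int:
    """Same framing with a string key: int16 key length + UTF-8 key + int16 Len."""
    return 2 + len(key.encode("utf-8")) + 2

keys = ["sourcev3", "tablename", "schemaname", "schemaversion"]
for key in keys:
    print(f"{key}: {string_id_overhead(key)} bytes vs {int_id_overhead()} bytes")

extra = sum(string_id_overhead(k) - int_id_overhead() for k in keys)
print(f"extra bytes per message with these four headers: {extra}")
```

The difference equals the key length, and every message pays it. That per-message cost is the core of the objection.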
>
> Regards,
> Magnus
>
>
>
>
> 2016-11-07 18:32 GMT+01:00 Michael Pearce <Michael.Pearce@ig.com>:
>
> > Hi Roger,
> >
> > Thanks for the support.
> >
> > I think the key thing is to have a common key space to make an ecosystem;
> > there does have to be some level of contract for people to play nicely.
> >
> > Having map<String, byte[]> or, as currently proposed in the KIP, a
> > numerical key space of map<int, byte[]> is the level of contract that
> > most people would expect.
> >
> > I think a good example is the one someone else linked to in a previous
> > comment: an AWS blog post and implemented API where originally they didn't
> > have a header space but now they do, with uniform keys and values that can
> > be a string, an int, or anything else.
> >
> > Having a custom MetadataSerializer is something we had played with, but we
> > discounted the idea: if you want everyone in the ecosystem to work the same
> > way, also making this customizable makes that a bit harder. Think about
> > making the whole message record custom serializable; it would be fairly
> > tricky (though not impossible) to make that work nicely. We thought that
> > making only the value customizable is a reasonable tradeoff here between
> > flexibility and a contract of interaction between different parties.
> >
> > Is there a particular case or benefit of customizable serialization that
> > you have in mind?
> >
> > That said, it is obviously something that could be implemented if there
> > is a need. If we did go down this avenue, I think a default serializer
> > implementation should exist so that, per the 80:20 rule, people can just
> > have the broker and clients get the default behavior.
> >
> > Cheers
> > Mike
> >
> > On 11/6/16, 5:25 PM, "radai" <radai.rosenblatt@gmail.com> wrote:
> >
> >     making header _key_ serialization configurable potentially undermines
> >     the broad usefulness of the feature (any point along the path must be
> >     able to read the header keys. the values may be whatever and require
> >     more intimate knowledge of the code that produced specific headers, but
> >     keys should be universally readable).
> >
> >     it would also make it hard to write really portable plugins - say i
> >     wrote a large message splitter/combiner - if i rely on key
> >     "largeMessage" and values of the form "1/20", someone who uses
> >     (contrived example) Map<Byte[], Double> wouldn't be able to re-use my
> >     code.
> >
> >     not the end of the world within an organization, but problematic if
> >     you want to enable an ecosystem.
> >
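Here is a sketch of the splitter/combiner radai describes. It assumes headers are a map from string keys to bytes and that all parts of one logical message are collected together. The functions split and combine, and the (headers, value) record shape, are hypothetical. Only the "largeMessage" key and the "i/n" value format come from the message.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[Dict[str, bytes], bytes]  # (headers, value); hypothetical shape

HEADER = "largeMessage"  # key and "i/n" value format from the example above

def split(payload: bytes, chunk_size: int) -> List[Record]:
    """Split a payload into records tagged largeMessage=i/n (1-based)."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    n = len(chunks)
    return [({HEADER: f"{i}/{n}".encode()}, chunk) for i, chunk in enumerate(chunks, start=1)]

def combine(records: List[Record]) -> bytes:
    """Reassemble one split message; assumes every part of it is present."""
    parts: Dict[int, bytes] = {}
    total: Optional[int] = None
    for headers, value in records:
        index, count = (int(x) for x in headers[HEADER].decode().split("/"))
        parts[index] = value
        total = count
    if total is None or sorted(parts) != list(range(1, total + 1)):
        raise ValueError("missing parts")
    return b"".join(parts[i] for i in range(1, total + 1))

records = split(b"x" * 50, chunk_size=20)
print([headers[HEADER] for headers, _ in records])  # [b'1/3', b'2/3', b'3/3']
print(combine(list(reversed(records))) == b"x" * 50)  # True
```

The combiner works only because every hop agrees that the key is the string "largeMessage". With configurable key serialization, a component using a different key type would not find the header at all. That is radai's portability point.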
> >     On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roger.hoover@gmail.com> wrote:
> >
> >     > As others have laid out, I see strong reasons for a common message
> >     > metadata structure for the Kafka ecosystem. In particular, I've seen
> >     > that even within a single organization, infrastructure teams often
> >     > own the message metadata while application teams own the
> >     > application-level data format. Allowing metadata and content to have
> >     > different structures and evolve separately is very helpful for this.
> >     > Also, I think there's a lot of value in having a common metadata
> >     > structure shared across the Kafka ecosystem so that tools which
> >     > leverage metadata can more easily be shared across organizations and
> >     > integrated together.
> >     >
> >     > The question is, where does the metadata structure belong? Here's my
> >     > take:
> >     >
> >     > We change the Kafka wire and on-disk format from a (key, value) model
> >     > to a (key, metadata, value) model where all three are byte arrays
> >     > from the brokers' point of view. The primary reason for this is that
> >     > it provides a backward-compatible migration path. Producers can
> >     > start populating metadata fields before all consumers understand the
> >     > metadata structure. People who already have custom envelope
> >     > structures can populate both their existing structure and the new
> >     > structure for a while as they make the transition.
> >     >
> >     > We could stop there and let the clients plug in a KeySerializer,
> >     > MetadataSerializer, and ValueSerializer, but I think it would also be
> >     > useful to have a default MetadataSerializer that implements a
> >     > key-value model similar to AMQP or HTTP headers. Or we could go even
> >     > further and prescribe a Map<String, byte[]> or Map<String, String>
> >     > data model for headers in the clients (while still allowing custom
> >     > serialization of the header data model).
> >     >
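Roger's default MetadataSerializer could look roughly like the following sketch. The class names, the serialize/deserialize methods, and the byte layout are all hypothetical. They show only how a Map<String, byte[]> header model could sit on top of an opaque metadata byte array; this is not an existing Kafka API.

```python
import struct
from abc import ABC, abstractmethod
from typing import Dict

class MetadataSerializer(ABC):
    """Hypothetical plug-in point alongside key and value serializers."""

    @abstractmethod
    def serialize(self, headers: Dict[str, bytes]) -> bytes: ...

    @abstractmethod
    def deserialize(self, data: bytes) -> Dict[str, bytes]: ...

class DefaultHeaderSerializer(MetadataSerializer):
    """String-keyed headers, HTTP/AMQP style.

    Layout (an assumption): int32 count, then per entry an int16 key length,
    UTF-8 key bytes, an int32 value length, and the value bytes.
    """

    def serialize(self, headers: Dict[str, bytes]) -> bytes:
        out = bytearray(struct.pack(">i", len(headers)))
        for key, value in headers.items():
            k = key.encode("utf-8")
            if len(k) > 32767:
                raise ValueError("header key too long for an int16 length")
            out += struct.pack(">h", len(k)) + k
            out += struct.pack(">i", len(value)) + value
        return bytes(out)

    def deserialize(self, data: bytes) -> Dict[str, bytes]:
        (count,) = struct.unpack_from(">i", data, 0)
        pos = 4
        headers: Dict[str, bytes] = {}
        for _ in range(count):
            (klen,) = struct.unpack_from(">h", data, pos)
            pos += 2
            key = data[pos:pos + klen].decode("utf-8")
            pos += klen
            (vlen,) = struct.unpack_from(">i", data, pos)
            pos += 4
            headers[key] = data[pos:pos + vlen]
            pos += vlen
        return headers

# The broker would only see (key, metadata, value) as three opaque byte arrays.
serializer = DefaultHeaderSerializer()
record = (b"user-42", serializer.serialize({"trace-id": b"abc123"}), b'{"amount": 10}')
print(serializer.deserialize(record[1]))  # {'trace-id': b'abc123'}
```

Keeping the metadata slot as plain bytes is what lets producers start filling it before every consumer understands the structure, as the migration argument above notes.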
> >     > I think this would address Radai's concerns:
> >     > 1. Client code would not need to be updated to know about the
> >     >    container.
> >     > 2. Middleware-friendly clients would have a standard header data
> >     >    model to work with.
> >     > 3. A KIP is required both because of broker changes and because of
> >     >    client API changes.
> >     >
> >     > Cheers,
> >     >
> >     > Roger
> >     >
> >     >
> >     > On Wed, Nov 2, 2016 at 4:38 PM, radai <radai.rosenblatt@gmail.com> wrote:
> >     >
> >     > > my biggest issues with a "standard" wrapper format:
> >     > >
> >     > > 1. _ALL_ client _CODE_ (as opposed to the kafka lib version) must
> >     > > be updated to know about the container, because any old naive code
> >     > > trying to directly deserialize its own payload would keel over and
> >     > > die (it needs to know to deserialize a container, and then dig in
> >     > > there for its payload).
> >     > > 2. in order to write middleware-friendly clients that utilize such
> >     > > a container, one would basically have to write their own
> >     > > producer/consumer API on top of the open source kafka one.
> >     > > 3. if you were going to go with a wrapper format you really don't
> >     > > need to bother with a kip (just open source your own client stack
> >     > > from #2 above so others could stop re-inventing it)
> >     > >
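radai's first point, that every piece of client code must learn about the container, shows up in a small sketch. It assumes JSON payloads and a hypothetical wrapper with "meta" and "payload" fields. No specific wrapper format was proposed in the thread.

```python
import json

def old_consumer(value: bytes) -> int:
    """Pre-existing code that deserializes its own payload directly."""
    return json.loads(value)["amount"]

def wrap(payload: dict, meta: dict) -> bytes:
    """Hypothetical 'standard' wrapper: metadata plus the original payload."""
    return json.dumps({"meta": meta, "payload": payload}).encode()

def container_aware_consumer(value: bytes) -> int:
    """Code that was updated to unwrap the container before reading its payload."""
    return json.loads(value)["payload"]["amount"]

plain = json.dumps({"amount": 10}).encode()
wrapped = wrap({"amount": 10}, {"trace-id": "abc123"})

print(old_consumer(plain))                # 10
print(container_aware_consumer(wrapped))  # 10
try:
    old_consumer(wrapped)
except KeyError as err:
    print("old consumer fails on wrapped data, missing key:", err)
```

Every consumer has to move to the container-aware form before any producer can start wrapping. Carrying metadata in a separate slot outside the value avoids that lockstep upgrade.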
> >     > > On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <wushujames@gmail.com> wrote:
> >     > >
> >     > > > How exactly would this work? Or maybe that's out of scope for
> >     > > > this email.
> >     > >
> >     >
> >
> >
> >
>
