kafka-dev mailing list archives

From radai <radai.rosenbl...@gmail.com>
Subject Re: [DISCUSS] KIP-82 - Add Record Headers
Date Wed, 09 Nov 2016 14:43:45 GMT
@magnus - and very dangerous (you're essentially downloading and executing
arbitrary code off the internet on your servers ... a bad idea without a
sandbox, and even with one)

as for it being a purely administrative task - i disagree.

i wish it were, really, because then my earlier point on the complexity of
the remapping process would be invalid, but at linkedin, for example, we
(the team i'm in) run kafka as a service. we don't really know what our users
(developing applications that use kafka) are up to at any given moment. it
is very possible (given the existence of headers and a corresponding plugin
ecosystem) for some application to "equip" their producers and consumers
with the required plugin without us knowing. i don't mean to imply that's
bad, i just want to make the point that it's not as simple as keeping it in
sync across a large-enough organization.
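
to make the remapping point concrete, here is a minimal sketch (python, purely illustrative). it reuses the {666,1} -> {999,1} large-message example quoted further down. every name in it (remap_namespace, LargeMessagePlugin, PART_ID) is made up for illustration and is not part of kafka or of the KIP; headers are modelled as a plain dict keyed by (namespace, id).

```python
# Hypothetical illustration only: none of these names exist in Kafka or KIP-82.
from typing import Dict, Optional, Tuple

HeaderKey = Tuple[int, int]        # (namespace, id)
Headers = Dict[HeaderKey, bytes]   # header key -> opaque value


def remap_namespace(headers: Headers, mapping: Dict[int, int]) -> Headers:
    """What a mirroring step might do: rewrite namespaces per a mapping table.

    Namespaces missing from the mapping are passed through unchanged.
    """
    return {(mapping.get(ns, ns), hid): value
            for (ns, hid), value in headers.items()}


class LargeMessagePlugin:
    """Consumer-side plugin that looks for a "part X of Y" header."""

    PART_ID = 1  # id inside the plugin's namespace (assumed for this sketch)

    def __init__(self, namespace: int) -> None:
        # The namespace comes from configuration, not from the plugin code.
        self.namespace = namespace

    def read_part(self, headers: Headers) -> Optional[bytes]:
        # Returns None when the header is absent under the configured namespace.
        return headers.get((self.namespace, self.PART_ID))


# Source cluster: large message support lives under namespace 666.
source = {(666, 1): b"1/20"}

# Mirror target cluster: the mirroring step remaps 666 -> 999.
mirrored = remap_namespace(source, {666: 999})

stale = LargeMessagePlugin(namespace=666)    # config was never updated
updated = LargeMessagePlugin(namespace=999)  # config matches the remap
```

the consumer configured with the old namespace finds nothing on the mirrored cluster, and nothing fails loudly: the header is just "not there". only the consumer that was told about the remap sees it, which is the sync burden i'm describing.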


On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <magnus@edenhill.se> wrote:

> I think there is a piece missing in the Strings discussion, where
> pro-Stringers reason that by providing unique string identifiers for
> each header everything will just magically work for all parts of the
> stream pipeline.
>
> But the strings don't mean anything by themselves, and while we could
> probably envision some auto plugin loader that downloads, compiles,
> links and runs plugins on-demand as soon as they're seen by a consumer,
> I don't really see a use-case for something so dynamic (and fragile) in
> practice.
>
> In the real world an application will be configured with a set of
> plugins to either add (producer) or read (consumer) headers.
> This is an administrative task based on what features a client
> needs/provides and results in some sort of configuration to enable and
> configure the desired plugins.
>
> Since this needs to be kept somewhat in sync across an organisation
> (there is no point in having producers add headers no consumers will
> read, and vice versa), the added complexity of assigning an id namespace
> for each plugin as it is being configured should be tolerable.
>
>
> /Magnus
>
> 2016-11-09 13:06 GMT+01:00 Michael Pearce <Michael.Pearce@ig.com>:
>
> > Just following/catching up on what seems to be an active night :)
> >
> > @Radai sorry if it may seem obvious but what does MD stand for?
> >
> > My take on String vs Int:
> >
> > I will state first I am pro Int (16 or 32).
> >
> > Playing devil's advocate, though, I do see a big plus in the argument for
> > String keys: integrating into an existing ecosystem.
> >
> > As many other systems use String-based headers (Flume, JMS), it is much
> > easier for these to be integrated.
> >
> > With Int-based headers, how could we provide a way/guidance to make this
> > integration simple, with easy transition flows over to kafka?
> >
> > * tough luck buddy you're on your own
> > * simply hash the string into int code and hope for no collisions (how to
> > convert back though?)
> > * http2 style as mentioned by nacho.
> >
> > cheers,
> > Mike
> >
> >
> > ________________________________________
> > From: radai <radai.rosenblatt@gmail.com>
> > Sent: Wednesday, November 9, 2016 8:12 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> >
> > thinking about it some more, the best way to transmit the header
> > remapping data to consumers would be to put it in the MD response
> > payload, so maybe it should be discussed now.
> >
> >
> > On Wed, Nov 9, 2016 at 12:09 AM, radai <radai.rosenblatt@gmail.com> wrote:
> >
> > > i'm not opposed to the idea of namespace mapping. all i'm saying is that
> > > it's not part of the "mvp" and, since it requires no wire format change,
> > > can always be added later.
> > > also, it's not as simple as just configuring MM to do the transform:
> > > let's say i've implemented large message support as {666,1} and on some
> > > mirror target cluster it's been remapped to {999,1}. the consumer plugin
> > > code would also need to be told to look for the large message
> > > "part X of Y" header under {999,1}. doable, but tricky.
> > >
> > > On Tue, Nov 8, 2016 at 10:29 PM, Gwen Shapira <gwen@confluent.io>
> wrote:
> > >
> > >> While you can do whatever you want with a namespace and your code,
> > >> what I'd expect is for each app to make namespaces configurable...
> > >>
> > >> So if I accidentally used 666 for my HR department, and still want to
> > >> run RadaiApp, I can config "namespace=42" for RadaiApp and everything
> > >> will look normal.
> > >>
> > >> This means you only need to sync usage inside your own organization.
> > >> Still hard, but somewhat easier than syncing with the entire world.
> > >>
> > >> On Tue, Nov 8, 2016 at 10:07 PM, radai <radai.rosenblatt@gmail.com>
> > >> wrote:
> > >> > and we can start with {namespace, id} and no re-mapping support and
> > >> > always add it later on if/when collisions actually happen (i don't
> > >> > think they'd be a problem).
> > >> >
> > >> > every interested party (so orgs or individuals) could then register a
> > >> > prefix (0 = reserved, 1 = confluent ... 666 = me :-) ) and do whatever
> > >> > with the 2nd ID - so once linkedin registers, say 3, then linkedin devs
> > >> > are free to use {3, *} with a reasonable expectation not to collide
> > >> > with anything else. further partitioning of that * becomes linkedin's
> > >> > problem, but the "upstream registration" of a namespace only has to
> > >> > happen once.
> > >> >
> > >> > On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <wushujames@gmail.com>
> > >> wrote:
> > >> >
> > >> >>
> > >> >>
> > >> >>
> > >> >> > On Nov 8, 2016, at 5:54 PM, Gwen Shapira <gwen@confluent.io>
> > wrote:
> > >> >> >
> > >> >> > Thank you so much for this clear and fair summary of the arguments.
> > >> >> >
> > >> >> > I'm in favor of ints. Not a deal-breaker, but in favor.
> > >> >> >
> > >> >> > Even more in favor of Magnus's decentralized suggestion with Roger's
> > >> >> > tweak: add a namespace for headers. This will allow each app to just
> > >> >> > use whatever IDs it wants internally, and then let the admin
> > >> >> > deploying the app figure out an available namespace ID for the app
> > >> >> > to live in. So io.confluent.schema-registry can be namespace 0x01 on
> > >> >> > my deployment and 0x57 on yours, and the poor guys developing the
> > >> >> > app don't need to worry about that.
> > >> >> >
> > >> >> >
> > >> >>
> > >> >> Gwen, if I understand your example right, an application deployer
> > >> >> might decide to use 0x01 in one deployment, and that means that once
> > >> >> the message is written into the broker, it will be saved on the broker
> > >> >> with that specific namespace (0x01).
> > >> >>
> > >> >> If you were to mirror that message into another cluster, the 0x01
> > >> >> would accompany the message, right? What if the deployers of the same
> > >> >> app in the other cluster use 0x57? They won't understand each other?
> > >> >>
> > >> >> I'm not sure that's an avoidable problem. I think it simply means that
> > >> >> in order to share data, you have to also have a shared (agreed upon)
> > >> >> understanding of what the namespaces mean. Which I think makes sense,
> > >> >> because the alternative (sharing *nothing* at all) would mean that
> > >> >> there would be no way to understand each other.
> > >> >>
> > >> >> -James
> > >> >>
> > >> >> > Gwen
> > >> >> >
> > >> >> > On Tue, Nov 8, 2016 at 4:23 PM, radai <
> radai.rosenblatt@gmail.com>
> > >> >> wrote:
> > >> >> >> +1 for sean's document. it covers pretty much all the trade-offs
> > >> >> >> and provides concrete figures to argue about :-)
> > >> >> >> (nit-picking - used the same xkcd twice, also trove has been
> > >> >> >> superseded for purposes of high performance collections: look at
> > >> >> >> https://github.com/leventov/Koloboke)
> > >> >> >>
> > >> >> >> so to sum up the string vs int debate:
> > >> >> >>
> > >> >> >> performance - you can do 140k ops/sec _per thread_ with string
> > >> >> >> headers. you could do x2-3 better with ints. there's no arguing the
> > >> >> >> relative diff between the two, there's only the question of whether
> > >> >> >> or not _the rest of kafka_ operates fast enough to care. if we want
> > >> >> >> to make choices solely based on performance we need ints. if we are
> > >> >> >> willing to settle/compromise for a nicer (to some) API then strings
> > >> >> >> are good enough for the current state of affairs.
> > >> >> >>
> > >> >> >> message size - with batching and compression it comes down to a ~5%
> > >> >> >> difference (internal testing, not in the doc. maybe it would help
> > >> >> >> to add it if this becomes a point of contention?). this means it
> > >> >> >> won't really affect kafka in "throughput mode" (large, compressed
> > >> >> >> batches). in "low latency" mode (meaning less/no batching and
> > >> >> >> compression) the difference can be extreme (it'll easily be an
> > >> >> >> order of magnitude with small payloads like stock ticks and header
> > >> >> >> keys of the form "com.acme.infraTeam.kafka.hiMom.auditPlugin"). we
> > >> >> >> have a few such topics at linkedin where actual payloads are ~2
> > >> >> >> ints and are eclipsed by our in-house audit "header" which is why
> > >> >> >> we liked ints to begin with.
> > >> >> >>
> > >> >> >> "ease of use" - strings would probably still require _some_ degree
> > >> >> >> of partitioning by convention (imagine if everyone used the key
> > >> >> >> "infra"...) but it's very intuitive for java devs to do anyway
> > >> >> >> (reverse-domain is ingrained into java developers at a young age
> > >> >> >> :-) ). also most java devs find Map<String, whatever> more
> > >> >> >> intuitive than Map<Integer, whatever> - probably because of other
> > >> >> >> text-based protocols like http. ints would require a number
> > >> >> >> registry. if you think number registries are hard just look at the
> > >> >> >> wiki page for KIPs (specifically the number for next available
> > >> >> >> KIP) and think again - we are probably talking about the same
> > >> >> >> volume of requests. also this would only be "required" (good
> > >> >> >> citizenship, more like) if you want to publish your plugin for
> > >> >> >> others to use. within your org do whatever you want - just know
> > >> >> >> that if you use [some "reserved" range] and a future kafka update
> > >> >> >> breaks it, it's your problem. RTFM.
> > >> >> >>
> > >> >> >> personally im in favor of ints.
> > >> >> >>
> > >> >> >> having said that (and like nacho) I will settle if int vs string
> > >> remains
> > >> >> >> the only obstacle to this.
> > >> >> >>
> > >> >> >> On Tue, Nov 8, 2016 at 3:53 PM, Nacho Solis
> > >> <nsolis@linkedin.com.invalid
> > >> >> >
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >>> I think it's well known I've been pushing for ints (and I could
> > >> >> >>> switch to 16 bit shorts if pressed).
> > >> >> >>>
> > >> >> >>> - efficient (space)
> > >> >> >>> - efficient (processing)
> > >> >> >>> - easily partitionable
> > >> >> >>>
> > >> >> >>>
> > >> >> >>> However, if the only thing that is keeping us from adopting headers
> > >> >> >>> is the use of strings vs ints as keys, then I would cave in and
> > >> >> >>> accept strings. If we do so, I would like to limit string keys to
> > >> >> >>> 128 bytes in length. This way 1) I could use a 3 letter string if I
> > >> >> >>> wanted (effectively using 4 total bytes), 2) limit overall impact
> > >> >> >>> of possible keys (don't really want people to send a 16K header
> > >> >> >>> string key).
> > >> >> >>>
> > >> >> >>> Nacho
> > >> >> >>>
> > >> >> >>>
> > >> >> >>> On Tue, Nov 8, 2016 at 3:35 PM, Gwen Shapira <
> gwen@confluent.io>
> > >> >> wrote:
> > >> >> >>>
> > >> >> >>>> Forgot to mention: Thank you for quantifying the trade-off -
> it
> > is
> > >> >> >>>> helpful and important regardless of what we end up deciding.
> > >> >> >>>>
> > >> >> >>>> On Tue, Nov 8, 2016 at 3:12 PM, Sean McCauliff
> > >> >> >>>> <smccauliff@linkedin.com.invalid> wrote:
> > >> >> >>>>> On Tue, Nov 8, 2016 at 2:15 PM, Gwen Shapira <
> > gwen@confluent.io>
> > >> >> >>> wrote:
> > >> >> >>>>>
> > >> >> >>>>>> Since Kafka specifically targets high-throughput,
> low-latency
> > >> >> >>>>>> use-cases, I don't think we should trade them off that
> easily.
> > >> >> >>>>>>
> > >> >> >>>>>
> > >> >> >>>>> I find these kinds of design goals not to be really helpful
> > >> >> >>>>> unless they're quantified in some way.  Because it's always
> > >> >> >>>>> possible to argue against something as either being not
> > >> >> >>>>> performant or just an implementation detail.
> > >> >> >>>>>
> > >> >> >>>>> This is a single-threaded benchmark so all the measurements are
> > >> >> >>>>> per thread.
> > >> >> >>>>>
> > >> >> >>>>> For 1M messages/s/thread, if header keys are int and you had
> > >> >> >>>>> even a single header key, value pair, then it's still about 2^-2
> > >> >> >>>>> (0.25) microseconds, which means you only have another 0.75
> > >> >> >>>>> microseconds to do everything else you want to do with a message
> > >> >> >>>>> (1M messages/s means 1 microsecond per message). With string
> > >> >> >>>>> header keys there is still 0.5 microseconds to process a message.
> > >> >> >>>>>
> > >> >> >>>>>
> > >> >> >>>>>
> > >> >> >>>>>> I love strings as much as the next guy (we had them in Flume),
> > >> >> >>>>>> but I was convinced by Magnus/Michael/Radai that strings don't
> > >> >> >>>>>> actually have strong benefits as opposed to ints (you'll need a
> > >> >> >>>>>> string registry anyway - otherwise, how will you know what the
> > >> >> >>>>>> "profile_id" header refers to?) and I want to keep closer to our
> > >> >> >>>>>> original design goals for Kafka.
> > >> >> >>>>>>
> > >> >> >>>>>
> > >> >> >>>>> "confluent.profile_id"
> > >> >> >>>>>
> > >> >> >>>>>
> > >> >> >>>>>>
> > >> >> >>>>>> If someone likes strings in the headers and doesn't do
> > millions
> > >> of
> > >> >> >>>>>> messages a sec, they probably have lots of other systems
> they
> > >> can
> > >> >> use
> > >> >> >>>>>> instead.
> > >> >> >>>>>>
> > >> >> >>>>>
> > >> >> >>>>> None of them will scale like Kafka.  Horizontal scaling is
> > still
> > >> >> good.
> > >> >> >>>>>
> > >> >> >>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>> On Tue, Nov 8, 2016 at 1:22 PM, Sean McCauliff
> > >> >> >>>>>> <smccauliff@linkedin.com.invalid> wrote:
> > >> >> >>>>>>> +1 for String keys.
> > >> >> >>>>>>>
> > >> >> >>>>>>> I've been doing some benchmarking and it seems like the
> > >> >> >>>>>>> speedup for using integer keys is about 2-5x depending on the
> > >> >> >>>>>>> length of the strings and what collections are being used.  The
> > >> >> >>>>>>> overall amount of time spent parsing a set of header key, value
> > >> >> >>>>>>> pairs probably does not matter unless you are getting close to
> > >> >> >>>>>>> 1M messages per consumer.  In which case probably don't use
> > >> >> >>>>>>> headers.  There is also the option to use very short strings;
> > >> >> >>>>>>> some that are even shorter than integers.
> > >> >> >>>>>>>
> > >> >> >>>>>>> Partitioning the string key space will be easier than
> > >> >> >>>>>>> partitioning an integer key space. We won't need a global
> > >> >> >>>>>>> registry.  Kafka internally can reserve some prefix like "_" as
> > >> >> >>>>>>> its namespace.  Everyone else can use their company or project
> > >> >> >>>>>>> name as namespace prefix and life should be good.
> > >> >> >>>>>>>
> > >> >> >>>>>>> Here's the link to some of the benchmarking info:
> > >> >> >>>>>>> https://docs.google.com/document/d/1tfT-6SZdnKOLyWGDH82kS30PnUkmgb7nPLdw6p65pAI/edit?usp=sharing
> > >> >> >>>>>>>
> > >> >> >>>>>>>
> > >> >> >>>>>>>
> > >> >> >>>>>>> --
> > >> >> >>>>>>> Sean McCauliff
> > >> >> >>>>>>> Staff Software Engineer
> > >> >> >>>>>>> Kafka
> > >> >> >>>>>>>
> > >> >> >>>>>>> smccauliff@linkedin.com
> > >> >> >>>>>>> linkedin.com/in/sean-mccauliff-b563192
> > >> >> >>>>>>>
> > >> >> >>>>>>> On Mon, Nov 7, 2016 at 11:51 PM, Michael Pearce <
> > >> >> >>>> Michael.Pearce@ig.com>
> > >> >> >>>>>>> wrote:
> > >> >> >>>>>>>
> > >> >> >>>>>>>> +1 on this slimmer version of our proposal
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> I def think we can reduce the Id space from the proposed
> > >> >> >>>>>>>> int32 (4 bytes) down to int16 (2 bytes); it saves on space and,
> > >> >> >>>>>>>> as these are headers, we wouldn't expect the number of headers
> > >> >> >>>>>>>> being used concurrently to be that high.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> I wonder whether we should keep the value byte array length as
> > >> >> >>>>>>>> int32, though, as this is the standard max array length in
> > >> >> >>>>>>>> Java. Having said that, it is a header, and I guess limiting
> > >> >> >>>>>>>> the size is sensible and would work for all the use cases we
> > >> >> >>>>>>>> have in mind, so I'm happy with limiting this.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Do people generally concur on Magnus's slimmer version? Does
> > >> >> >>>>>>>> anyone see any issues if we moved from int32 to int16?
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Re configurable ids per plugin over a global registry: that
> > >> >> >>>>>>>> also would work for us.  As such, if this has better consensus
> > >> >> >>>>>>>> than the proposed global registry, I'd be happy to change that.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> I was already sold on ints over strings for keys ;)
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Cheers
> > >> >> >>>>>>>> Mike
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> ________________________________________
> > >> >> >>>>>>>> From: Magnus Edenhill <magnus@edenhill.se>
> > >> >> >>>>>>>> Sent: Monday, November 7, 2016 10:10:21 PM
> > >> >> >>>>>>>> To: dev@kafka.apache.org
> > >> >> >>>>>>>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Hi,
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> I'm +1 for adding generic message headers, but I do share the
> > >> >> >>>>>>>> concerns previously aired on this thread and during the KIP
> > >> >> >>>>>>>> meeting.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> So let me propose a slimmer alternative that does not require
> > >> >> >>>>>>>> any sort of global header registry, does not affect broker
> > >> >> >>>>>>>> performance or operations, and adds as little overhead as
> > >> >> >>>>>>>> possible.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Message
> > >> >> >>>>>>>> ------------
> > >> >> >>>>>>>> The protocol Message type is extended with a Headers array
> > >> >> >>>>>>>> consisting of Tags, where a Tag is defined as:
> > >> >> >>>>>>>>   int16 Id
> > >> >> >>>>>>>>   int16 Len              // binary_data length
> > >> >> >>>>>>>>   binary_data[Len]  // opaque binary data
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Ids
> > >> >> >>>>>>>> ---
> > >> >> >>>>>>>> The Id space is not centrally managed, so whenever an
> > >> >> >>>>>>>> application needs to add headers, or use an eco-system plugin
> > >> >> >>>>>>>> that does, its Id allocation will need to be manually
> > >> >> >>>>>>>> configured.
> > >> >> >>>>>>>> This moves the allocation concern from the global space down to
> > >> >> >>>>>>>> organization level and avoids the risk of id conflicts.
> > >> >> >>>>>>>> Example pseudo-config for some app:
> > >> >> >>>>>>>>    sometrackerplugin.tag.sourcev3.id=1000
> > >> >> >>>>>>>>    dbthing.tag.tablename.id=1001
> > >> >> >>>>>>>>    myschemareg.tag.schemaname.id=1002
> > >> >> >>>>>>>>    myschemareg.tag.schemaversion.id=1003
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Each header-writing or header-reading plugin must provide means
> > >> >> >>>>>>>> (typically through configuration) to specify the tag for each
> > >> >> >>>>>>>> header it uses. Defaults should be avoided.
> > >> >> >>>>>>>> A consumer silently ignores tags it does not have a mapping for
> > >> >> >>>>>>>> (since the binary_data can't be parsed without knowing what it
> > >> >> >>>>>>>> is).
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Id range 0..999 is reserved for future use by the broker and
> > >> >> >>>>>>>> must not be used by plugins.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Broker
> > >> >> >>>>>>>> ---------
> > >> >> >>>>>>>> The broker does not process the tags (other than the standard
> > >> >> >>>>>>>> protocol syntax verification), it simply stores and forwards
> > >> >> >>>>>>>> them as opaque data.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Standard message translation (removal of Headers) kicks in for
> > >> >> >>>>>>>> older clients.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Why not string ids?
> > >> >> >>>>>>>> -------------------------
> > >> >> >>>>>>>> String ids might seem like a good idea, but they:
> > >> >> >>>>>>>> * do not really solve uniqueness
> > >> >> >>>>>>>> * consume a lot of space (2 byte string length + string, per
> > >> >> >>>>>>>>   header) to be meaningful
> > >> >> >>>>>>>> * don't really say anything about how to parse the tag's data,
> > >> >> >>>>>>>>   so they are in effect useless on their own.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Regards,
> > >> >> >>>>>>>> Magnus
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> 2016-11-07 18:32 GMT+01:00 Michael Pearce <
> > >> Michael.Pearce@ig.com
> > >> >> >:
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>> Hi Roger,
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Thanks for the support.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> I think the key thing is to have a common key space to make an
> > >> >> >>>>>>>>> ecosystem; there does have to be some level of contract for
> > >> >> >>>>>>>>> people to play nicely.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Having map<String, byte[]> or, as currently proposed in the
> > >> >> >>>>>>>>> kip, a numerical key space of map<int, byte[]> is a level of
> > >> >> >>>>>>>>> the contract that most people would expect.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> I think the example in a previous comment someone else made,
> > >> >> >>>>>>>>> linking to the AWS blog and the implemented api, where
> > >> >> >>>>>>>>> originally they didn't have a header space but now they do,
> > >> >> >>>>>>>>> and where keys are uniform but the value can be string, int,
> > >> >> >>>>>>>>> anything, is a good example.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Having a custom MetadataSerializer is something we had played
> > >> >> >>>>>>>>> with, but we discounted the idea: if you want everyone to work
> > >> >> >>>>>>>>> the same way in the ecosystem, having this also customizable
> > >> >> >>>>>>>>> makes it a bit harder. Think about making the whole message
> > >> >> >>>>>>>>> record custom serializable; this would be fairly tricky
> > >> >> >>>>>>>>> (though not impossible) to make work nicely. Having the
> > >> >> >>>>>>>>> value customizable is, we thought, a reasonable tradeoff here
> > >> >> >>>>>>>>> of flexibility over contract of interaction between different
> > >> >> >>>>>>>>> parties.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Is there a particular case or benefit of having serialization
> > >> >> >>>>>>>>> customizable that you have in mind?
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Saying this, it is obviously something that could be
> > >> >> >>>>>>>>> implemented if there is a need. If we did go this avenue, I
> > >> >> >>>>>>>>> think a default serializer implementation should exist so
> > >> >> >>>>>>>>> that, per the 80:20 rule, people can just have the broker and
> > >> >> >>>>>>>>> clients get default behavior.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Cheers
> > >> >> >>>>>>>>> Mike
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> On 11/6/16, 5:25 PM, "radai" <radai.rosenblatt@gmail.com> wrote:
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>    making header _key_ serialization configurable potentially
> > >> >> >>>>>>>>>    undermines the broad usefulness of the feature (any point
> > >> >> >>>>>>>>>    along the path must be able to read the header keys. the
> > >> >> >>>>>>>>>    values may be whatever and require more intimate knowledge
> > >> >> >>>>>>>>>    of the code that produced specific headers, but keys should
> > >> >> >>>>>>>>>    be universally readable).
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>    it would also make it hard to write really portable plugins
> > >> >> >>>>>>>>>    - say i wrote a large message splitter/combiner - if i rely
> > >> >> >>>>>>>>>    on key "largeMessage" and values of the form "1/20" someone
> > >> >> >>>>>>>>>    who uses (contrived example) Map<Byte[], Double> wouldn't be
> > >> >> >>>>>>>>>    able to re-use my code.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>    not the end of the world within an organization, but
> > >> >> >>>>>>>>>    problematic if you want to enable an ecosystem
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>    On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roger.hoover@gmail.com> wrote:
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>> As others have laid out, I see strong reasons for a common
> > >> >> >>>>>>>>>> message metadata structure for the Kafka ecosystem.  In
> > >> >> >>>>>>>>>> particular, I've seen that even within a single organization,
> > >> >> >>>>>>>>>> infrastructure teams often own the message metadata while
> > >> >> >>>>>>>>>> application teams own the application-level data format.
> > >> >> >>>>>>>>>> Allowing metadata and content to have different structure and
> > >> >> >>>>>>>>>> evolve separately is very helpful for this.  Also, I think
> > >> >> >>>>>>>>>> there's a lot of value to having a common metadata structure
> > >> >> >>>>>>>>>> shared across the Kafka ecosystem so that tools which
> > >> >> >>>>>>>>>> leverage metadata can more easily be shared across
> > >> >> >>>>>>>>>> organizations and integrated together.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> The question is, where does the metadata structure
> belong?
> > >> >> >>>>>> Here's
> > >> >> >>>>>>>>> my take:
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> We change the Kafka wire and on-disk format from a (key,
> > >> >> >>>>>>>>>> value) model to a (key, metadata, value) model where all three
> > >> >> >>>>>>>>>> are byte arrays from the broker's point of view.  The primary
> > >> >> >>>>>>>>>> reason for this is that it provides a backward compatible
> > >> >> >>>>>>>>>> migration path forward.  Producers can start populating
> > >> >> >>>>>>>>>> metadata fields before all consumers understand the metadata
> > >> >> >>>>>>>>>> structure. People who already have custom envelope structures
> > >> >> >>>>>>>>>> can populate their existing structure and the new structure
> > >> >> >>>>>>>>>> for a while as they make the transition.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> We could stop there and let the clients plug in a
> > >> >> >>>>>>>>>> KeySerializer, MetadataSerializer, and ValueSerializer, but I
> > >> >> >>>>>>>>>> think it would also be useful to have a default
> > >> >> >>>>>>>>>> MetadataSerializer that implements a key-value model similar
> > >> >> >>>>>>>>>> to AMQP or HTTP headers.  Or we could go even further and
> > >> >> >>>>>>>>>> prescribe a Map<String, byte[]> or Map<String, String> data
> > >> >> >>>>>>>>>> model for headers in the clients (while still allowing custom
> > >> >> >>>>>>>>>> serialization of the header data model).
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> I think this would address Radai's concerns:
> > >> >> >>>>>>>>>> 1. All client code would not need to be updated to know about
> > >> >> >>>>>>>>>>    the container.
> > >> >> >>>>>>>>>> 2. Middleware friendly clients would have a standard header
> > >> >> >>>>>>>>>>    data model to work with.
> > >> >> >>>>>>>>>> 3. KIP is required both b/c of broker changes and because of
> > >> >> >>>>>>>>>>    client API changes.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> Cheers,
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> Roger
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> On Wed, Nov 2, 2016 at 4:38 PM, radai <
> > >> >> >>>>>> radai.rosenblatt@gmail.com>
> > >> >> >>>>>>>>> wrote:
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>>> my biggest issues with a "standard" wrapper format:
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> 1. _ALL_ client _CODE_ (as opposed to kafka lib version) must
> > >> >> >>>>>>>>>>>    be updated to know about the container, because any old
> > >> >> >>>>>>>>>>>    naive code trying to directly deserialize its own payload
> > >> >> >>>>>>>>>>>    would keel over and die (it needs to know to deserialize a
> > >> >> >>>>>>>>>>>    container, and then dig in there for its payload).
> > >> >> >>>>>>>>>>> 2. in order to write middleware-friendly clients that utilize
> > >> >> >>>>>>>>>>>    such a container one would basically have to write their
> > >> >> >>>>>>>>>>>    own producer/consumer API on top of the open source kafka
> > >> >> >>>>>>>>>>>    one.
> > >> >> >>>>>>>>>>> 3. if you were going to go with a wrapper format you really
> > >> >> >>>>>>>>>>>    don't need to bother with a kip (just open source your own
> > >> >> >>>>>>>>>>>    client stack from #2 above so others could stop
> > >> >> >>>>>>>>>>>    re-inventing it)
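
To illustrate point 1 above, here is a small sketch of why a wrapper format breaks old consumers. The envelope layout (a 4-byte header length, the header bytes, then the payload) and every name in it are hypothetical, chosen only for this example; no real Kafka API is involved.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a "container" record value.
public class WrapperEnvelopeDemo {

    // Wraps a payload as: [4-byte header length][header bytes][payload bytes].
    public static byte[] wrap(byte[] headers, byte[] payload) {
        return ByteBuffer.allocate(Integer.BYTES + headers.length + payload.length)
                .putInt(headers.length)
                .put(headers)
                .put(payload)
                .array();
    }

    // A container-aware client skips the headers and digs out its payload.
    public static byte[] unwrapPayload(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int headerLength = buf.getInt();
        buf.position(buf.position() + headerLength);
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return payload;
    }

    // An old, naive client treats the whole record value as its own payload.
    public static String naiveDeserialize(byte[] record) {
        return new String(record, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = "order-42".getBytes(StandardCharsets.UTF_8);
        byte[] record = wrap("h".getBytes(StandardCharsets.UTF_8), payload);

        // The naive reader sees the length prefix and header bytes mixed into its data.
        System.out.println("naive matches payload: "
                + naiveDeserialize(record).equals("order-42"));
        // The container-aware reader recovers the original payload.
        System.out.println("aware matches payload: "
                + new String(unwrapPayload(record), StandardCharsets.UTF_8).equals("order-42"));
    }
}
```

With a string payload the naive reader only gets garbage; with a structured format such as Avro it would more likely fail outright, which is the "keel over and die" case.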
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <wushujames@gmail.com> wrote:
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>>> How exactly would this work? Or maybe that's out of scope for
> > >> >> >>>>>>>>>>>> this email.
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> The information contained in this email is strictly confidential
> > >> >> >>>>>>>>> and for the use of the addressee only, unless otherwise
> > >> >> >>>>>>>>> indicated. If you are not the intended recipient, please do not
> > >> >> >>>>>>>>> read, copy, use or disclose to others this message or any
> > >> >> >>>>>>>>> attachment. Please also notify the sender by replying to this
> > >> >> >>>>>>>>> email or by telephone (+44(020 7896 0011) and then delete the
> > >> >> >>>>>>>>> email and any copies of it. Opinions, conclusion (etc) that do
> > >> >> >>>>>>>>> not relate to the official business of this company shall be
> > >> >> >>>>>>>>> understood as neither given nor endorsed by it. IG is a trading
> > >> >> >>>>>>>>> name of IG Markets Limited (a company registered in England and
> > >> >> >>>>>>>>> Wales, company number 04008957) and IG Index Limited (a company
> > >> >> >>>>>>>>> registered in England and Wales, company number 01190902).
> > >> >> >>>>>>>>> Registered address at Cannon Bridge House, 25 Dowgate Hill,
> > >> >> >>>>>>>>> London EC4R 2YA. Both IG Markets Limited (register number
> > >> >> >>>>>>>>> 195355) and IG Index Limited (register number 114059) are
> > >> >> >>>>>>>>> authorised and regulated by the Financial Conduct Authority.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>> --
> > >> >> >>>>>> Gwen Shapira
> > >> >> >>>>>> Product Manager | Confluent
> > >> >> >>>>>> 650.450.2760 | @gwenshap
> > >> >> >>>>>> Follow us: Twitter | blog
> > >> >> >>>>>>
> > >> >> >>> --
> > >> >> >>> Nacho (Ignacio) Solis
> > >> >> >>> Kafka
> > >> >> >>> nsolis@linkedin.com
