kafka-dev mailing list archives

From Daniel Schierbeck <da...@zendesk.com.INVALID>
Subject Re: [DISCUSS] KIP-82 - Add Record Headers
Date Fri, 02 Dec 2016 08:30:54 GMT
I don't have a lot of feedback on this, but at Zendesk we could definitely
use a standardized header system. Using ints as keys sounds tedious, but if
that's a necessary tradeoff I'd be okay with it.

On Fri, Dec 2, 2016 at 5:44 AM Todd Palino <tpalino@gmail.com> wrote:

> Come on, I’ve done at least 2 talks on this one :)
>
> Producing counts to a topic is part of it, but that’s only part. So you
> count you have 100 messages in topic A. When you mirror topic A to another
> cluster, you have 99 messages. Where was your problem? Or worse, you have
> 100 messages, but one producer duplicated messages and another one lost
> messages. You need details about where the message came from in order to
> pinpoint problems when they happen. Source producer info, where it was
> produced into your infrastructure, and when it was produced. This requires
> you to add the information to the message.
>
> And yes, you still need to maintain your clients. So maybe my original
> example was not the best. My thoughts on not wanting to be responsible for
> message formats stand, because that’s very much separate from the client.
> As you know, we have our own internal client library that can insert the
> right headers, and right now inserts the right audit information into the
> message fields. If they exist, and assuming the message is Avro encoded.
> What if someone wants to use JSON instead for a good reason? What if user X
> wants to encrypt messages, but user Y does not? Maintaining the client
> library is still much easier than maintaining the message formats.
>
>
> -Todd
>
>
>
> On Thu, Dec 1, 2016 at 6:21 PM, Gwen Shapira <gwen@confluent.io> wrote:
>
> > Based on your last sentence, consider me convinced :)
> >
> > I get why headers are critical for Mirroring (you need tags to prevent
> > loops and sometimes to route messages to the correct destination).
> > But why do you need headers to audit? We are auditing by producing
> > counts to a side topic (and I was under the impression you do the
> > same), so we never need to modify the message.
> >
> > Another thing - after we added headers, wouldn't you be in the
> > business of making sure everyone uses them properly? Making sure
> > everyone includes the right headers you need, not using the header
> > names you intend to use, etc. I don't think the "policing" business
> > will ever go away.
> >
> > On Thu, Dec 1, 2016 at 5:25 PM, Todd Palino <tpalino@gmail.com> wrote:
> > > Got it. As an ops guy, I'm not very happy with the workaround. Avro
> means
> > > that I have to be concerned with the format of the messages in order to
> > run
> > > the infrastructure (audit, mirroring, etc.). That means that I have to
> > > handle the schemas, and I have to enforce rules about good formats.
> This
> > is
> > > not something I want to be in the business of, because I should be able
> > to
> > > run a service infrastructure without needing to be in the weeds of
> > dealing
> > > with customer data formats.
> > >
> > > Trust me, a sizable portion of my support time is spent dealing with
> > schema
> > > issues. I really would like to get away from that. Maybe I'd have more
> > time
> > > for other hobbies. Like writing. ;)
> > >
> > > -Todd
> > >
> > > On Thu, Dec 1, 2016 at 4:04 PM Gwen Shapira <gwen@confluent.io> wrote:
> > >
> > >> I'm pretty satisfied with the current workarounds (Avro container
> > >> format), so I'm not too excited about the extra work required to do
> > >> headers in Kafka. I absolutely don't mind it if you do it...
> > >> I think the Apache convention for "good idea, but not willing to put
> > >> any work toward it" is +0.5? anyway, that's what I was trying to
> > >> convey :)
> > >>
> > >> On Thu, Dec 1, 2016 at 3:05 PM, Todd Palino <tpalino@gmail.com>
> wrote:
> > >> > Well I guess my question for you, then, is what is holding you back
> > from
> > >> > full support for headers? What’s the bit that you’re missing that
> has
> > you
> > >> > under a full +1?
> > >> >
> > >> > -Todd
> > >> >
> > >> >
> > >> > On Thu, Dec 1, 2016 at 1:59 PM, Gwen Shapira <gwen@confluent.io>
> > wrote:
> > >> >
> > >> >> I know why people who support headers support them, and I've seen
> > what
> > >> >> the discussion is like.
> > >> >>
> > >> >> This is why I'm asking people who are against headers (especially
> > >> >> committers) what will make them change their mind - so we can get
> > this
> > >> >> part over one way or another.
> > >> >>
> > >> >> If I sound frustrated it is not at Radai, Jun or you (Todd)... I am
> > >> >> just looking for something concrete we can do to move the
> discussion
> > >> >> along to the yummy design details (which is the argument I really
> am
> > >> >> looking forward to).
> > >> >>
> > >> >> On Thu, Dec 1, 2016 at 1:53 PM, Todd Palino <tpalino@gmail.com>
> > wrote:
> > >> >> > So, Gwen, to your question (even though I’m not a committer)...
> > >> >> >
> > >> >> > I have always been a strong supporter of introducing the concept
> > of an
> > >> >> > envelope to messages, which headers accomplishes. The message key
> > is
> > >> >> > already an example of a piece of envelope information. By
> > providing a
> > >> >> means
> > >> >> > to do this within Kafka itself, and not relying on use-case
> > specific
> > >> >> > implementations, you make it much easier for components to
> > >> interoperate.
> > >> >> It
> > >> >> > simplifies development of all these things (message routing,
> > auditing,
> > >> >> > encryption, etc.) because each one does not have to reinvent the
> > >> wheel.
> > >> >> >
> > >> >> > It also makes it much easier from a client point of view if the
> > >> headers
> > >> >> are
> > >> >> > defined as part of the protocol and/or message format in general
> > >> because
> > >> >> > you can easily produce and consume messages without having to
> take
> > >> into
> > >> >> > account specific cases. For example, I want to route messages,
> but
> > >> >> client A
> > >> >> > doesn’t support the way audit implemented headers, and client B
> > >> doesn’t
> > >> >> > support the way encryption or routing implemented headers, so now
> > my
> > >> >> > application has to create some really fragile (my autocorrect
> just
> > >> tried
> > >> >> to
> > >> >> > make that “tragic”, which is probably appropriate too) code to
> > strip
> > >> >> > everything off, rather than just consuming the messages, picking
> > out
> > >> the
> > >> >> 1
> > >> >> > or 2 headers it’s interested in, and performing its function.
> > >> >> >
> > >> >> > Honestly, this discussion has been going on for a long time, and
> > it’s
> > >> >> > always “Oh, you came up with 2 use cases, and yeah, those use
> cases
> > >> are
> > >> >> > real things that someone would want to do. Here’s an alternate
> way
> > to
> > >> >> > implement them so let’s not do headers.” If we have a few use
> cases
> > >> that
> > >> >> we
> > >> >> > actually came up with, you can be sure that over the next year
> > >> there’s a
> > >> >> > dozen others that we didn’t think of that someone would like to
> > do. I
> > >> >> > really think it’s time to stop rehashing this discussion and
> > instead
> > >> >> focus
> > >> >> > on a workable standard that we can adopt.
> > >> >> >
> > >> >> > -Todd
> > >> >> >
> > >> >> >
> > >> >> > On Thu, Dec 1, 2016 at 1:39 PM, Todd Palino <tpalino@gmail.com>
> > >> wrote:
> > >> >> >
> > >> >> >> C. per message encryption
> > >> >> >>> One drawback of this approach is that this significantly reduces
> > the
> > >> >> >>> effectiveness of compression, which happens on a set of
> > serialized
> > >> >> >>> messages. An alternative is to enable SSL for wire encryption
> and
> > >> rely
> > >> >> on
> > >> >> >>> the storage system (e.g. LUKS) for at rest encryption.
> > >> >> >>
> > >> >> >>
> > >> >> >> Jun, this is not sufficient. While this does cover the case of
> > >> removing
> > >> >> a
> > >> >> >> drive from the system, it will not satisfy most compliance
> > >> requirements
> > >> >> for
> > >> >> >> encryption of data as whoever has access to the broker itself
> > still
> > >> has
> > >> >> >> access to the unencrypted data. For end-to-end encryption you
> > need to
> > >> >> >> encrypt at the producer, before it enters the system, and
> decrypt
> > at
> > >> the
> > >> >> >> consumer, after it exits the system.
> > >> >> >>
> > >> >> >> -Todd
> > >> >> >>
> > >> >> >>
> > >> >> >> On Thu, Dec 1, 2016 at 1:03 PM, radai <
> radai.rosenblatt@gmail.com
> > >
> > >> >> wrote:
> > >> >> >>
> > >> >> >>> another big plus of headers in the protocol is that it would
> > enable
> > >> >> rapid
> > >> >> >>> iteration on ideas outside of core kafka and would reduce the
> > >> number of
> > >> >> >>> future wire format changes required.
> > >> >> >>>
> > >> >> >>> a lot of what is currently a KIP represents use cases that are
> > not
> > >> 100%
> > >> >> >>> relevant to all users, and some of them require rather invasive
> > wire
> > >> >> >>> protocol changes. i think a good recent example of this is
> > kip-98.
> > >> >> >>> tx-utilizing traffic is expected to be a very small fraction of
> > >> total
> > >> >> >>> traffic and yet the changes are invasive.
> > >> >> >>>
> > >> >> >>> every such wire format change translates into painful and slow
> > >> >> adoption of
> > >> >> >>> new versions.
> > >> >> >>>
> > >> >> >>> i think a lot of functionality currently in KIPs could be "spun
> > out"
> > >> >> and
> > >> >> >>> implemented as opt-in plugins transmitting data over headers.
> > this
> > >> >> would
> > >> >> >>> keep the core wire format stable(r), core codebase smaller, and
> > >> avoid
> > >> >> the
> > >> >> >>> "burden of proof" thats sometimes required to prove a certain
> > >> feature
> > >> >> is
> > >> >> >>> useful enough for a wide-enough audience to warrant a wire
> format
> > >> >> change
> > >> >> >>> and code complexity additions.
> > >> >> >>>
> > >> >> >>> (to be clear - kip-98 goes beyond "mere" wire format changes
> and
> > im
> > >> not
> > >> >> >>> saying it could have been completely done with headers, but
> > >> >> exactly-once
> > >> >> >>> delivery certainly could)
> > >> >> >>>
> > >> >> >>> On Thu, Dec 1, 2016 at 11:20 AM, Gwen Shapira <
> gwen@confluent.io
> > >
> > >> >> wrote:
> > >> >> >>>
> > >> >> >>> > On Thu, Dec 1, 2016 at 10:24 AM, radai <
> > >> radai.rosenblatt@gmail.com>
> > >> >> >>> wrote:
> > >> >> >>> > > "For use cases within an organization, one could always use
> > >> other
> > >> >> >>> > > approaches such as company-wise containers"
> > >> >> >>> > > this is what linkedin has traditionally done but there are
> > now
> > >> >> cases
> > >> >> >>> > (read
> > >> >> >>> > > - topics) where this is not acceptable. this makes headers
> > >> useful
> > >> >> even
> > >> >> >>> > > within single orgs for cases where one-container-fits-all
> > cannot
> > >> >> >>> apply.
> > >> >> >>> > >
> > >> >> >>> > > as for the particular use cases listed, i dont want this to
> > >> devolve
> > >> >> >>> to a
> > >> >> >>> > > discussion of particular use cases - i think its enough
> that
> > >> some
> > >> >> of
> > >> >> >>> them
> > >> >> >>> >
> > >> >> >>> > I think a main point of contention is that: We identified few
> > >> >> >>> > use-cases where headers are useful, do we want Kafka to be a
> > >> system
> > >> >> >>> > that supports those use-cases?
> > >> >> >>> >
> > >> >> >>> > For example, Jun said:
> > >> >> >>> > "Not sure how widely useful record-level lineage is though
> > since
> > >> the
> > >> >> >>> > overhead could
> > >> >> >>> > be significant."
> > >> >> >>> >
> > >> >> >>> > We know NiFi supports record level lineage. I don't think it
> > was
> > >> >> >>> > developed for lols, I think it is safe to assume that the NSA
> > >> needed
> > >> >> >>> > that functionality. We also know that certain financial
> > institutes
> > >> >> >>> > need to track tampering with records at a record level and
> > there
> > >> are
> > >> >> >>> > federal regulations that absolutely require this. They also
> > need
> > >> to
> > >> >> >>> > prove that routing apps that "touches" the messages and
> either
> > >> reads
> > >> >> >>> > or updates headers couldn't have possibly modified the
> payload
> > >> >> itself.
> > >> >> >>> > They use record level encryption to do that - apps can read
> and
> > >> >> >>> > (sometimes) modify headers but can't touch the payload.
> > >> >> >>> >
> > >> >> >>> > We can totally say "those are corner cases and not worth
> adding
> > >> >> >>> > headers to Kafka for", they should use a different pubsub
> > message
> > >> for
> > >> >> >>> > that (Nifi or one of the other 1000 that cater specifically
> to
> > the
> > >> >> >>> > financial industry).
> > >> >> >>> >
> > >> >> >>> > But this gets us into a catch 22:
> > >> >> >>> > If we discuss a specific use-case, someone can always say it
> > isn't
> > >> >> >>> > interesting enough for Kafka. If we discuss more general
> > trends,
> > >> >> >>> > others can say "well, we are not sure any of them really
> needs
> > >> >> headers
> > >> >> >>> > specifically. This is just hand waving and not interesting.".
> > >> >> >>> >
> > >> >> >>> > I think discussing use-cases in specifics is super important
> to
> > >> >> decide
> > >> >> >>> > implementation details for headers (my use-cases lean toward
> > >> >> numerical
> > >> >> >>> > keys with namespaces and object values, others differ), but I
> > >> think
> > >> >> we
> > >> >> >>> > need to answer the general "Are we going to have headers"
> > question
> > >> >> >>> > first.
> > >> >> >>> >
> > >> >> >>> > I'd love to hear from the other committers in the discussion:
> > >> >> >>> > What would it take to convince you that headers in Kafka are
> a
> > >> good
> > >> >> >>> > idea in general, so we can move ahead and try to agree on the
> > >> >> details?
> > >> >> >>> >
> > >> >> >>> > I feel like we keep moving the goal posts and this is truly
> > >> >> exhausting.
> > >> >> >>> >
> > >> >> >>> > For the record, I mildly support adding headers to Kafka
> > (+0.5?).
> > >> >> >>> > The community can continue to find workarounds to the issue
> and
> > >> there
> > >> >> >>> > are some benefits to keeping the message format and clients
> > >> simpler.
> > >> >> >>> > But I see the usefulness of headers to many use-cases and if
> we
> > >> can
> > >> >> >>> > find a good and generally useful way to add it to Kafka, it
> > will
> > >> make
> > >> >> >>> > Kafka easier to use for many - worthy goal in my eyes.
> > >> >> >>> >
> > >> >> >>> > > are interesting/feasible, but:
> > >> >> >>> > > A+B. i think there are use cases for polyglot topics.
> > >> especially if
> > >> >> >>> kafka
> > >> >> >>> > > is being used to "trunk" something else.
> > >> >> >>> > > D. multiple topics would make it harder to write portable
> > >> consumer
> > >> >> >>> code.
> > >> >> >>> > > partition remapping would mess with locality of consumption
> > >> >> >>> guarantees.
> > >> >> >>> > > E+F. a use case I see for lineage/metadata is
> > >> billing/chargeback.
> > >> >> for
> > >> >> >>> > that
> > >> >> >>> > > use case it is not enough to simply record the point of
> > origin,
> > >> but
> > >> >> >>> every
> > >> >> >>> > > replication stop (think mirror maker) must also add a
> record
> > to
> > >> >> form a
> > >> >> >>> > > "transit log".
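The "transit log" idea above could look roughly like the following sketch. Nothing here comes from the KIP: the header key `TRANSIT_LOG_KEY`, the JSON encoding, and the `append_hop` helper are all hypothetical, chosen only to show each replication stop appending its own entry to a header while leaving the payload untouched.

```python
import json

# Hypothetical integer header key reserved for the transit log (an assumption).
TRANSIT_LOG_KEY = 2


def append_hop(headers, cluster_id, timestamp_ms):
    """Return a copy of `headers` with one more hop appended to the transit log.

    `headers` is modelled as a dict of int key -> bytes value, matching the
    byte[] header values discussed in this thread.
    """
    hops = json.loads(headers.get(TRANSIT_LOG_KEY, b"[]"))
    hops.append({"cluster": cluster_id, "ts": timestamp_ms})
    updated = dict(headers)
    updated[TRANSIT_LOG_KEY] = json.dumps(hops).encode("utf-8")
    return updated


# The producer's cluster records the origin, then a mirror adds its own stop.
record_headers = append_hop({}, "origin", 1000)
record_headers = append_hop(record_headers, "aggregate", 2000)
```

A billing or chargeback consumer would then read only this one header to see every cluster the record passed through.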
> > >> >> >>> > >
> > >> >> >>> > > as for stream processing on top of kafka - i know samza
> has a
> > >> >> metadata
> > >> >> >>> > map
> > >> >> >>> > > which they carry around in addition to user values. headers
> > are
> > >> the
> > >> >> >>> > perfect
> > >> >> >>> > > fit for these things.
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> > > On Wed, Nov 30, 2016 at 6:50 PM, Jun Rao <jun@confluent.io
> >
> > >> wrote:
> > >> >> >>> > >
> > >> >> >>> > >> Hi, Michael,
> > >> >> >>> > >>
> > >> >> >>> > >> In order to answer the first two questions, it would be
> > helpful
> > >> >> if we
> > >> >> >>> > could
> > >> >> >>> > >> identify 1 or 2 strong use cases for headers in the space
> > for
> > >> >> >>> > third-party
> > >> >> >>> > >> vendors. For use cases within an organization, one could
> > always
> > >> >> use
> > >> >> >>> > other
> > >> >> >>> > >> approaches such as company-wise containers to get around
> w/o
> > >> >> >>> headers. I
> > >> >> >>> > >> went through the use cases in the KIP and in Radai's wiki
> > >> >> >>> > >> (https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers).
> > >> >> >>> > >> The following are the ones that I understand and
> could
> > be
> > >> in
> > >> >> the
> > >> >> >>> > >> third-party use case category.
> > >> >> >>> > >>
> > >> >> >>> > >> A. content-type
> > >> >> >>> > >> It seems that in general, content-type should be set at
> the
> > >> topic
> > >> >> >>> level.
> > >> >> >>> > >> Not sure if mixing messages with different content types
> > >> should be
> > >> >> >>> > >> encouraged.
> > >> >> >>> > >>
> > >> >> >>> > >> B. schema id
> > >> >> >>> > >> Since the value is mostly useless without schema id, it
> > seems
> > >> that
> > >> >> >>> > storing
> > >> >> >>> > >> the schema id together with serialized bytes in the value
> is
> > >> >> better?
> > >> >> >>> > >>
> > >> >> >>> > >> C. per message encryption
> > >> >> >>> > >> One drawback of this approach is that this significantly
> > reduces
> > >> >> the
> > >> >> >>> > >> effectiveness of compression, which happens on a set of
> > >> serialized
> > >> >> >>> > >> messages. An alternative is to enable SSL for wire
> > encryption
> > >> and
> > >> >> >>> rely
> > >> >> >>> > on
> > >> >> >>> > >> the storage system (e.g. LUKS) for at rest encryption.
> > >> >> >>> > >>
> > >> >> >>> > >> D. cluster ID for mirroring across Kafka clusters
> > >> >> >>> > >> This is actually interesting. Today, to avoid introducing
> > >> cycles
> > >> >> when
> > >> >> >>> > doing
> > >> >> >>> > >> mirroring across data centers, one would either have to
> set
> > up
> > >> two
> > >> >> >>> Kafka
> > >> >> >>> > >> clusters (a local and an aggregate) per data center or
> > rename
> > >> >> topics.
> > >> >> >>> > >> Neither is ideal. With headers, the producer could tag
> each
> > >> >> message
> > >> >> >>> with
> > >> >> >>> > >> the producing cluster ID in the header. MirrorMaker could
> > then
> > >> >> avoid
> > >> >> >>> > >> mirroring messages to a cluster if they are tagged with
> the
> > >> same
> > >> >> >>> cluster
> > >> >> >>> > >> id.
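The cluster-ID tagging described above can be sketched as follows. This is an illustration under stated assumptions, not MirrorMaker's actual behaviour: the header key `CLUSTER_ID_KEY` and both helper functions are hypothetical, and headers are modelled as a plain dict of int key to bytes value.

```python
# Hypothetical integer header key carrying the producing cluster's ID.
CLUSTER_ID_KEY = 1


def tag_with_origin(headers, cluster_id):
    """Return a copy of `headers` tagged with the producing cluster ID.

    An existing tag is kept, so a mirrored record still names its true origin.
    """
    tagged = dict(headers)
    tagged.setdefault(CLUSTER_ID_KEY, cluster_id)
    return tagged


def should_mirror(headers, destination_cluster_id):
    """Skip records that would be mirrored back to the cluster they came from."""
    return headers.get(CLUSTER_ID_KEY) != destination_cluster_id


record_headers = tag_with_origin({}, b"dc-east")
```

With this tag in place a mirroring process only needs an equality check on one header to break a cycle, without inspecting or deserializing the payload.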
> > >> >> >>> > >>
> > >> >> >>> > >> However, an alternative approach is to introduce sth like
> > >> >> >>> hierarchical
> > >> >> >>> > >> topic and store messages from different clusters in
> > different
> > >> >> >>> partitions
> > >> >> >>> > >> under the same topic. This approach avoids filtering out
> > >> unneeded
> > >> >> >>> data
> > >> >> >>> > and
> > >> >> >>> > >> makes offset preserving easier to support. It may make
> > >> compaction
> > >> >> >>> > trickier
> > >> >> >>> > >> though since the same key may show up in different
> > partitions.
> > >> >> >>> > >>
> > >> >> >>> > >> E. record-level lineage
> > >> >> >>> > >> For example, a source connector could store in the message
> > the
> > >> >> >>> metadata
> > >> >> >>> > >> (e.g. UUID) of the source record. Similarly, if a stream
> job
> > >> >> >>> transforms
> > >> >> >>> > >> messages from topic A to topic B, the library could
> include
> > the
> > >> >> >>> source
> > >> >> >>> > >> message offset in each of the transformed message in the
> > >> header.
> > >> >> Not
> > >> >> >>> > sure
> > >> >> >>> > >> how widely useful record-level lineage is though since the
> > >> >> overhead
> > >> >> >>> > could
> > >> >> >>> > >> be significant.
> > >> >> >>> > >>
> > >> >> >>> > >> F. auditing metadata
> > >> >> >>> > >> We could put things like clientId/host/user in the header
> in
> > >> each
> > >> >> >>> > message
> > >> >> >>> > >> for auditing. These metadata are really at the producer
> > level
> > >> >> though.
> > >> >> >>> > So, a
> > >> >> >>> > >> more efficient way is to only include a "producerId" per
> > >> message
> > >> >> and
> > >> >> >>> > send
> > >> >> >>> > >> the producerId -> metadata mapping independently. KIP-98
> is
> > >> >> actually
> > >> >> >>> > >> proposing including such a producerId natively in the
> > message.
> > >> >> >>> > >>
> > >> >> >>> > >> So, overall, I am not sure that I am fully convinced of the
> > strong
> > >> >> >>> > third-party
> > >> >> >>> > >> use cases of headers yet. Perhaps we could discuss a bit
> > more
> > >> to
> > >> >> make
> > >> >> >>> > one
> > >> >> >>> > >> or two really convincing use cases.
> > >> >> >>> > >>
> > >> >> >>> > >> Another orthogonal question is whether header should be
> > >> exposed
> > >> >> in
> > >> >> >>> > stream
> > >> >> >>> > >> processing systems such as Kafka Streams, Samza, and Spark
> > >> streaming.
> > >> >> >>> > >> Currently, those systems just deal with key/value pairs.
> > >> Should we
> > >> >> >>> > expose a
> > >> >> >>> > >> third thing header there too or somehow map header to key
> or
> > >> >> value?
> > >> >> >>> > >>
> > >> >> >>> > >> Thanks,
> > >> >> >>> > >>
> > >> >> >>> > >> Jun
> > >> >> >>> > >>
> > >> >> >>> > >>
> > >> >> >>> > >> On Tue, Nov 29, 2016 at 3:35 AM, Michael Pearce <
> > >> >> >>> Michael.Pearce@ig.com>
> > >> >> >>> > >> wrote:
> > >> >> >>> > >>
> > >> >> >>> > >> > I assume that, after a period of a week, there are
> no
> > >> >> concerns
> > >> >> >>> now
> > >> >> >>> > >> > with points 1, and 2 and now we have agreement that
> > headers
> > >> are
> > >> >> >>> useful
> > >> >> >>> > >> and
> > >> >> >>> > >> > needed in Kafka. As such if put to a KIP vote, this
> > wouldn’t
> > >> be
> > >> >> a
> > >> >> >>> > reason
> > >> >> >>> > >> to
> > >> >> >>> > >> > reject.
> > >> >> >>> > >> >
> > >> >> >>> > >> > @
> > >> >> >>> > >> > Ignacio on point 4).
> > >> >> >>> > >> > I think for purpose of getting this KIP moving past
> this,
> > we
> > >> can
> > >> >> >>> state
> > >> >> >>> > >> the
> > >> >> >>> > >> > key will be a 4-byte space that can be naturally
> > >> >> interpreted
> > >> >> >>> as
> > >> >> >>> > an
> > >> >> >>> > >> > Int32 (if namespacing is later wanted you can easily
> split
> > >> this
> > >> >> >>> into
> > >> >> >>> > two
> > >> >> >>> > >> > int16 spaces), from the wire protocol implementation
> this
> > >> makes
> > >> >> no
> > >> >> >>> > >> > difference I don’t believe. Is this reasonable to all?
> > >> >> >>> > >> >
> > >> >> >>> > >> > On 5) as per point 4, therefore happy we keep with 32
> bits.
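As a rough illustration of the point above (not taken from the KIP): a single 32-bit key and a pair of 16-bit values occupy the same four bytes on the wire, so the choice only affects how a client interprets them. The function names and the unsigned, big-endian layout are assumptions made for this sketch.

```python
import struct
from typing import Tuple


def pack_key(namespace: int, local_id: int) -> int:
    """Combine two unsigned 16-bit values into one 32-bit header key."""
    if not (0 <= namespace <= 0xFFFF and 0 <= local_id <= 0xFFFF):
        raise ValueError("namespace and local_id must each fit in 16 bits")
    return (namespace << 16) | local_id


def unpack_key(key: int) -> Tuple[int, int]:
    """Split a 32-bit header key back into (namespace, local_id)."""
    return (key >> 16) & 0xFFFF, key & 0xFFFF


def encode_key(key: int) -> bytes:
    """Serialize the key as 4 big-endian bytes (assumed wire layout)."""
    return struct.pack(">I", key)


key = pack_key(21, 1)
# The same four bytes result whether the key is written as one int32
# or as two consecutive int16 values.
same_bytes = encode_key(key) == struct.pack(">HH", 21, 1)
```

Because the bytes are identical either way, namespacing can be layered on later by convention without changing the message format.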
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> > On 18/11/2016, 20:34, "ignacio.solis@gmail.com on
> behalf
> > of
> > >> >> >>> Ignacio
> > >> >> >>> > >> > Solis" <ignacio.solis@gmail.com on behalf of
> > isolis@igso.net
> > >> >
> > >> >> >>> wrote:
> > >> >> >>> > >> >
> > >> >> >>> > >> > Summary:
> > >> >> >>> > >> >
> > >> >> >>> > >> > 3) Yes - Header value as byte[]
> > >> >> >>> > >> >
> > >> >> >>> > >> > 4a) Int,Int - No
> > >> >> >>> > >> > 4b) Int - Yes
> > >> >> >>> > >> > 4c) String - Reluctant maybe
> > >> >> >>> > >> >
> > >> >> >>> > >> > 5) I believe the header system should take a single
> > >> int. I
> > >> >> >>> think
> > >> >> >>> > >> > 32bits is
> > >> >> >>> > >> > a good size, if you want to interpret this as two 16bit
> > >> >> numbers
> > >> >> >>> in
> > >> >> >>> > the
> > >> >> >>> > >> > layer
> > >> >> >>> > >> > above go right ahead. If somebody wants to argue for
> > 16
> > >> >> bits
> > >> >> >>> or
> > >> >> >>> > 64
> > >> >> >>> > >> > bits of
> > >> >> >>> > >> > header key space I would listen.
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> > Discussion:
> > >> >> >>> > >> > Dividing the key space into sub_key_1 and sub_key_2
> > >> makes no
> > >> >> >>> > sense to
> > >> >> >>> > >> > me at
> > >> >> >>> > >> > this layer. Are we going to start providing APIs to
> > get
> > >> all
> > >> >> >>> the
> > >> >> >>> > >> > sub_key_1s? or all the sub_key_2s? If there is no
> > >> >> >>> distinguishing
> > >> >> >>> > >> > functions
> > >> >> >>> > >> > that are applied to each one then they should be a
> > single
> > >> >> >>> value.
> > >> >> >>> > At
> > >> >> >>> > >> > this
> > >> >> >>> > >> > layer all we're doing is equality.
> > >> >> >>> > >> > If the above layer wants to interpret this as 2, 3 or
> > >> more
> > >> >> >>> values
> > >> >> >>> > >> > that's a
> > >> >> >>> > >> > different question. I personally think it's all one
> > >> >> keyspace
> > >> >> >>> > that is
> > >> >> >>> > >> > getting assigned using some structure, but if you
> > want to
> > >> >> >>> > sub-assign
> > >> >> >>> > >> > parts
> > >> >> >>> > >> > of it then that's fine.
> > >> >> >>> > >> >
> > >> >> >>> > >> > The same discussion applies to strings. If somebody
> > >> argued
> > >> >> for
> > >> >> >>> > >> > strings,
> > >> >> >>> > >> > would we be arguing to divide the strings with dots
> > ('.')
> > >> >> as a
> > >> >> >>> > >> > requirement?
> > >> >> >>> > >> > Would we want them to give us the different name
> > segments
> > >> >> >>> > separately?
> > >> >> >>> > >> > Would we be performing any actions on this key other
> > than
> > >> >> >>> > matching?
> > >> >> >>> > >> >
> > >> >> >>> > >> > Nacho
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> > On Fri, Nov 18, 2016 at 9:30 AM, Michael Pearce <
> > >> >> >>> > >> Michael.Pearce@ig.com
> > >> >> >>> > >> > >
> > >> >> >>> > >> > wrote:
> > >> >> >>> > >> >
> > >> >> >>> > >> > > #jay #jun any concerns on 1 and 2 still?
> > >> >> >>> > >> > >
> > >> >> >>> > >> > > @all
> > >> >> >>> > >> > > To get this moving along a bit more I'd also like to
> > >> ask
> > >> >> to
> > >> >> >>> get
> > >> >> >>> > >> > clarity on
> > >> >> >>> > >> > > the below last points:
> > >> >> >>> > >> > >
> > >> >> >>> > >> > > 3) I believe we're all roughly happy with the header
> > >> value
> > >> >> >>> > being a
> > >> >> >>> > >> > byte[]?
> > >> >> >>> > >> > >
> > >> >> >>> > >> > > 4) I believe consensus has been for a namespace
> > based
> > >> int
> > >> >> >>> > approach
> > >> >> >>> > >> > > {int,int} for the key. Any objections if this is
> > what
> > >> we
> > >> >> go
> > >> >> >>> > with?
> > >> >> >>> > >> > >
> > >> >> >>> > >> > > 5) as we have if assumption in (4) is correct,
> > >> {int,int}
> > >> >> >>> keys.
> > >> >> >>> > >> > > Should both int's be int16 or int32?
> > >> >> >>> > >> > > I'm for them being int16(2 bytes) as combined is
> > space
> > >> of
> > >> >> >>> > 4bytes as
> > >> >> >>> > >> > per
> > >> >> >>> > >> > > original and gives plenty of combinations for the
> > >> >> >>> foreseeable,
> > >> >> >>> > and
> > >> >> >>> > >> > keeps
> > >> >> >>> > >> > > the overhead small.
> > >> >> >>> > >> > >
> > >> >> >>> > >> > > Do we see any benefit in another kip call to discuss
> > >> >> these at
> > >> >> >>> > all?
> > >> >> >>> > >> > >
> > >> >> >>> > >> > > Cheers
> > >> >> >>> > >> > > Mike
> > >> >> >>> > >> > > ________________________________________
> > >> >> >>> > >> > > From: K Burstev <k.burstev@yandex.com>
> > >> >> >>> > >> > > Sent: Friday, November 18, 2016 7:07:07 AM
> > >> >> >>> > >> > > To: dev@kafka.apache.org
> > >> >> >>> > >> > > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> > >> >> >>> > >> > >
> > >> >> >>> > >> > > For what it is worth also i agree. As a user:
> > >> >> >>> > >> > >
> > >> >> >>> > >> > > 1) Yes - Headers are worthwhile
> > >> >> >>> > >> > > 2) Yes - Headers should be a top level option
> > >> >> >>> > >> > >
> > >> >> >>> > >> > > 14.11.2016, 21:15, "Ignacio Solis" <isolis@igso.net
> > >:
> > >> >> >>> > >> > > > 1) Yes - Headers are worthwhile
> > >> >> >>> > >> > > > 2) Yes - Headers should be a top level option
> > >> >> >>> > >> > > >
> > >> >> >>> > >> > > > On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <
> > >> >> >>> > >> > Michael.Pearce@ig.com>
> > >> >> >>> > >> > > > wrote:
> > >> >> >>> > >> > > >
> > >> >> >>> > >> > > >> Hi Roger,
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> The kip details/examples the original proposal
> > for
> > >> key
> > >> >> >>> > spacing
> > >> >> >>> > >> ,
> > >> >> >>> > >> > not
> > >> >> >>> > >> > > the
> > >> >> >>> > >> > > >> new mentioned as per discussion namespace idea.
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> We will need to update the kip, when we get
> > >> agreement
> > >> >> >>> this
> > >> >> >>> > is a
> > >> >> >>> > >> > better
> > >> >> >>> > >> > > >> approach (which seems to be the case if I have
> > >> >> understood
> > >> >> >>> > the
> > >> >> >>> > >> > general
> > >> >> >>> > >> > > >> feeling in the conversation)
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> Re the variable ints, at very early stage we did
> > >> think
> > >> >> >>> about
> > >> >> >>> > >> > this. I
> > >> >> >>> > >> > > think
> > >> >> >>> > >> > > >> the added complexity for the saving isn't worth
> > it.
> > >> >> I'd
> > >> >> >>> > rather
> > >> >> >>> > >> go
> > >> >> >>> > >> > > with, if
> > >> >> >>> > >> > > >> we want to reduce overheads and size int16
> > (2bytes)
> > >> >> keys
> > >> >> >>> as
> > >> >> >>> > it
> > >> >> >>> > >> > keeps it
> > >> >> >>> > >> > > >> simple.
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> On the note of no headers, there is as per the
> > kip
> > >> as
> > >> >> we
> > >> >> >>> > use an
> > >> >> >>> > >> > > attribute
> > >> >> >>> > >> > > >> bit to denote if headers are present or not as
> > such
> > >> >> >>> > provides a
> > >> >> >>> > >> > zero
> > >> >> >>> > >> > > >> overhead currently if headers are not used.
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> I think as radai mentions would be good first
> > if we
> > >> >> can
> > >> >> >>> get
> > >> >> >>> > >> > clarity if
> > >> >> >>> > >> > > do
> > >> >> >>> > >> > > >> we now have general consensus that (1) headers
> > are
> > >> >> >>> > worthwhile
> > >> >> >>> > >> and
> > >> >> >>> > >> > > useful,
> > >> >> >>> > >> > > >> and (2) we want it as a top level entity.
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> Just to state the obvious i believe (1) headers
> > are
> > >> >> >>> > worthwhile
> > >> >> >>> > >> > and (2)
> > >> >> >>> > >> > > >> agree as a top level entity.
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> Cheers
> > >> >> >>> > >> > > >> Mike
> > >> >> >>> > >> > > >> ________________________________________
> > >> >> >>> > >> > > >> From: Roger Hoover <roger.hoover@gmail.com>
> > >> >> >>> > >> > > >> Sent: Wednesday, November 9, 2016 9:10:47 PM
> > >> >> >>> > >> > > >> To: dev@kafka.apache.org
> > >> >> >>> > >> > > >> Subject: Re: [DISCUSS] KIP-82 - Add Record
> > Headers
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> Sorry for going a little in the weeds but thanks
> > >> for
> > >> >> the
> > >> >> >>> > >> replies
> > >> >> >>> > >> > > regarding
> > >> >> >>> > >> > > >> varint.
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> Agreed that a prefix and {int, int} can be the
> > >> same.
> > >> >> It
> > >> >> >>> > doesn't
> > >> >> >>> > >> > look
> > >> >> >>> > >> > > like
> > >> >> >>> > >> > > >> that's what the KIP is saying in the "Open"
> > section.
> > >> The
> > >> >> >>> > example
> > >> >> >>> > >> > shows
> > >> >> >>> > >> > > >> 2100001
> > >> >> >>> > >> > > >> for New Relic and 210002 for App Dynamics
> > implying
> > >> >> that
> > >> >> >>> the
> > >> >> >>> > New
> > >> >> >>> > >> > Relic
> > >> >> >>> > >> > > >> organization will have only a single header id
> > to
> > >> work
> > >> >> >>> > with. Or
> > >> >> >>> > >> > is
> > >> >> >>> > >> > > 2100001
> > >> >> >>> > >> > > >> a prefix? The main point of a namespace or
> > prefix
> > >> is
> > >> >> to
> > >> >> >>> > reduce
> > >> >> >>> > >> > the
> > >> >> >>> > >> > > >> overhead of config mapping or registration
> > >> depending
> > >> >> on
> > >> >> >>> how
> > >> >> >>> > >> > > >> namespaces/prefixes are managed.
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> Would love to hear more feedback on the
> > >> higher-level
> > >> >> >>> > questions
> > >> >> >>> > >> > > though...
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> Cheers,
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> Roger
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> On Wed, Nov 9, 2016 at 11:38 AM, radai <
> > >> >> >>> > >> > radai.rosenblatt@gmail.com>
> > >> >> >>> > >> > > wrote:
> > >> >> >>> > >> > > >>
> > >> >> >>> > >> > > >> > I think this discussion is getting a bit into
> > the
> > >> >> >>> weeds on
> > >> >> >>> > >> > technical
> > >> >> >>> > >> > > >> > implementation details.
> > >> >> >>> > >> > > >> > I'd like to step back a minute and try and
> > >> establish
> > >> >> >>> > where we
> > >> >> >>> > >> > are in
> > >> >> >>> > >> > > the
> > >> >> >>> > >> > > >> > larger picture:
> > >> >> >>> > >> > > >> >
> > >> >> >>> > >> > > >> > (re-wording nacho's last paragraph)
> > >> >> >>> > >> > > >> > 1. are we all in agreement that headers are a
> > >> >> >>> worthwhile
> > >> >> >>> > and
> > >> >> >>> > >> > useful
> > >> >> >>> > >> > > >> > addition to have? this was contested early on
> > >> >> >>> > >> > > >> > 2. are we all in agreement on headers as top
> > >> level
> > >> >> >>> entity
> > >> >> >>> > vs
> > >> >> >>> > >> > headers
> > >> >> >>> > >> > > >> > squirreled-away in V?
> > >> >> >>> > >> > > >> >
> > >> >> >>> > >> > > >> > if there are still concerns around these #2
> > >> points
> > >> >> >>> (#jay?
> > >> >> >>> > >> > #jun?)?
> > >> >> >>> > >> > > >> >
> > >> >> >>> > >> > > >> > (and now back to our normal programming ...)
> > >> >> >>> > >> > > >> >
> > >> >> >>> > >> > > >> > varints are nice. having said that, its adding
> > >> >> >>> complexity
> > >> >> >>> > >> (see
> > >> >> >>> > >> > > >> > https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java
> > >> >> >>> > >> > > >> > as 1st google result) and would require anyone
> > >> >> writing
> > >> >> >>> > other
> > >> >> >>> > >> > clients
> > >> >> >>> > >> > > (C?
> > >> >> >>> > >> > > >> > Python? Go? Bash? ;-) ) to get/implement the
> > >> same,
> > >> >> and
> > >> >> >>> for
> > >> >> >>> > >> > relatively
> > >> >> >>> > >> > > >> > little gain (int vs string is order of
> > magnitude,
> > >> >> this
> > >> >> >>> > isnt).
> > >> >> >>> > >> > > >> >
> > >> >> >>> > >> > > >> > int namespacing vs {int, int} namespacing are
> > >> >> basically
> > >> >> >>> > the
> > >> >> >>> > >> > same
> > >> >> >>> > >> > > thing -
> > >> >> >>> > >> > > >> > youre just namespacing an int64 and giving
> > people
> > >> >> whole
> > >> >> >>> > 2^32
> > >> >> >>> > >> > ranges
> > >> >> >>> > >> > > at a
> > >> >> >>> > >> > > >> > time. the part i like about this is letting
> > >> people
> > >> >> >>> have a
> > >> >> >>> > >> large
> > >> >> >>> > >> > > swath of
> > >> >> >>> > >> > > >> > numbers with one registration so they dont
> > have
> > >> to
> > >> >> come
> > >> >> >>> > back
> > >> >> >>> > >> > for
> > >> >> >>> > >> > > every
> > >> >> >>> > >> > > >> > single plugin/header they want to "reserve".
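The equivalence between a single wide key and an {int, int} pair described above can be sketched as follows. This is only an illustration, assuming unsigned 32-bit halves packed into one 64-bit value; the names `pack_key` and `unpack_key` are hypothetical and are not part of the KIP.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # largest value that fits in one 32-bit half


def pack_key(namespace: int, local_id: int) -> int:
    """Pack a (namespace, id) pair into one 64-bit header key."""
    if not (0 <= namespace <= MASK32 and 0 <= local_id <= MASK32):
        raise ValueError("both halves must fit in 32 unsigned bits")
    # The namespace occupies the high 32 bits, the local id the low 32 bits.
    return (namespace << 32) | local_id


def unpack_key(key: int) -> Tuple[int, int]:
    """Split a 64-bit header key back into (namespace, id)."""
    return key >> 32, key & MASK32


# One registration of namespace 3 reserves every key of the form {3, *}.
key = pack_key(3, 7)
namespace, local_id = unpack_key(key)
```

Whoever registers a namespace owns all 2^32 ids beneath it, so a second registration is never needed for a new plugin or header.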
> > >> >> >>> > >> > > >> >
> > >> >> >>> > >> > > >> >
> > >> >> >>> > >> > > >> > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover
> > <
> > >> >> >>> > >> > > roger.hoover@gmail.com>
> > >> >> >>> > >> > > >> > wrote:
> > >> >> >>> > >> > > >> >
> > >> >> >>> > >> > > >> > > Since some of the debate has been about
> > >> overhead +
> > >> >> >>> > >> > performance, I'm
> > >> >> >>> > >> > > >> > > wondering if we have considered a varint encoding
> > >> >> >>> > >> > > >> > > (https://developers.google.com/protocol-buffers/docs/encoding#varints) for
> > >> >> >>> > >> > > >> > > the header length field (int32 in the
> > proposal)
> > >> >> and
> > >> >> >>> for
> > >> >> >>> > >> > header
> > >> >> >>> > >> > > ids? If
> > >> >> >>> > >> > > >> > you
> > >> >> >>> > >> > > >> > > don't use headers, the overhead would be a
> > >> single
> > >> >> >>> byte
> > >> >> >>> > and
> > >> >> >>> > >> > for each
> > >> >> >>> > >> > > >> > header
> > >> >> >>> > >> > > >> > > id < 128 would also need only a single byte?
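The byte counts mentioned above can be checked with a small sketch of protobuf-style base-128 varints. It assumes non-negative values only and is not taken from the KIP or from any Kafka client; `encode_varint` and `decode_varint` are illustrative names.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)  # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the varint starting at data[offset]."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


# A zero header length and any header id below 128 each take a single byte.
empty_length = encode_varint(0)
small_id = encode_varint(127)
larger_id = encode_varint(128)
```

Under this encoding, values from 128 up to 16383 take two bytes, so the saving over a fixed int32 shrinks as ids grow.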
> > >> >> >>> > >> > > >> > >
> > >> >> >>> > >> > > >> > >
> > >> >> >>> > >> > > >> > >
> > >> >> >>> > >> > > >> > > On Wed, Nov 9, 2016 at 6:43 AM, radai <
> > >> >> >>> > >> > radai.rosenblatt@gmail.com>
> > >> >> >>> > >> > > >> > wrote:
> > >> >> >>> > >> > > >> > >
> > >> >> >>> > >> > > >> > > > @magnus - and very dangerous (youre
> > >> essentially
> > >> >> >>> > >> > downloading and
> > >> >> >>> > >> > > >> > executing
> > >> >> >>> > >> > > >> > > > arbitrary code off the internet on your
> > >> servers
> > >> >> ...
> > >> >> >>> > bad
> > >> >> >>> > >> > idea
> > >> >> >>> > >> > > without
> > >> >> >>> > >> > > >> a
> > >> >> >>> > >> > > >> > > > sandbox, even with)
> > >> >> >>> > >> > > >> > > >
> > >> >> >>> > >> > > >> > > > as for it being a purely administrative
> > task
> > >> - i
> > >> >> >>> > >> disagree.
> > >> >> >>> > >> > > >> > > >
> > >> >> >>> > >> > > >> > > > i wish it would, really, because then my
> > >> earlier
> > >> >> >>> > point on
> > >> >> >>> > >> > the
> > >> >> >>> > >> > > >> > complexity
> > >> >> >>> > >> > > >> > > of
> > >> >> >>> > >> > > >> > > > the remapping process would be invalid,
> > but
> > >> at
> > >> >> >>> > linkedin,
> > >> >> >>> > >> > for
> > >> >> >>> > >> > > example,
> > >> >> >>> > >> > > >> > we
> > >> >> >>> > >> > > >> > > > (the team im in) run kafka as a service.
> > we
> > >> dont
> > >> >> >>> > really
> > >> >> >>> > >> > know
> > >> >> >>> > >> > > what our
> > >> >> >>> > >> > > >> > > users
> > >> >> >>> > >> > > >> > > > (developing applications that use kafka)
> > are
> > >> up
> > >> >> to
> > >> >> >>> at
> > >> >> >>> > any
> > >> >> >>> > >> > given
> > >> >> >>> > >> > > >> moment.
> > >> >> >>> > >> > > >> > > it
> > >> >> >>> > >> > > >> > > > is very possible (given the existence of
> > >> headers
> > >> >> >>> and a
> > >> >> >>> > >> > > corresponding
> > >> >> >>> > >> > > >> > > plugin
> > >> >> >>> > >> > > >> > > > ecosystem) for some application to "equip"
> > >> their
> > >> >> >>> > >> producers
> > >> >> >>> > >> > and
> > >> >> >>> > >> > > >> > consumers
> > >> >> >>> > >> > > >> > > > with the required plugin without us
> > knowing.
> > >> i
> > >> >> dont
> > >> >> >>> > mean
> > >> >> >>> > >> > to imply
> > >> >> >>> > >> > > >> thats
> > >> >> >>> > >> > > >> > > > bad, i just want to make the point that
> > its
> > >> not
> > >> >> as
> > >> >> >>> > simple as
> > >> >> >>> > >> > > keeping it
> > >> >> >>> > >> > > >> in
> > >> >> >>> > >> > > >> > > > sync across a large-enough organization.
> > >> >> >>> > >> > > >> > > >
> > >> >> >>> > >> > > >> > > >
> > >> >> >>> > >> > > >> > > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus
> > >> Edenhill
> > >> >> <
> > >> >> >>> > >> > > magnus@edenhill.se>
> > >> >> >>> > >> > > >> > > > wrote:
> > >> >> >>> > >> > > >> > > >
> > >> >> >>> > >> > > >> > > > > I think there is a piece missing in the
> > >> >> Strings
> > >> >> >>> > >> > discussion,
> > >> >> >>> > >> > > where
> > >> >> >>> > >> > > >> > > > > pro-Stringers
> > >> >> >>> > >> > > >> > > > > reason that by providing unique string
> > >> >> >>> identifiers
> > >> >> >>> > for
> > >> >> >>> > >> > each
> > >> >> >>> > >> > > header
> > >> >> >>> > >> > > >> > > > > everything will just
> > >> >> >>> > >> > > >> > > > > magically work for all parts of the
> > stream
> > >> >> >>> pipeline.
> > >> >> >>> > >> > > >> > > > >
> > >> >> >>> > >> > > >> > > > > But the strings dont mean anything by
> > >> >> themselves,
> > >> >> >>> > and
> > >> >> >>> > >> > while we
> > >> >> >>> > >> > > >> could
> > >> >> >>> > >> > > >> > > > > probably envision
> > >> >> >>> > >> > > >> > > > > some auto plugin loader that downloads,
> > >> >> compiles,
> > >> >> >>> > links
> > >> >> >>> > >> > and
> > >> >> >>> > >> > > runs
> > >> >> >>> > >> > > >> > > plugins
> > >> >> >>> > >> > > >> > > > > on-demand
> > >> >> >>> > >> > > >> > > > > as soon as they're seen by a consumer, I
> > >> dont
> > >> >> >>> really
> > >> >> >>> > >> see
> > >> >> >>> > >> > a
> > >> >> >>> > >> > > use-case
> > >> >> >>> > >> > > >> > for
> > >> >> >>> > >> > > >> > > > > something
> > >> >> >>> > >> > > >> > > > > so dynamic (and fragile) in practice.
> > >> >> >>> > >> > > >> > > > >
> > >> >> >>> > >> > > >> > > > > In the real world an application will be
> > >> >> >>> configured
> > >> >> >>> > >> with
> > >> >> >>> > >> > a set
> > >> >> >>> > >> > > of
> > >> >> >>> > >> > > >> > > plugins
> > >> >> >>> > >> > > >> > > > > to either add (producer)
> > >> >> >>> > >> > > >> > > > > or read (consumer) headers.
> > >> >> >>> > >> > > >> > > > > This is an administrative task based on
> > >> what
> > >> >> >>> > features a
> > >> >> >>> > >> > client
> > >> >> >>> > >> > > >> > > > > needs/provides and results in
> > >> >> >>> > >> > > >> > > > > some sort of configuration to enable and
> > >> >> >>> configure
> > >> >> >>> > the
> > >> >> >>> > >> > desired
> > >> >> >>> > >> > > >> > plugins.
> > >> >> >>> > >> > > >> > > > >
> > >> >> >>> > >> > > >> > > > > Since this needs to be kept somewhat in
> > >> sync
> > >> >> >>> across
> > >> >> >>> > an
> > >> >> >>> > >> > > organisation
> > >> >> >>> > >> > > >> > > > (there
> > >> >> >>> > >> > > >> > > > > is no point in having producers
> > >> >> >>> > >> > > >> > > > > add headers no consumers will read, and
> > >> vice
> > >> >> >>> versa),
> > >> >> >>> > >> the
> > >> >> >>> > >> > added
> > >> >> >>> > >> > > >> > > complexity
> > >> >> >>> > >> > > >> > > > > of assigning an id namespace
> > >> >> >>> > >> > > >> > > > > for each plugin as it is being
> > configured
> > >> >> should
> > >> >> >>> be
> > >> >> >>> > >> > tolerable.
> > >> >> >>> > >> > > >> > > > >
> > >> >> >>> > >> > > >> > > > >
> > >> >> >>> > >> > > >> > > > > /Magnus
> > >> >> >>> > >> > > >> > > > >
> > >> >> >>> > >> > > >> > > > > 2016-11-09 13:06 GMT+01:00 Michael
> > Pearce <
> > >> >> >>> > >> > > Michael.Pearce@ig.com>:
> > >> >> >>> > >> > > >> > > > >
> > >> >> >>> > >> > > >> > > > > > Just following/catching up on what
> > seems
> > >> to
> > >> >> be
> > >> >> >>> an
> > >> >> >>> > >> > active
> > >> >> >>> > >> > > night :)
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > @Radai sorry if it may seem obvious
> > but
> > >> what
> > >> >> >>> does
> > >> >> >>> > MD
> > >> >> >>> > >> > stand
> > >> >> >>> > >> > > for?
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > My take on String vs Int:
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > I will state first I am pro Int (16 or
> > >> 32).
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > I do though playing devils advocate
> > see a
> > >> >> big
> > >> >> >>> plus
> > >> >> >>> > >> > with the
> > >> >> >>> > >> > > >> > argument
> > >> >> >>> > >> > > >> > > of
> > >> >> >>> > >> > > >> > > > > > String keys, this is around
> > integrating
> > >> >> into an
> > >> >> >>> > >> > existing
> > >> >> >>> > >> > > >> > eco-system.
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > As many other systems use String based
> > >> >> headers
> > >> >> >>> > >> (Flume,
> > >> >> >>> > >> > JMS)
> > >> >> >>> > >> > > it
> > >> >> >>> > >> > > >> > makes
> > >> >> >>> > >> > > >> > > > it
> > >> >> >>> > >> > > >> > > > > > much easier for these to be
> > >> >> >>> > incorporated/integrated
> > >> >> >>> > >> > into.
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > How with Int based headers could we
> > >> provide
> > >> >> a
> > >> >> >>> > >> > way/guidance to
> > >> >> >>> > >> > > >> make
> > >> >> >>> > >> > > >> > > this
> > >> >> >>> > >> > > >> > > > > > integration simple / easy with
> > transition
> > >> >> flows
> > >> >> >>> > over
> > >> >> >>> > >> to
> > >> >> >>> > >> > > kafka?
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > * tough luck buddy you're on your own
> > >> >> >>> > >> > > >> > > > > > * simply hash the string into int code
> > >> and
> > >> >> hope
> > >> >> >>> > for
> > >> >> >>> > >> no
> > >> >> >>> > >> > > collisions
> > >> >> >>> > >> > > >> > > (how
> > >> >> >>> > >> > > >> > > > to
> > >> >> >>> > >> > > >> > > > > > convert back though?)
> > >> >> >>> > >> > > >> > > > > > * http2 style as mentioned by nacho.
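The second option in the list above (hash the string key into an int) can be sketched as below, assuming CRC32 as the hash. The thread does not name a hash function, so that choice and the `HeaderRegistry` class are purely illustrative. The sketch also shows why converting back needs a side table: the hash itself is one-way.

```python
import zlib


def header_id(name: str) -> int:
    """Map a string header name to a 32-bit int key (assumed hash: CRC32)."""
    return zlib.crc32(name.encode("utf-8"))  # unsigned 32-bit value in Python 3


class HeaderRegistry:
    """Hypothetical side table that detects collisions and allows reverse lookup."""

    def __init__(self) -> None:
        self._names = {}

    def register(self, name: str) -> int:
        key = header_id(name)
        existing = self._names.setdefault(key, name)
        if existing != name:
            # Two different names hashed to the same int key.
            raise ValueError(f"collision: {name!r} and {existing!r} share id {key}")
        return key

    def name_for(self, key: int):
        """Return the registered name for a key, or None if it is unknown."""
        return self._names.get(key)


registry = HeaderRegistry()
correlation_key = registry.register("JMSCorrelationID")
```

A consumer that never saw the registration cannot recover "JMSCorrelationID" from the int alone, which is the "how to convert back" problem raised in the bullet.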
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > cheers,
> > >> >> >>> > >> > > >> > > > > > Mike
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > ______________________________
> > __________
> > >> >> >>> > >> > > >> > > > > > From: radai <
> > radai.rosenblatt@gmail.com>
> > >> >> >>> > >> > > >> > > > > > Sent: Wednesday, November 9, 2016
> > 8:12 AM
> > >> >> >>> > >> > > >> > > > > > To: dev@kafka.apache.org
> > >> >> >>> > >> > > >> > > > > > Subject: Re: [DISCUSS] KIP-82 - Add
> > >> Record
> > >> >> >>> Headers
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > thinking about it some more, the best
> > >> way to
> > >> >> >>> > transmit
> > >> >> >>> > >> > the
> > >> >> >>> > >> > > header
> > >> >> >>> > >> > > >> > > > > remapping
> > >> >> >>> > >> > > >> > > > > > data to consumers would be to put it
> > in
> > >> the
> > >> >> MD
> > >> >> >>> > >> response
> > >> >> >>> > >> > > payload,
> > >> >> >>> > >> > > >> so
> > >> >> >>> > >> > > >> > > > maybe
> > >> >> >>> > >> > > >> > > > > > it should be discussed now.
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > On Wed, Nov 9, 2016 at 12:09 AM,
> > radai <
> > >> >> >>> > >> > > >> radai.rosenblatt@gmail.com
> > >> >> >>> > >> > > >> > >
> > >> >> >>> > >> > > >> > > > > wrote:
> > >> >> >>> > >> > > >> > > > > >
> > >> >> >>> > >> > > >> > > > > > > im not opposed to the idea of
> > namespace
> > >> >> >>> mapping.
> > >> >> >>> > >> all
> > >> >> >>> > >> > im
> > >> >> >>> > >> > > saying
> > >> >> >>> > >> > > >> is
> > >> >> >>> > >> > > >> > > > that
> > >> >> >>> > >> > > >> > > > > > its
> > >> >> >>> > >> > > >> > > > > > > not part of the "mvp" and, since it
> > >> >> requires
> > >> >> >>> no
> > >> >> >>> > >> wire
> > >> >> >>> > >> > format
> > >> >> >>> > >> > > >> > change,
> > >> >> >>> > >> > > >> > > > can
> > >> >> >>> > >> > > >> > > > > > > always be added later.
> > >> >> >>> > >> > > >> > > > > > > also, its not as simple as just
> > >> >> configuring
> > >> >> >>> MM
> > >> >> >>> > to
> > >> >> >>> > >> do
> > >> >> >>> > >> > the
> > >> >> >>> > >> > > >> > transform:
> > >> >> >>> > >> > > >> > > > > lets
> > >> >> >>> > >> > > >> > > > > > > say i've implemented large message
> > >> >> support as
> > >> >> >>> > >> > {666,1} and
> > >> >> >>> > >> > > on
> > >> >> >>> > >> > > >> some
> > >> >> >>> > >> > > >> > > > > mirror
> > >> >> >>> > >> > > >> > > > > > > target cluster its been remapped to
> > >> >> {999,1}.
> > >> >> >>> the
> > >> >> >>> > >> > consumer
> > >> >> >>> > >> > > >> plugin
> > >> >> >>> > >> > > >> > > code
> > >> >> >>> > >> > > >> > > > > > would
> > >> >> >>> > >> > > >> > > > > > > also need to be told to look for the
> > >> large
> > >> >> >>> > message
> > >> >> >>> > >> > "part X
> > >> >> >>> > >> > > of
> > >> >> >>> > >> > > >> Y"
> > >> >> >>> > >> > > >> > > > header
> > >> >> >>> > >> > > >> > > > > > > under {999,1}. doable, but tricky.
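The remapping problem described above can be sketched as follows, assuming headers are held as a dict keyed by (namespace, id) pairs. That representation, and the names `remap_headers` and `find_part_header`, are hypothetical; the example reuses the {666,1} to {999,1} case from the message.

```python
from typing import Dict, Optional, Tuple

HeaderKey = Tuple[int, int]  # (namespace, id)


def remap_headers(headers: Dict[HeaderKey, bytes],
                  namespace_map: Dict[int, int]) -> Dict[HeaderKey, bytes]:
    """Rewrite header namespaces as a mirroring step might; unmapped ones pass through."""
    return {(namespace_map.get(ns, ns), hid): value
            for (ns, hid), value in headers.items()}


def find_part_header(headers: Dict[HeaderKey, bytes],
                     namespace: int) -> Optional[bytes]:
    """Consumer-side lookup of the 'part X of Y' header (id 1) in a configured namespace."""
    return headers.get((namespace, 1))


source = {(666, 1): b"part 1 of 3"}
mirrored = remap_headers(source, {666: 999})

# The consumer plugin only finds the header if it is told about the new namespace.
found = find_part_header(mirrored, namespace=999)
missed = find_part_header(mirrored, namespace=666)
```

The transform on the mirror is a one-liner; the tricky part is that every consumer plugin on the target cluster needs the same mapping in its configuration.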
> > >> >> >>> > >> > > >> > > > > > >
> > >> >> >>> > >> > > >> > > > > > > On Tue, Nov 8, 2016 at 10:29 PM,
> > Gwen
> > >> >> >>> Shapira <
> > >> >> >>> > >> > > >> gwen@confluent.io
> > >> >> >>> > >> > > >> > >
> > >> >> >>> > >> > > >> > > > > wrote:
> > >> >> >>> > >> > > >> > > > > > >
> > >> >> >>> > >> > > >> > > > > > >> While you can do whatever you want
> > >> with a
> > >> >> >>> > >> namespace
> > >> >> >>> > >> > and
> > >> >> >>> > >> > > your
> > >> >> >>> > >> > > >> > code,
> > >> >> >>> > >> > > >> > > > > > >> what I'd expect is for each app to make
> > >> >> >>> namespaces
> > >> >> >>> > >> > > configurable...
> > >> >> >>> > >> > > >> > > > > > >>
> > >> >> >>> > >> > > >> > > > > > >> So if I accidentally used 666 for
> > my
> > >> HR
> > >> >> >>> > >> department,
> > >> >> >>> > >> > and
> > >> >> >>> > >> > > still
> > >> >> >>> > >> > > >> > want
> > >> >> >>> > >> > > >> > > > to
> > >> >> >>> > >> > > >> > > > > > >> run RadaiApp, I can config
> > >> "namespace=42"
> > >> >> >>> for
> > >> >> >>> > >> > RadaiApp and
> > >> >> >>> > >> > > >> > > > everything
> > >> >> >>> > >> > > >> > > > > > >> will look normal.
> > >> >> >>> > >> > > >> > > > > > >>
> > >> >> >>> > >> > > >> > > > > > >> This means you only need to sync
> > usage
> > >> >> >>> inside
> > >> >> >>> > your
> > >> >> >>> > >> > own
> > >> >> >>> > >> > > >> > > organization.
> > >> >> >>> > >> > > >> > > > > > >> Still hard, but somewhat easier
> > than
> > >> >> syncing
> > >> >> >>> > with
> > >> >> >>> > >> > the
> > >> >> >>> > >> > > entire
> > >> >> >>> > >> > > >> > > world.
> > >> >> >>> > >> > > >> > > > > > >>
> > >> >> >>> > >> > > >> > > > > > >> On Tue, Nov 8, 2016 at 10:07 PM,
> > >> radai <
> > >> >> >>> > >> > > >> > > radai.rosenblatt@gmail.com>
> > >> >> >>> > >> > > >> > > > > > >> wrote:
> > >> >> >>> > >> > > >> > > > > > >> > and we can start with {namespace,
> > >> id}
> > >> >> and
> > >> >> >>> no
> > >> >> >>> > >> > re-mapping
> > >> >> >>> > >> > > >> > support
> > >> >> >>> > >> > > >> > > > and
> > >> >> >>> > >> > > >> > > > > > >> always
> > >> >> >>> > >> > > >> > > > > > >> > add it later on if/when
> > collisions
> > >> >> >>> actually
> > >> >> >>> > >> > happen (i
> > >> >> >>> > >> > > dont
> > >> >> >>> > >> > > >> > think
> > >> >> >>> > >> > > >> > > > > > they'd
> > >> >> >>> > >> > > >> > > > > > >> be
> > >> >> >>> > >> > > >> > > > > > >> > a problem).
> > >> >> >>> > >> > > >> > > > > > >> >
> > >> >> >>> > >> > > >> > > > > > >> > every interested party (so orgs
> > or
> > >> >> >>> > individuals)
> > >> >> >>> > >> > could
> > >> >> >>> > >> > > then
> > >> >> >>> > >> > > >> > > > register
> > >> >> >>> > >> > > >> > > > > a
> > >> >> >>> > >> > > >> > > > > > >> > prefix (0 = reserved, 1 =
> > confluent
> > >> ...
> > >> >> >>> 666
> > >> >> >>> > = me
> > >> >> >>> > >> > :-) )
> > >> >> >>> > >> > > and
> > >> >> >>> > >> > > >> do
> > >> >> >>> > >> > > >> > > > > whatever
> > >> >> >>> > >> > > >> > > > > > >> with
> > >> >> >>> > >> > > >> > > > > > >> > the 2nd ID - so once linkedin
> > >> >> registers,
> > >> >> >>> say
> > >> >> >>> > 3,
> > >> >> >>> > >> > then
> > >> >> >>> > >> > > >> linkedin
> > >> >> >>> > >> > > >> > > devs
> > >> >> >>> > >> > > >> > > > > are
> > >> >> >>> > >> > > >> > > > > > >> free
> > >> >> >>> > >> > > >> > > > > > >> > to use {3, *} with a reasonable
> > >> >> >>> expectation
> > >> >> >>> > not to
> > >> >> >>> > >> > > collide
> > >> >> >>> > >> > > >> with
> > >> >> >>> > >> > > >> > > > > > anything
> > >> >> >>> > >> > > >> > > > > > >> > else. further partitioning of
> > that *
> > >> >> >>> becomes
> > >> >> >>> > >> > linkedin's
> > >> >> >>> > >> > > >> > problem,
> > >> >> >>> > >> > > >> > > > but
> > >> >> >>> > >> > > >> > > > > > the
> > >> >> >>> > >> > > >> > > > > > >> > "upstream registration" of a
> > >> namespace
> > >> >> >>> only
> > >> >> >>> > has
> > >> >> >>> > >> to
> > >> >> >>> > >> > > happen
> > >> >> >>> > >> > > >> > once.
> > >> >> >>> > >> > > >> > > > > > >> >
> > >> >> >>> > >> > > >> > > > > > >> > On Tue, Nov 8, 2016 at 9:03 PM,
> > >> James
> > >> >> >>> Cheng <
> > >> >> >>> > >> > > >> > > wushujames@gmail.com
> > >> >> >>> > >> > > >> > > > >
> > >> >> >>> > >> > > >> > > > > > >> wrote:
> > >> >> >>> > >> > > >> > > > > > >> >
> > >> >> >>> > >> > > >> > > > > > >> >>
> > >> >> >>> > >> > > >> > > > > > >> >>
> > >> >> >>> > >> > > >> > > > > > >> >>
> > >> >> >>> > >> > > >> > > > > > >> >> > On Nov 8, 2016, at 5:54 PM,
> > Gwen
> > >> >> >>> Shapira <
> > >> >> >>> > >> > > >> > gwen@confluent.io>
> > >> >> >>> > >> > > >> > > > > > wrote:
> > >> >> >>> > >> > > >> > > > > > >> >> >
> > >> >> >>> > >> > > >> > > > > > >> >> > Thank you so much for this
> > clear
> > >> and
> > >> >> >>> fair
> > >> >> >>> > >> > summary of
> > >> >> >>> > >> > > the
> > >> >> >>> > >> > > >> > > > > arguments.
> > >> >> >>> > >> > > >> > > > > > >> >> >
> > >> >> >>> > >> > > >> > > > > > >> >> > I'm in favor of ints. Not a
> > >> >> >>> deal-breaker,
> > >> >> >>> > but
> > >> >> >>> > >> > in
> > >> >> >>> > >> > > favor.
> > >> >> >>> > >> > > >> > > > > > >> >> >
> > >> >> >>> > >> > > >> > > > > > >> >> > Even more in favor of Magnus's
> > >> >> >>> > decentralized
> > >> >> >>> > >> > > suggestion
> > >> >> >>> > >> > > >> > with
> > >> >> >>> > >> > > >> > > > > > Roger's
> > >> >> >>> > >> > > >> > > > > > >> >> > tweak: add a namespace for
> > >> headers.
> > >> >> >>> This
> > >> >> >>
>
