kafka-dev mailing list archives

From Joel Koshy <jjkosh...@gmail.com>
Subject Re: [DISCUSS] KIP-31 - Message format change proposal
Date Thu, 01 Oct 2015 01:26:58 GMT
The Phase 2 2.* sub-steps don't seem to be right. Can you look over
that carefully? Also, by "definitive" you mean "absolute", i.e., not
relative offsets, right?

One more thing that may be worth mentioning is that it is technically
possible to canary the new message format on at most one broker (or on
several brokers if they host mutually disjoint partitions). Basically,
turn on the new message format on one broker and leave it on for an
extended period. If we hit some unanticipated bug and something goes
terribly wrong with the feature, then just kill that broker, switch it
back to the v0 on-disk format and reseed it from the leaders. Most people
may not want such a long deployment plan, but at least it is an option
for those who want to tread very carefully, given that the change is
backwards incompatible.
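One way to make the "mutually disjoint partitions" condition concrete: the sketch below checks that no partition has more than one replica on the set of canary brokers, so (assuming a replication factor above 1) every partition keeps other replicas on the old format to reseed from. The partition-to-replicas dict and the function name are hypothetical illustrations, not a Kafka API.

```python
from typing import Dict, Iterable, List


def canary_brokers_disjoint(assignment: Dict[str, List[int]],
                            canaries: Iterable[int]) -> bool:
    """Return True if no partition has more than one replica on a canary broker.

    `assignment` is a hypothetical partition -> replica broker ids map,
    e.g. {"topic-0": [1, 2, 3]}.
    """
    canary_set = set(canaries)
    for replicas in assignment.values():
        # How many replicas of this partition would be on the new format.
        on_canaries = sum(1 for broker in replicas if broker in canary_set)
        if on_canaries > 1:
            return False
    return True


assignment = {"t-0": [1, 2], "t-1": [2, 3], "t-2": [3, 1]}
print(canary_brokers_disjoint(assignment, [1]))     # True
print(canary_brokers_disjoint(assignment, [1, 2]))  # False: t-0 is on both
```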

Joel

On Tue, Sep 29, 2015 at 4:50 PM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
> Hi Joel and other folks.
>
> I updated the KIP page with the two-phase rollout, which avoids the
> conversion for the majority of users.
>
> To do that we need to add a message.format.version configuration to the broker.
> Other than that, there is no interface change from the previous proposal.
> Please let me know if you have any concerns about the updated proposal.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Sep 25, 2015 at 11:26 AM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
>
>> Hey Becket,
>>
>> I do think we need the interim deployment phase, i.e., set
>> message.format.version and down-convert producer request v2.
>> Down-conversion for v2 is no worse than what the broker is doing now.
>> I don't think we want a prolonged phase where we down-convert for
>> every v1 fetch. In fact, I'm less concerned about losing zero-copy for
>> those fetch requests than about the overhead of decompress/recompress
>> for those fetches, as that would increase your CPU usage by 4x, 5x or
>> whatever the average consumer fan-out is. The
>> decompression/recompression will put further pressure on memory as well.
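A rough way to see the "4x, 5x or whatever the average consumer fan-out is" point is to count decompress/recompress passes per produced batch. This is a simplified model that assumes every batch arrives as producer request v2 and every consumer group fetches each batch exactly once; it counts passes and does not measure CPU.

```python
def conversion_passes(batches: int, v1_consumer_groups: int,
                      convert_on_fetch: bool) -> int:
    """Count decompress/recompress passes for `batches` produced batches.

    convert_on_fetch=True: the new format is on disk, so every v1 fetch of
    every batch must be down-converted (one pass per consumer group).
    convert_on_fetch=False: the broker down-converts once at produce time
    and then serves all fetches with zero-copy.
    """
    if convert_on_fetch:
        return batches * v1_consumer_groups
    return batches


print(conversion_passes(1000, 5, convert_on_fetch=True))   # 5000
print(conversion_passes(1000, 5, convert_on_fetch=False))  # 1000
```

The ratio between the two results is exactly the fan-out, which is the multiplier Joel describes.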
>>
>> It is true that clients send the latest request version they are
>> compiled with, and that does not need to change. The broker can
>> continue to respond with zero-copy for fetch request version 2 as
>> well (even during the interim phase in which it down-converts
>> producer request v2). The consumer iterator (for the old consumer) or
>> the Fetcher (for the new consumer) needs to be able to handle messages
>> in both the original and the new (relative offset) format.
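A minimal sketch of what handling both formats could mean for the consumer iterator or Fetcher: old-format inner messages carry absolute offsets, while new-format inner messages carry relative offsets that must be translated using the wrapper's offset. It assumes, as proposed later in this thread, that the wrapper offset is the absolute offset of the last inner message; the flat list-of-offsets representation and the function name are hypothetical simplifications of the wire format.

```python
from typing import List


def absolute_offsets(wrapper_offset: int, inner_offsets: List[int],
                     relative: bool) -> List[int]:
    """Return absolute offsets for the inner messages of one compressed wrapper.

    Old format (relative=False): inner offsets are already absolute.
    New format (relative=True): inner offsets are relative, and the wrapper
    offset is assumed to be the absolute offset of the *last* inner message,
    so base = wrapper_offset - last_relative_offset.
    """
    if not relative or not inner_offsets:
        return list(inner_offsets)
    base = wrapper_offset - inner_offsets[-1]
    # Only the base matters, so relative offsets need not be consecutive
    # (e.g. after compaction removed some inner messages).
    return [base + r for r in inner_offsets]


print(absolute_offsets(48, [0, 1, 2, 3], relative=True))       # [45, 46, 47, 48]
print(absolute_offsets(48, [45, 46, 47, 48], relative=False))  # [45, 46, 47, 48]
print(absolute_offsets(48, [0, 2, 3], relative=True))          # [45, 47, 48]
```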
>>
>> Thanks,
>>
>> Joel
>>
>>
>> On Thu, Sep 24, 2015 at 7:56 PM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
>> > Hi Joel,
>> >
>> > That is a valid concern. And that is actually why we had the
>> > message.format.version before.
>> >
>> > My original thinking was:
>> > 1. Upgrade the broker to support both V1 and V2 for consumer/producer
>> > requests.
>> > 2. Configure the broker to store V1 on disk (message.format.version = 1).
>> > 3. Upgrade the consumer to support both V1 and V2 for consumer requests.
>> > 4. Meanwhile, some producers might also be upgraded to use producer
>> > request V2.
>> > 5. At this point, for producer request V2, the broker will do down
>> > conversion. Regardless of whether consumers are upgraded or not, the
>> > broker will always use zero-copy transfer, because supposedly both old
>> > and upgraded consumers should be able to understand that.
>> > 6. After most of the consumers are upgraded, we set
>> > message.format.version = 2 and only do down conversion for old consumers.
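The plan above boils down to a per-request decision on the broker. A hedged sketch, assuming versions are plain integers numbered as in this message (1 = old format, 2 = new relative-offset format); the function name is hypothetical, and the "up-convert" case for a v1 produce request against a v2 on-disk format is an assumption the message does not spell out.

```python
def broker_action(message_format_version: int, request: str,
                  request_version: int) -> str:
    """Describe what the broker does for one request under the proposed plan.

    message_format_version: on-disk format (1 = old, 2 = new).
    request: "produce" or "fetch".
    request_version: 1 (old clients) or 2 (upgraded clients).
    """
    if request == "produce":
        if request_version > message_format_version:
            return "down-convert before appending"  # step 5
        if request_version < message_format_version:
            return "up-convert before appending"    # assumption, not in the plan
        return "append as-is"
    if request == "fetch":
        # Upgraded consumers are assumed to read both formats (step 5),
        # so only fetches older than the on-disk format need conversion.
        if request_version < message_format_version:
            return "down-convert (no zero-copy)"    # step 6, old consumers
        return "zero-copy"
    raise ValueError(f"unknown request type: {request}")


print(broker_action(1, "produce", 2))  # down-convert before appending
print(broker_action(2, "fetch", 1))    # down-convert (no zero-copy)
```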
>> >
>> > This way we don't need to reject producer request V2, and we only do
>> > version conversion for the minority of consumers. However, I have a few
>> > concerns about this approach; I'm not sure whether they actually matter.
>> >
>> > A. (5) is not true for now. Today the clients only use the highest
>> > version, i.e. a producer/consumer wouldn't parse a lower version of the
>> > response even though the code exists. I think, supposedly, the consumer
>> > should stick to one version and the broker should do the conversion.
>> > B. Let's say (A) is not a concern and we make all the clients support
>> > all the versions they know. At step (6), there will be a transitional
>> > period in which users will see messages in both the new and the old
>> > version. For KIP-31 alone it might be OK because we are not adding
>> > anything to the message. But if the message has different fields (e.g.
>> > KIP-32), people will get those fields from some messages but not from
>> > others. Would that be a problem?
>> >
>> > If (A) and (B) are not a problem, is the above procedure able to address
>> > your concern?
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Thu, Sep 24, 2015 at 6:32 PM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
>> >
>> >> The upgrade plan works, but the potentially long interim phase of
>> >> skipping zero-copy for down-conversion could be problematic, especially
>> >> for large deployments with large consumer fan-out. It is not only
>> >> going to be memory overhead but CPU as well, since you need to
>> >> decompress, write absolute offsets, then recompress for every v1
>> >> fetch. So it may be safer (but obviously more tedious) to have a
>> >> multi-step upgrade process, for example:
>> >>
>> >> 1 - Upgrade brokers, but disable the feature. i.e., either reject
>> >> producer requests v2 or down-convert to old message format (with
>> >> absolute offsets)
>> >> 2 - Upgrade clients, but they should only use v1 requests
>> >> 3 - Switch (all or most) consumers to use v2 fetch format (which will
>> >> use zero-copy).
>> >> 4 - Turn on the feature on the brokers to allow producer requests v2
>> >> 5 - Switch producers to use v2 produce format
>> >>
>> >> (You may want a v1 fetch rate metric and decide to proceed to step 4
>> >> only when that comes down to a trickle)
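The suggested gate before step 4 can be expressed as a check over a v1 fetch-rate metric. Both the metric and the 1% threshold are hypothetical; the sketch only shows the shape of the decision, using the share of v1 fetches over a window of samples rather than a single reading so that a momentary dip does not trigger step 4.

```python
from typing import Sequence


def ready_for_v2_produce(v1_fetch_rates: Sequence[float],
                         total_fetch_rates: Sequence[float],
                         max_v1_fraction: float = 0.01) -> bool:
    """Return True once v1 fetches have dropped to a trickle.

    Inputs are hypothetical per-interval samples (e.g. one per minute) of
    v1 fetch requests/sec and all fetch requests/sec. Every sample in the
    window must be at or below the threshold.
    """
    if not v1_fetch_rates or len(v1_fetch_rates) != len(total_fetch_rates):
        raise ValueError("need equally sized, non-empty sample windows")
    for v1, total in zip(v1_fetch_rates, total_fetch_rates):
        if total > 0 and v1 / total > max_v1_fraction:
            return False
    return True


print(ready_for_v2_produce([0.0, 0.5], [100.0, 100.0]))  # True
print(ready_for_v2_produce([5.0, 0.0], [100.0, 100.0]))  # False
```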
>> >>
>> >> I'm not sure if the prolonged upgrade process is viable in every
>> >> scenario. I think it should work at LinkedIn, for example, but may not
>> >> in other environments.
>> >>
>> >> Joel
>> >>
>> >>
>> >> On Tue, Sep 22, 2015 at 12:55 AM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
>> >> > Thanks for the explanation, Jay.
>> >> > Agreed. We have to keep the outer offset as the offset of the last
>> >> > inner message.
>> >> >
>> >> > Jiangjie (Becket) Qin
>> >> >
>> >> > On Mon, Sep 21, 2015 at 6:21 PM, Jay Kreps <jay@confluent.io> wrote:
>> >> >
>> >> >> For (3) I don't think we can change the offset in the outer message
>> >> >> from what it is today, as it is relied upon in the search done in the
>> >> >> log layer. The reason it is the offset of the last message rather than
>> >> >> the first is to make the offset a least upper bound (i.e. the smallest
>> >> >> offset >= fetch_offset). This needs to work the same for both gaps due
>> >> >> to compacted topics and gaps due to compressed messages.
>> >> >>
>> >> >> So imagine you had a compressed set with offsets {45, 46, 47, 48}. If
>> >> >> you assigned this compressed set the offset 45, a fetch for 46 would
>> >> >> actually skip ahead to 49 (the least upper bound).
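Jay's example can be checked with a small model of the log-layer search: each shallow (wrapper) entry is labeled with a single offset, and a fetch returns the first entry whose label is >= fetch_offset. The list-of-entries model is a simplification of the real segment search, for illustration only.

```python
import bisect
from typing import List, Tuple

# Each shallow entry: (label offset, absolute offsets of its inner messages).
Entry = Tuple[int, List[int]]


def fetch(entries: List[Entry], fetch_offset: int) -> List[int]:
    """Return the inner offsets of the first entry whose label >= fetch_offset."""
    labels = [label for label, _ in entries]
    i = bisect.bisect_left(labels, fetch_offset)  # least upper bound search
    return entries[i][1] if i < len(entries) else []


inner = [45, 46, 47, 48]
labeled_by_last = [(44, [44]), (48, inner), (49, [49])]
labeled_by_first = [(44, [44]), (45, inner), (49, [49])]

print(fetch(labeled_by_last, 46))   # [45, 46, 47, 48]: contains 46
print(fetch(labeled_by_first, 46))  # [49]: 46-48 are skipped
```

Labeling by the last offset also means a follower can compute its next fetch offset as label + 1 from the shallow entry alone, without decompressing it.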
>> >> >>
>> >> >> -Jay
>> >> >>
>> >> >> On Mon, Sep 21, 2015 at 5:17 PM, Jun Rao <jun@confluent.io> wrote:
>> >> >>
>> >> >> > Jiangjie,
>> >> >> >
>> >> >> > Thanks for the writeup. A few comments below.
>> >> >> >
>> >> >> > 1. We will need to be a bit careful with fetch requests from the
>> >> >> > followers. Basically, as we are doing a rolling upgrade of the
>> >> >> > brokers, the follower can't start issuing V2 of the fetch request
>> >> >> > until the rest of the brokers are ready to process it. So, we probably
>> >> >> > need to make use of inter.broker.protocol.version to do the rolling
>> >> >> > upgrade. In step 1, we set inter.broker.protocol.version to 0.9 and do
>> >> >> > a round of rolling upgrade of the brokers. At this point, all brokers
>> >> >> > are capable of processing V2 of fetch requests, but no broker is using
>> >> >> > it yet. In step 2, we set inter.broker.protocol.version to 0.10 and do
>> >> >> > another round of rolling restart of the brokers. In this step, the
>> >> >> > upgraded brokers will start issuing V2 of the fetch request.
>> >> >> >
>> >> >> > 2. If we do #1, I am not sure if there is still a need for
>> >> >> > message.format.version, since the broker can start writing messages in
>> >> >> > the new format after inter.broker.protocol.version is set to 0.10.
>> >> >> >
>> >> >> > 3. It wasn't clear from the wiki whether the base offset in the
>> >> >> > shallow message is the offset of the first or the last inner message.
>> >> >> > It's better to use the offset of the last inner message. This way, the
>> >> >> > followers don't have to decompress messages to figure out the next
>> >> >> > fetch offset.
>> >> >> >
>> >> >> > 4. I am not sure that I understand the following sentence in the
>> >> >> > wiki. It seems that the relative offsets in a compressed message don't
>> >> >> > have to be consecutive. If so, why do we need to update the relative
>> >> >> > offsets in the inner messages?
>> >> >> > "When the log cleaner compacts log segments, it needs to update the
>> >> >> > inner message's relative offset values."
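Jun's two-round rolling upgrade can be sketched as a follower choosing its fetch request version from its configured inter.broker.protocol.version, so that no broker issues V2 fetches until the second round. The version-to-request mapping and the function names are assumptions for illustration, following the 0.9 / 0.10 values used in the message.

```python
from typing import Dict, Set


def follower_fetch_version(inter_broker_protocol_version: str) -> int:
    """Hypothetical rule: only brokers configured for 0.10 or later send fetch V2."""
    # Compare numerically: as strings, "0.9" > "0.10" would order them wrongly.
    major, minor = (int(p) for p in inter_broker_protocol_version.split(".")[:2])
    return 2 if (major, minor) >= (0, 10) else 1


def fetch_versions_in_use(configs: Dict[int, str]) -> Set[int]:
    """Fetch request versions issued by followers, given broker id -> config value."""
    return {follower_fetch_version(v) for v in configs.values()}


# Round 1: every broker runs the new code but stays on 0.9, so none sends V2.
round1 = {0: "0.9", 1: "0.9", 2: "0.9"}
# Round 2, midway: broker 0 was restarted with 0.10. Every broker can already
# *process* V2 after round 1, so it is safe for broker 0 to *issue* it.
round2_midway = {0: "0.10", 1: "0.9", 2: "0.9"}

print(fetch_versions_in_use(round1))         # {1}
print(fetch_versions_in_use(round2_midway))  # {1, 2}
```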
>> >> >> >
>> >> >> > Thanks,
>> >> >> >
>> >> >> > Jun
>> >> >> >
>> >> >> > On Thu, Sep 17, 2015 at 12:54 PM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
>> >> >> >
>> >> >> > > Hi folks,
>> >> >> > >
>> >> >> > > Thanks a lot for the feedback on KIP-31 (move to relative offsets),
>> >> >> > > not including the timestamp and index discussion.
>> >> >> > >
>> >> >> > > I updated the migration plan section as we discussed in the KIP
>> >> >> > > hangout. I think it is the only concern raised so far. Please let me
>> >> >> > > know if there are further comments about the KIP.
>> >> >> > >
>> >> >> > > Thanks,
>> >> >> > >
>> >> >> > > Jiangjie (Becket) Qin
>> >> >> > >
>> >> >> > > On Mon, Sep 14, 2015 at 5:13 PM, Jiangjie Qin <jqin@linkedin.com> wrote:
>> >> >> > >
>> >> >> > > > I just updated KIP-33 to explain indexing on CreateTime and on
>> >> >> > > > LogAppendTime respectively. I also used some use cases to compare the
>> >> >> > > > two solutions.
>> >> >> > > > Although this is for KIP-33, it does give some insight into whether
>> >> >> > > > it makes sense to have a per-message LogAppendTime.
>> >> >> > > >
>> >> >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
>> >> >> > > >
>> >> >> > > > As a short summary of the conclusions we have already reached on
>> >> >> > > > timestamps:
>> >> >> > > > 1. It is good to add a timestamp to the message.
>> >> >> > > > 2. LogAppendTime should be used for broker policy enforcement (log
>> >> >> > > > retention / rolling).
>> >> >> > > > 3. It is useful to have a CreateTime in the message format, which is
>> >> >> > > > immutable after the producer sends the message.
>> >> >> > > >
>> >> >> > > > The following questions are still under discussion:
>> >> >> > > > 1. Should we also add LogAppendTime to the message format?
>> >> >> > > > 2. Which timestamp should we use to build the index?
>> >> >> > > >
>> >> >> > > > Let's talk about question 1 first, because question 2 is actually a
>> >> >> > > > follow-up to question 1.
>> >> >> > > > Here is what I think:
>> >> >> > > > 1a. To enforce broker log policy, theoretically we don't need a
>> >> >> > > > per-message LogAppendTime. But if we don't include LogAppendTime in the
>> >> >> > > > message, we still need to implement a separate solution to pass log
>> >> >> > > > segment timestamps among brokers, which means further complication in
>> >> >> > > > replication.
>> >> >> > > > 1b. LogAppendTime has some advantages over CreateTime (KIP-33 has a
>> >> >> > > > detailed comparison).
>> >> >> > > > 1c. We have already exposed the offset, which is essentially an
>> >> >> > > > internal concept of a message in terms of position. Exposing
>> >> >> > > > LogAppendTime means we expose another internal concept of a message in
>> >> >> > > > terms of time.
>> >> >> > > >
>> >> >> > > > Considering the above reasons, personally I think it is worth adding
>> >> >> > > > the LogAppendTime to each message.
>> >> >> > > >
>> >> >> > > > Any thoughts?
>> >> >> > > >
>> >> >> > > > Thanks,
>> >> >> > > >
>> >> >> > > > Jiangjie (Becket) Qin
>> >> >> > > >
>> >> >> > > > On Mon, Sep 14, 2015 at 11:44 AM, Jiangjie Qin <jqin@linkedin.com> wrote:
>> >> >> > > >
>> >> >> > > >> I was trying to send the last email before the KIP hangout, so maybe
>> >> >> > > >> I did not think it through completely. By the way, the discussion is
>> >> >> > > >> actually more related to KIP-33, i.e. whether we should index on
>> >> >> > > >> CreateTime or LogAppendTime. (Although it seems all the discussion is
>> >> >> > > >> still in this mailing thread...)
>> >> >> > > >> The solution in the last email is for indexing on CreateTime. It is
>> >> >> > > >> essentially what Jay suggested, except we use a timestamp map instead
>> >> >> > > >> of a memory-mapped index file. Please ignore the proposal of using a
>> >> >> > > >> log compacted topic. The solution can be simplified to:
>> >> >> > > >>
>> >> >> > > >> Each broker keeps
>> >> >> > > >> 1. a timestamp index map: Map[TopicPartitionSegment, Map[Timestamp,
>> >> >> > > >> Offset]]. The timestamp is on a minute boundary.
>> >> >> > > >> 2. a timestamp index file for each segment.
>> >> >> > > >> When a broker receives a message (as either leader or follower), it
>> >> >> > > >> checks if the timestamp index map contains the timestamp for the
>> >> >> > > >> current segment. If the timestamp does not exist, the broker adds the
>> >> >> > > >> offset to the map and appends an entry to the timestamp index file,
>> >> >> > > >> i.e. we only use the index file as a persistent copy of the timestamp
>> >> >> > > >> index map.
>> >> >> > > >>
>> >> >> > > >> When a log segment is deleted, we need to:
>> >> >> > > >> 1. delete the TopicPartitionSegment key in the timestamp index map.
>> >> >> > > >> 2. delete the timestamp index file.
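The in-memory part of this proposal might look like the sketch below. It assumes CreateTime in milliseconds rounded down to the minute, and keeps the first offset seen per minute per segment (which is also the smallest, since offsets only grow on append). The class and method names and the segment key layout are hypothetical, and writing to the per-segment index file is only indicated by a comment.

```python
from collections import defaultdict
from typing import Dict, Tuple

MINUTE_MS = 60_000
SegmentKey = Tuple[str, int, int]  # hypothetical: (topic, partition, segment base offset)


class TimestampIndexMap:
    """Map[TopicPartitionSegment, Map[Timestamp, Offset]] at minute granularity."""

    def __init__(self) -> None:
        self.index: Dict[SegmentKey, Dict[int, int]] = defaultdict(dict)

    def on_append(self, segment: SegmentKey, create_time_ms: int, offset: int) -> bool:
        """Record the offset if this minute is new for the segment.

        Returns True when an entry was added, i.e. when an entry would also be
        appended to the segment's timestamp index file.
        """
        minute = create_time_ms - create_time_ms % MINUTE_MS
        per_segment = self.index[segment]
        if minute in per_segment:
            return False
        per_segment[minute] = offset
        # A real broker would also append (minute, offset) to the index file here.
        return True

    def on_segment_deleted(self, segment: SegmentKey) -> None:
        # Step 1: drop the segment's key. Step 2 (deleting the file) is omitted.
        self.index.pop(segment, None)
```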
>> >> >> > > >>
>> >> >> > > >> This solution assumes we only keep CreateTime in the message. There
>> >> >> > > >> are a few trade-offs in this solution:
>> >> >> > > >> 1. The granularity of search will be per minute.
>> >> >> > > >> 2. The whole timestamp index map has to be in memory all the time.
>> >> >> > > >> 3. We need to think about another way to honor log retention time and
>> >> >> > > >> time-based log rolling.
>> >> >> > > >> 4. We lose the benefits of including LogAppendTime in the message
>> >> >> > > >> mentioned earlier.
>> >> >> > > >>
>> >> >> > > >> I am not sure whether this solution is necessarily better than
>> >> >> > > >> indexing on LogAppendTime.
>> >> >> > > >>
>> >> >> > > >> I will update KIP-33 to explain the solutions for indexing on
>> >> >> > > >> CreateTime and on LogAppendTime respectively, and put in some more
>> >> >> > > >> concrete use cases as well.
>> >> >> > > >>
>> >> >> > > >> Thanks,
>> >> >> > > >>
>> >> >> > > >> Jiangjie (Becket) Qin
>> >> >> > > >>
>> >> >> > > >>
>> >> >> > > >> On Mon, Sep 14, 2015 at 9:40 AM, Jiangjie Qin <jqin@linkedin.com> wrote:
>> >> >> > > >>
>> >> >> > > >>> Hi Joel,
>> >> >> > > >>>
>> >> >> > > >>> Good point about rebuilding the index. I agree that having a
>> >> >> > > >>> per-message LogAppendTime might be necessary. About time adjustment,
>> >> >> > > >>> the solution sounds promising, but it might be better to make it a
>> >> >> > > >>> follow-up to the KIP, because it seems to be a really rare use case.
>> >> >> > > >>>
>> >> >> > > >>> I have another thought on how to manage out-of-order timestamps.
>> >> >> > > >>> Maybe we can do the following:
>> >> >> > > >>> Create a special log compacted topic __timestamp_index similar to
>> >> >> > > >>> topic; the key would be (TopicPartition, TimeStamp_Rounded_To_Minute)
>> >> >> > > >>> and the value is the offset. In memory, we keep a map for each
>> >> >> > > >>> TopicPartition whose value is (timestamp_rounded_to_minute ->
>> >> >> > > >>> smallest_offset_in_the_minute). This way we can search out-of-order
>> >> >> > > >>> messages and make sure no message is missing.
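The "make sure no message is missing" property comes from how a lookup would use the per-minute smallest offsets: with out-of-order timestamps, a message with timestamp >= T may sit at any offset, so a safe starting point is the minimum offset over all minute buckets at or after T's minute. A minimal sketch with hypothetical names, assuming millisecond timestamps and one in-memory map per TopicPartition:

```python
from typing import Dict, Optional

MINUTE_MS = 60_000


def record(index: Dict[int, int], timestamp_ms: int, offset: int) -> None:
    """Maintain timestamp_rounded_to_minute -> smallest_offset_in_the_minute."""
    minute = timestamp_ms - timestamp_ms % MINUTE_MS
    if minute not in index or offset < index[minute]:
        index[minute] = offset


def start_offset_for(index: Dict[int, int], target_ms: int) -> Optional[int]:
    """Smallest offset that may hold a message with timestamp >= target_ms.

    Taking the minimum over every bucket at or after the target's minute
    (not just the target's own bucket) is what prevents out-of-order
    messages from being skipped; the cost is re-reading some older ones.
    """
    target_minute = target_ms - target_ms % MINUTE_MS
    candidates = [off for minute, off in index.items() if minute >= target_minute]
    return min(candidates) if candidates else None


idx: Dict[int, int] = {}
# (timestamp, offset): offset 2 arrives late with an earlier timestamp.
for ts, off in [(0, 0), (120_000, 1), (60_000, 2), (180_000, 3)]:
    record(idx, ts, off)
print(start_offset_for(idx, 60_000))  # 1, not 2: offset 1 has a later timestamp
```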
>> >> >> > > >>>
>> >> >> > > >>> Thoughts?
>> >> >> > > >>>
>> >> >> > > >>> Thanks,
>> >> >> > > >>>
>> >> >> > > >>> Jiangjie (Becket) Qin
>> >> >> > > >>>
>> >> >> > > >>> On Fri, Sep 11, 2015 at 12:46 PM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
>> >> >> > > >>>
>> >> >> > > >>>> Jay had mentioned the scenario of mirror-maker bootstrap, which would
>> >> >> > > >>>> effectively reset the logAppendTimestamps for the bootstrapped data.
>> >> >> > > >>>> If we don't include logAppendTimestamps in each message, there is a
>> >> >> > > >>>> similar scenario when rebuilding indexes during recovery. So it seems
>> >> >> > > >>>> it may be worth adding that timestamp to messages. The drawback is
>> >> >> > > >>>> exposing a server-side concept in the protocol (although we already
>> >> >> > > >>>> do that with offsets). logAppendTimestamp really should be decided by
>> >> >> > > >>>> the broker, so I think the first scenario may have to be written off
>> >> >> > > >>>> as a gotcha, but the second may be worth addressing (by adding it to
>> >> >> > > >>>> the message format).
>> >> >> > > >>>>
>> >> >> > > >>>> The other point that Jay raised which needs to be addressed in the
>> >> >> > > >>>> proposal (since we require monotonically increasing timestamps in the
>> >> >> > > >>>> index) is changing time on the server (I'm a little less concerned
>> >> >> > > >>>> about NTP clock skews than about a user explicitly changing the
>> >> >> > > >>>> server's time, i.e., big clock skews). We would at least want to "set
>> >> >> > > >>>> back" all the existing timestamps to guarantee non-decreasing
>> >> >> > > >>>> timestamps with future messages. I'm not sure at this point how best
>> >> >> > > >>>> to handle that, but we could perhaps have an epoch/base-time (or
>> >> >> > > >>>> time-correction) stored in the log directories and base all log index
>> >> >> > > >>>> timestamps off that base-time (or time-correction). So if at any time
>> >> >> > > >>>> you determine that time has changed backwards, you can adjust that
>> >> >> > > >>>> base-time without having to fix up all the entries. Without knowing
>> >> >> > > >>>> the exact diff between the previous clock and the new clock we cannot
>> >> >> > > >>>> adjust the times exactly, but we can at least ensure increasing
>> >> >> > > >>>> timestamps.
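One possible reading of the base-time (time-correction) idea: keep a correction that is added to the wall clock (in the proposal it would be persisted in the log directory), and bump it whenever the clock is observed going backwards, so index timestamps never decrease. This is a minimal sketch with hypothetical names; as the message notes, it cannot recover the exact clock difference, only preserve ordering.

```python
import time
from typing import Callable


class CorrectedClock:
    """Non-decreasing timestamps for the time index despite backward clock jumps."""

    def __init__(self, now_ms: Callable[[], int] = lambda: int(time.time() * 1000),
                 correction_ms: int = 0) -> None:
        self.now_ms = now_ms
        # Hypothetically persisted in the log directory alongside the index.
        self.correction_ms = correction_ms
        self.last_ms = -1

    def timestamp(self) -> int:
        corrected = self.now_ms() + self.correction_ms
        if corrected < self.last_ms:
            # Clock went backwards: raise the correction just enough that the
            # corrected time equals the last value handed out.
            self.correction_ms += self.last_ms - corrected
            corrected = self.last_ms
        self.last_ms = corrected
        return corrected


readings = iter([1000, 2000, 500, 600, 3000])  # wall clock jumps back at 500
clock = CorrectedClock(now_ms=lambda: next(readings))
print([clock.timestamp() for _ in range(5)])  # [1000, 2000, 2000, 2100, 4500]
```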
>> >> >> > > >>>>
>> >> >> > > >>>> On Fri, Sep 11, 2015 at 10:52 AM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
>> >> >> > > >>>> > Ewen and Jay,
>> >> >> > > >>>> >
>> >> >> > > >>>> > The way I see it, LogAppendTime is another form of "offset". It
>> >> >> > > >>>> > serves the following purposes:
>> >> >> > > >>>> > 1. Locate messages not only by position, but also by time. The
>> >> >> > > >>>> > difference from offset is that a timestamp is not unique across
>> >> >> > > >>>> > messages.
>> >> >> > > >>>> > 2. Allow the broker to manage messages based on time, e.g. retention
>> >> >> > > >>>> > and rolling.
>> >> >> > > >>>> > 3. Make it convenient for users to search for messages not only by
>> >> >> > > >>>> > offset, but also by timestamp.
>> >> >> > > >>>> >
>> >> >> > > >>>> > For purpose (2) we don't need a per-message server timestamp. We only
>> >> >> > > >>>> > need a per-log-segment server timestamp, propagated among brokers.
>> >> >> > > >>>> >
>> >> >> > > >>>> > For (1) and (3), we need a per-message timestamp. Then the question
>> >> >> > > >>>> > is whether we should use CreateTime or LogAppendTime.
>> >> >> > > >>>> >
>> >> >> > > >>>> > I completely agree that an application timestamp is very useful for
>> >> >> > > >>>> > many use cases. But it seems to me that having Kafka understand and
>> >> >> > > >>>> > maintain application timestamps is a bit over-demanding. So I think
>> >> >> > > >>>> > there is value in passing on CreateTime for application convenience,
>> >> >> > > >>>> > but I am not sure it can replace LogAppendTime. Managing out-of-order
>> >> >> > > >>>> > CreateTime is equivalent to allowing producers to send their own
>> >> >> > > >>>> > offsets and asking the broker to manage the offsets for them. It is
>> >> >> > > >>>> > going to be very hard to maintain and could create huge
>> >> >> > > >>>> > performance/functional issues because of the complicated logic.
>> >> >> > > >>>> >
>> >> >> > > >>>> > About whether we should expose LogAppendTime beyond the broker: I
>> >> >> > > >>>> > agree that the server timestamp is internal to the broker, but isn't
>> >> >> > > >>>> > offset also an internal concept? Arguably it's not provided by the
>> >> >> > > >>>> > producer, so consumer application logic does not have to know the
>> >> >> > > >>>> > offset. But users need to know the offset because they need to know
>> >> >> > > >>>> > "where is the message" in the log. LogAppendTime provides the answer
>> >> >> > > >>>> > to "when was the message appended" to the log. So personally I think
>> >> >> > > >>>> > it is reasonable to expose the LogAppendTime to consumers.
>> >> >> > > >>>> >
>> >> >> > > >>>> > I can see some use cases for exposing the LogAppendTime, to name a
>> >> >> > > >>>> > few:
>> >> >> > > >>>> > 1. Let's say the broker has 7 days of log retention and some
>> >> >> > > >>>> > application wants to reprocess the data from the past 3 days. The
>> >> >> > > >>>> > user can simply provide the timestamp and start consuming.
>> >> >> > > >>>> > 2. Users can easily know the lag in terms of time.
>> >> >> > > >>>> > 3. Cross-cluster failover. This is a more complicated use case with
>> >> >> > > >>>> > two goals: 1) do not lose messages; and 2) do not reconsume tons of
>> >> >> > > >>>> > messages. Only knowing the offset in cluster A won't help with
>> >> >> > > >>>> > finding the failover point in cluster B, because an offset in one
>> >> >> > > >>>> > cluster means nothing to another cluster. A timestamp, however, is a
>> >> >> > > >>>> > good cross-cluster reference in this case.
>> >> >> > > >>>> >
>> >> >> > > >>>> > Thanks,
>> >> >> > > >>>> >
>> >> >> > > >>>> > Jiangjie (Becket) Qin
>> >> >> > > >>>> >
>> >> >> > > >>>> > On Thu, Sep 10, 2015 at 9:28 PM, Ewen Cheslack-Postava <ewen@confluent.io> wrote:
>> >> >> > > >>>> >
>> >> >> > > >>>> >> Re: MM preserving timestamps: Yes, this was how I interpreted the
>> >> >> > > >>>> >> point in the KIP, and I only raised the issue because it restricts
>> >> >> > > >>>> >> the usefulness of timestamps any time MM is involved. I agree it's
>> >> >> > > >>>> >> not a deal breaker, but I wanted to understand the exact impact of the
>> >> >> > > >>>> >> change. Some users seem to want to be able to seek by
>> >> >> > > >>>> >> application-defined timestamps (despite the many obvious issues
>> >> >> > > >>>> >> involved), and the proposal clearly would not support that unless
>> >> >> > > >>>> >> the timestamps submitted with the produce requests were respected.
>> >> >> > > >>>> >> If we ignore client-submitted timestamps, then we probably want to
>> >> >> > > >>>> >> try to hide the timestamps as much as possible in any public
>> >> >> > > >>>> >> interface (e.g. they never show up in any public consumer APIs), but
>> >> >> > > >>>> >> expose them just enough to be useful for operational purposes.
>> >> >> > > >>>> >>
>> >> >> > > >>>> >> Sorry if my devil's advocate position / attempt to map the design
>> >> >> > > >>>> >> space led to some confusion!
>> >> >> > > >>>> >>
>> >> >> > > >>>> >> -Ewen
>> >> >> > > >>>> >>
>> >> >> > > >>>> >>
>> >> >> > > >>>> >> On Thu, Sep 10, 2015 at 5:48 PM, Jay Kreps <jay@confluent.io> wrote:
>> >> >> > > >>>> >>
>> >> >> > > >>>> >> > Ah, I see, I think I misunderstood about MM. It was called out in
>> >> >> > > >>>> >> > the proposal and I thought you were saying you'd retain the
>> >> >> > > >>>> >> > timestamp, but I think you're calling out that you're not. In that
>> >> >> > > >>>> >> > case you do have the opposite problem, right? When you add mirroring
>> >> >> > > >>>> >> > for a topic, all that data will have a timestamp of now and
>> >> >> > > >>>> >> > retention won't be right. Not a blocker, but a bit of a gotcha.
>> >> >> > > >>>> >> >
>> >> >> > > >>>> >> > -Jay
>> >> >> > > >>>> >> >
>> >> >> > > >>>> >> >
>> >> >> > > >>>> >> >
>> >> >> > > >>>> >> > On Thu, Sep 10, 2015 at 5:40 PM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
>> >> >> > > >>>> >> >
>> >> >> > > >>>> >> > > > Don't you see all the same issues you see with client-defined
>> >> >> > > >>>> >> > > > timestamps if you let mm control the timestamp as you were
>> >> >> > > >>>> >> > > > proposing? That means time
>> >> >> > > >>>> >> > >
>> >> >> > > >>>> >> > > Actually I don't think that was in the proposal (or was it?), i.e.,
>> >> >> > > >>>> >> > > I think it was always supposed to be controlled by the broker (and
>> >> >> > > >>>> >> > > not MM).
>> >> >> > > >>>> >> > >
>> >> >> > > >>>> >> > > > Also, Joel, can you just confirm that you guys have talked through
>> >> >> > > >>>> >> > > > the whole timestamp thing with the Samza folks at LI? The reason I
>> >> >> > > >>>> >> > > > ask about this is that Samza and Kafka Streams (KIP-28) are both
>> >> >> > > >>>> >> > > > trying to rely on
>> >> >> > > >>>> >> > >
>> >> >> > > >>>> >> > > We have not. This is a good point; we will follow up.
>> >> >> > > >>>> >> > >
>> >> >> > > >>>> >> > > > WRT your idea of a FollowerFetchRequest, I had thought of a similar
>> >> >> > > >>>> >> > > > idea where we use the leader's timestamps to approximately set the
>> >> >> > > >>>> >> > > > follower's timestamps. I had thought of just adding a partition
>> >> >> > > >>>> >> > > > metadata request that would subsume the current offset/time lookup
>> >> >> > > >>>> >> > > > and could be used by the follower to try to approximately keep
>> >> >> > > >>>> >> > > > their timestamps kosher. It's a little hacky and doesn't help with
>> >> >> > > >>>> >> > > > MM, but it is also maybe less invasive, so that approach could be
>> >> >> > > >>>> >> > > > viable.
>> >> >> > > >>>> >> > >
>> >> >> > > >>>> >> > > That would also work, but perhaps responding with the actual leader
>> >> >> > > >>>> >> > > offset-timestamp entries (corresponding to the fetched portion)
>> >> >> > > >>>> >> > > would be exact, and it should be small as well. Anyway, the main
>> >> >> > > >>>> >> > > motivation here was to avoid leaking server-side timestamps into the
>> >> >> > > >>>> >> > > message format, if people think that is worth it, so the
>> >> >> > > >>>> >> > > alternatives are implementation details. My original instinct was
>> >> >> > > >>>> >> > > that it also avoids a backwards incompatible change (but it does
>> >> >> > > >>>> >> > > not, because we also have the relative offset change).
>> >> >> > > >>>> >> > >
>> >> >> > > >>>> >> > > Thanks,
>> >> >> > > >>>> >> > >
>> >> >> > > >>>> >> > > Joel
>> >> >> > > >>>> >> > >
>> >> >> > > >>>> >> > > >
>> >> >> > > >>>> >> > > >
>> >> >> > > >>>> >> > > >
>> >> >> > > >>>> >> > > > On Thu, Sep 10, 2015 at 3:36 PM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
>> >> >> > > >>>> >> > > >
>> >> >> > > >>>> >> > > >> I just wanted to comment on a few points made earlier in this
>> >> >> > > >>>> >> > > >> thread:
>> >> >> > > >>>> >> > > >>
>> >> >> > > >>>> >> > > >> Concerns on clock skew: at least for the original proposal’s scope
>> >> >> > > >>>> >> > > >> (which was more about honoring retention broker-side), this would
>> >> >> > > >>>> >> > > >> only be an issue when spanning leader movements, right? i.e., leader
>> >> >> > > >>>> >> > > >> migration latency has to be much less than clock skew for this to be
>> >> >> > > >>>> >> > > >> a real issue, wouldn’t it?
>> >> >> > > >>>> >> > > >>
>> >> >> > > >>>> >> > > >> Client timestamp vs broker timestamp: I’m not sure Kafka (brokers)
>> >> >> > > >>>> >> > > >> are the right place to reason about client-side timestamps,
>> >> >> > > >>>> >> > > >> precisely due to the nuances that have been discussed at length in
>> >> >> > > >>>> >> > > >> this thread. My preference would have been for the timestamp (now
>> >> >> > > >>>> >> > > >> called LogAppendTimestamp) to have nothing to do with the
>> >> >> > > >>>> >> > > >> applications. Ewen raised a valid concern about leaking such
>> >> >> > > >>>> >> > > >> “private/server-side” timestamps into the protocol spec. i.e., it is
>> >> >> > > >>>> >> > > >> fine to have the CreateTime, which is expressly client-provided and
>> >> >> > > >>>> >> > > >> immutable thereafter, but the LogAppendTime is also going to be part
>> >> >> > > >>>> >> > > >> of the protocol, and it would be good to avoid exposing it (to client
>> >> >> > > >>>> >> > > >> developers) if possible. OK, so here is a slightly different
>> >> >> > > >>>> >> > > >> approach that I was just thinking about (and did not think through
>> >> >> > > >>>> >> > > >> very far, so it may not work): do not add the LogAppendTime to
>> >> >> > > >>>> >> > > >> messages. Instead, build the time-based index on the server side on
>> >> >> > > >>>> >> > > >> message arrival time alone. Introduce a new
>> >> >> > > >>>> >> > > >> ReplicaFetchRequest/Response pair. ReplicaFetchResponses will also
>> >> >> > > >>>> >> > > >> include the slice of the time-based index for the follower broker.
>> >> >> > > >>>> >> > > >> This way we can at least keep timestamps aligned across brokers for
>> >> >> > > >>>> >> > > >> retention purposes. We do lose the append timestamp for mirroring
>> >> >> > > >>>> >> > > >> pipelines (which appears to be the case in KIP-32 as well).
>> >> >> > > >>>> >> > > >>
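Joel's alternative can be sketched roughly as follows. The idea is to build the time-based index on broker arrival time only and have the leader ship its index slice to followers in the ReplicaFetchResponse. This is a minimal illustration under stated assumptions, not Kafka code:

- `ArrivalTimeIndex` and its methods are hypothetical names.
- Each entry stands for one appended batch.
- A plain list stands in for the on-disk index.

The sketch shows that a follower which copies the leader's entries reaches the same retention decision regardless of its own clock.

```python
import bisect


class ArrivalTimeIndex:
    """Hypothetical (arrival_ms, offset) index kept next to a partition log."""

    def __init__(self):
        self.times = []    # non-decreasing arrival times in ms
        self.offsets = []  # strictly increasing log offsets

    def append(self, arrival_ms, offset):
        # Clamp to the previous entry so the times stay binary-searchable.
        if self.times:
            arrival_ms = max(arrival_ms, self.times[-1])
        self.times.append(arrival_ms)
        self.offsets.append(offset)

    def slice_from(self, offset):
        """Entries at or after `offset`, as a leader might ship to a follower."""
        i = bisect.bisect_left(self.offsets, offset)
        return list(zip(self.times[i:], self.offsets[i:]))

    def apply_slice(self, entries):
        """Follower side: copy the leader's times instead of reading its own clock."""
        for arrival_ms, offset in entries:
            if not self.offsets or offset > self.offsets[-1]:
                self.append(arrival_ms, offset)

    def last_expired_offset(self, now_ms, retention_ms):
        """Largest indexed offset whose arrival time is older than the retention cutoff."""
        i = bisect.bisect_left(self.times, now_ms - retention_ms)
        return self.offsets[i - 1] if i > 0 else None


leader = ArrivalTimeIndex()
for arrival_ms, offset in [(1_000, 0), (2_000, 50), (3_000, 120)]:
    leader.append(arrival_ms, offset)

follower = ArrivalTimeIndex()
follower.apply_slice(leader.slice_from(0))
follower.apply_slice(leader.slice_from(50))  # re-fetch: already-known entries are skipped

# Same cutoff on both brokers, independent of the follower's wall clock.
print(leader.last_expired_offset(9_500, 7_000), follower.last_expired_offset(9_500, 7_000))  # 50 50
```

The follower never consults its own clock, so clock skew between brokers cannot make their retention decisions diverge. Skew only matters for the leader that first stamps the data.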
>> >> >> > > >>>> >> > > >> Configurable index granularity: We can do this, but I’m not sure it is
>> >> >> > > >>>> >> > > >> very useful. As Jay noted, a major change from the old proposal linked
>> >> >> > > >>>> >> > > >> from the KIP is the sparse time-based index, which we felt was
>> >> >> > > >>>> >> > > >> essential to bound memory usage (having timestamps on each log index
>> >> >> > > >>>> >> > > >> entry was probably a big waste since in the common case several
>> >> >> > > >>>> >> > > >> messages share the same timestamp). BTW, another benefit of the second
>> >> >> > > >>>> >> > > >> index is that it makes it easier to roll back or throw away if
>> >> >> > > >>>> >> > > >> necessary (vs. modifying the existing index format). That obviously
>> >> >> > > >>>> >> > > >> does not help with rolling back the timestamp change in the message
>> >> >> > > >>>> >> > > >> format, but it is one less thing to worry about.
>> >> >> > > >>>> >> > > >>
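The memory argument for a sparse index can be illustrated with a toy builder. It records an entry only when time has advanced by some granularity, which also collapses runs of messages that share a timestamp. This is a sketch, not the KIP-33 design, and it makes several assumptions:

- `build_sparse_index`, `scan_start`, and the `granularity_ms` parameter are hypothetical names.
- Timestamps are non-decreasing, as with broker arrival time.
- The granularity is greater than zero.

```python
import bisect
import math


def build_sparse_index(records, granularity_ms):
    """Keep a (timestamp_ms, offset) entry only when time has advanced by at
    least `granularity_ms` since the previous entry.

    `records` is an iterable of (offset, timestamp_ms) in log order with
    non-decreasing timestamps (e.g. broker arrival time).
    """
    if granularity_ms <= 0:
        raise ValueError("granularity_ms must be positive")
    index = []
    for offset, ts in records:
        if not index or ts >= index[-1][0] + granularity_ms:
            index.append((ts, offset))
    return index


def scan_start(index, target_ms):
    """Offset to start scanning from when looking for the first record with
    timestamp >= target_ms; every record before it is older than target_ms."""
    # (target_ms, inf) sorts after every entry whose timestamp is <= target_ms.
    i = bisect.bisect_right(index, (target_ms, math.inf)) - 1
    return index[max(i, 0)][1]


records = [(0, 1000), (1, 1000), (2, 1000), (3, 1200),
           (4, 1999), (5, 2000), (6, 2500), (7, 3100)]
index = build_sparse_index(records, granularity_ms=1000)
print(index)                    # [(1000, 0), (2000, 5), (3100, 7)]
print(scan_start(index, 2600))  # 5
```

Here 8 records produce 3 index entries. The price is a short forward scan from the returned offset, bounded by one granularity window of records.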
>> >> >> > > >>>> >> > > >> Versioning: I’m not sure everyone is saying the same thing wrt the
>> >> >> > > >>>> >> > > >> scope of this. There is the record format change, but I also think
>> >> >> > > >>>> >> > > >> this ties into all of the API versioning that we already have in
>> >> >> > > >>>> >> > > >> Kafka. The current API versioning approach works fine for
>> >> >> > > >>>> >> > > >> upgrades/downgrades across official Kafka releases, but not so well
>> >> >> > > >>>> >> > > >> between releases. (We almost got bitten by this at LinkedIn with the
>> >> >> > > >>>> >> > > >> recent changes to various requests but were able to work around
>> >> >> > > >>>> >> > > >> these.) We can clarify this in the follow-up KIP.
>> >> >> > > >>>> >> > > >>
>> >> >> > > >>>> >> > > >> Thanks,
>> >> >> > > >>>> >> > > >>
>> >> >> > > >>>> >> > > >> Joel
>> >> >> > > >>>> >> > > >>
>> >> >> > > >>>> >> > > >>
>> >> >> > > >>>> >> > > >> On Thu, Sep 10, 2015 at 3:00 PM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
>> >> >> > > >>>> >> > > >> > Hi Jay,
>> >> >> > > >>>> >> > > >> >
>> >> >> > > >>>> >> > > >> > I just changed the KIP title and updated the KIP page.
>> >> >> > > >>>> >> > > >> >
>> >> >> > > >>>> >> > > >> > And yes, we are working on a general version control proposal to make
>> >> >> > > >>>> >> > > >> > protocol migrations like this smoother. I will also create a KIP for
>> >> >> > > >>>> >> > > >> > that soon.
>> >> >> > > >>>> >> > > >> >
>> >> >> > > >>>> >> > > >> > Thanks,
>> >> >> > > >>>> >> > > >> >
>> >> >> > > >>>> >> > > >> > Jiangjie (Becket) Qin
>> >> >> > > >>>> >> > > >> >
>> >> >> > > >>>> >> > > >> >
>> >> >> > > >>>> >> > > >> > On Thu, Sep 10, 2015 at 2:21 PM, Jay Kreps <jay@confluent.io> wrote:
>> >> >> > > >>>> >> > > >> >
>> >> >> > > >>>> >> > > >> >> Great, can we change the name to something related to the change, e.g.
>> >> >> > > >>>> >> > > >> >> "KIP-31: Move to relative offsets in compressed message sets"?
>> >> >> > > >>>> >> > > >> >>
>> >> >> > > >>>> >> > > >> >> Also, you had mentioned before you were going to expand on the
>> >> >> > > >>>> >> > > >> >> mechanics of handling these log format changes, right?
>> >> >> > > >>>> >> > > >> >>
>> >> >> > > >>>> >> > > >> >> -Jay
>> >> >> > > >>>> >> > > >> >>
>> >> >> > > >>>> >> > > >> >> On Thu, Sep 10, 2015 at 12:42 PM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
>> >> >> > > >>>> >> > > >> >>
>> >> >> > > >>>> >> > > >> >> > Neha and Jay,
>> >> >> > > >>>> >> > > >> >> >
>> >> >> > > >>>> >> > > >> >> > Thanks a lot for the feedback. Good point about splitting the
>> >> >> > > >>>> >> > > >> >> > discussion. I have split the proposal into three KIPs, and it does
>> >> >> > > >>>> >> > > >> >> > make each discussion clearer:
>> >> >> > > >>>> >> > > >> >> > KIP-31 - Message format change (Use relative offset)
>> >> >> > > >>>> >> > > >> >> > KIP-32 - Add CreateTime and LogAppendTime to Kafka message
>> >> >> > > >>>> >> > > >> >> > KIP-33 - Build a time-based log index
>> >> >> > > >>>> >> > > >> >> >
>> >> >> > > >>>> >> > > >> >> > KIP-33 can be a follow-up KIP to KIP-32, so we can discuss KIP-31 and
>> >> >> > > >>>> >> > > >> >> > KIP-32 first. I will create a separate discussion thread for KIP-32
>> >> >> > > >>>> >> > > >> >> > and reply to the concerns you raised regarding the timestamp.
>> >> >> > > >>>> >> > > >> >> >
>> >> >> > > >>>> >> > > >> >> > So far it looks like there is no objection to KIP-31. Since I removed
>> >> >> > > >>>> >> > > >> >> > a few parts from the previous KIP and only left the relative offset
>> >> >> > > >>>> >> > > >> >> > proposal, it would be great if people could take another look to see
>> >> >> > > >>>> >> > > >> >> > if there are any concerns.
>> >> >> > > >>>> >> > > >> >> >
>> >> >> > > >>>> >> > > >> >> > Thanks,
>> >> >> > > >>>> >> > > >> >> >
>> >> >> > > >>>> >> > > >> >> > Jiangjie (Becket) Qin
>> >> >> > > >>>> >> > > >> >> >
>> >> >> > > >>>> >> > > >> >> >
>> >> >> > > >>>> >> > > >> >> > On Tue, Sep 8, 2015 at 1:28 PM, Neha Narkhede <neha@confluent.io> wrote:
>> >> >> > > >>>> >> > > >> >> >
>> >> >> > > >>>> >> > > >> >> > > Becket,
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > > Nice write-up. A few thoughts:
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > > I'd split up the discussion for simplicity. Note that you can always
>> >> >> > > >>>> >> > > >> >> > > group several of these in one patch to reduce the protocol changes
>> >> >> > > >>>> >> > > >> >> > > people have to deal with. This is just a suggestion, but I think the
>> >> >> > > >>>> >> > > >> >> > > following split might make it easier to tackle the changes being
>> >> >> > > >>>> >> > > >> >> > > proposed:
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > >    - Relative offsets
>> >> >> > > >>>> >> > > >> >> > >    - Introducing the concept of time
>> >> >> > > >>>> >> > > >> >> > >    - Time-based indexing (separate the usage of the timestamp field
>> >> >> > > >>>> >> > > >> >> > >      from how/whether we want to include a timestamp in the message)
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > > I'm +1 on relative offsets; we should've done it back when we
>> >> >> > > >>>> >> > > >> >> > > introduced it. Other than reducing the CPU overhead, this will also
>> >> >> > > >>>> >> > > >> >> > > reduce the garbage collection overhead on the brokers.
>> >> >> > > >>>> >> > > >> >> > >
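Here is a rough sketch of why relative offsets in compressed message sets save broker CPU and garbage. With absolute inner offsets, the broker must decompress, rewrite every inner offset, and recompress. With relative offsets, it only stamps the wrapper. The layout below is an assumption for illustration:

- The wrapper holds the absolute offset of its last inner message.
- A hypothetical `inner_count` header field tells the broker how many offsets to reserve.
- JSON plus zlib stands in for the real wire format.
- All names are hypothetical.

The sketch covers offset assignment only, not any other validation the broker may do.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class WrapperMessage:
    # Assumed layout (illustration only): the wrapper's offset is the absolute
    # offset of its LAST inner message; inner messages carry relative offsets.
    offset: int
    inner_count: int  # hypothetical header field
    payload: bytes    # compressed inner messages


def producer_build(values):
    """Producer: number the inner messages 0..n-1 and compress once."""
    inner = [{"rel": i, "value": v} for i, v in enumerate(values)]
    payload = zlib.compress(json.dumps(inner).encode("utf-8"))
    return WrapperMessage(offset=-1, inner_count=len(inner), payload=payload)


def broker_assign(wrapper, log_end_offset):
    """Broker: stamp the wrapper header only; the payload is not recompressed.
    Returns the new log end offset."""
    wrapper.offset = log_end_offset + wrapper.inner_count - 1
    return wrapper.offset + 1


def consumer_expand(wrapper):
    """Consumer: recover absolute offsets from the wrapper offset."""
    inner = json.loads(zlib.decompress(wrapper.payload).decode("utf-8"))
    base = wrapper.offset - inner[-1]["rel"]
    return [(base + m["rel"], m["value"]) for m in inner]


batch = producer_build(["a", "b", "c"])
compressed = batch.payload
next_offset = broker_assign(batch, log_end_offset=100)
print(next_offset, consumer_expand(batch))
# 103 [(100, 'a'), (101, 'b'), (102, 'c')]
```

The compressed bytes produced by the client are the same object after `broker_assign`. That skipped decompress/recompress cycle is where the CPU and GC savings would come from.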
>> >> >> > > >>>> >> > > >> >> > > On the timestamp field, I generally agree that we should add a
>> >> >> > > >>>> >> > > >> >> > > timestamp to a Kafka message, but I'm not quite sold on how this KIP
>> >> >> > > >>>> >> > > >> >> > > suggests the timestamp be set. I'll avoid repeating the downsides of a
>> >> >> > > >>>> >> > > >> >> > > broker-side timestamp mentioned previously in this thread. I think the
>> >> >> > > >>>> >> > > >> >> > > topic of including a timestamp in a Kafka message requires a lot more
>> >> >> > > >>>> >> > > >> >> > > thought and detail than what's in this KIP. I'd suggest we make it a
>> >> >> > > >>>> >> > > >> >> > > separate KIP that includes a list of all the different use cases for
>> >> >> > > >>>> >> > > >> >> > > the timestamp (beyond log retention), including stream processing, and
>> >> >> > > >>>> >> > > >> >> > > discusses the tradeoffs of including client- and broker-side
>> >> >> > > >>>> >> > > >> >> > > timestamps.
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > > Agree with the benefit of time-based indexing, but haven't had a
>> >> >> > > >>>> >> > > >> >> > > chance to dive into the design details yet.
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > > Thanks,
>> >> >> > > >>>> >> > > >> >> > > Neha
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > > On Tue, Sep 8, 2015 at 10:57 AM, Jay Kreps <jay@confluent.io> wrote:
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > > > Hey Becket,
>> >> >> > > >>>> >> > > >> >> > > >
>> >> >> > > >>>> >> > > >> >> > > > I was proposing splitting up the KIP just for simplicity of
>> >> >> > > >>>> >> > > >> >> > > > discussion. You can still implement them in one patch. I think
>> >> >> > > >>>> >> > > >> >> > > > otherwise it will be hard to discuss/vote on them, since if you like
>> >> >> > > >>>> >> > > >> >> > > > the offset proposal but not the time proposal, what do you do?
>> >> >> > > >>>> >> > > >> >> > > >
>> >> >> > > >>>> >> > > >> >> > > > Introducing a second notion of time into Kafka is a pretty massive
>> >> >> > > >>>> >> > > >> >> > > > philosophical change, so it kind of warrants its own KIP. I think it
>> >> >> > > >>>> >> > > >> >> > > > isn't just "Change message format".
>> >> >> > > >>>> >> > > >> >> > > >
>> >> >> > > >>>> >> > > >> >> > > > WRT time, I think one thing to clarify in the proposal is how MM will
>> >> >> > > >>>> >> > > >> >> > > > have access to set the timestamp. Presumably this will be a new field
>> >> >> > > >>>> >> > > >> >> > > > in ProducerRecord, right? If so, then any user can set the timestamp,
>> >> >> > > >>>> >> > > >> >> > > > right? I'm not sure you answered the questions around how this will
>> >> >> > > >>>> >> > > >> >> > > > work for MM, since when MM retains timestamps from multiple partitions
>> >> >> > > >>>> >> > > >> >> > > > they will then be out of order and in the past (so the
>> >> >> > > >>>> >> > > >> >> > > > max(lastAppendedTimestamp, currentTimeMillis) override you proposed
>> >> >> > > >>>> >> > > >> >> > > > will not work, right?). If we don't do this, then when you set up
>> >> >> > > >>>> >> > > >> >> > > > mirroring the data will all be new and you have the same retention
>> >> >> > > >>>> >> > > >> >> > > > problem you described. Maybe I missed something...?
>> >> >> > > >>>> >> > > >> >> > > >
>> >> >> > > >>>> >> > > >> >> > > > My main motivation is that, given that both Samza and Kafka streams
>> >> >> > > >>>> >> > > >> >> > > > are doing work that implies a mandatory client-defined notion of
>> >> >> > > >>>> >> > > >> >> > > > time, I really think introducing a different mandatory notion of time
>> >> >> > > >>>> >> > > >> >> > > > in Kafka is going to be quite odd. We should think hard about how
>> >> >> > > >>>> >> > > >> >> > > > client-defined time could work. I'm not sure if it can, but I'm also
>> >> >> > > >>>> >> > > >> >> > > > not sure that it can't. Having both will be odd. Did you chat about
>> >> >> > > >>>> >> > > >> >> > > > this with Yi/Kartik on the Samza side?
>> >> >> > > >>>> >> > > >> >> > > >
>> >> >> > > >>>> >> > > >> >> > > > When you say it won't work, are you assuming some particular
>> >> >> > > >>>> >> > > >> >> > > > implementation? Maybe that the index is a monotonically increasing
>> >> >> > > >>>> >> > > >> >> > > > set of pointers to the least record with a timestamp larger than the
>> >> >> > > >>>> >> > > >> >> > > > index time? In other words, a search for time X gives the largest
>> >> >> > > >>>> >> > > >> >> > > > offset at which all records are <= X?
>> >> >> > > >>>> >> > > >> >> > > >
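Jay's suggested lookup semantics are that a search for time X returns the largest offset at which all records are <= X. One way to make this concrete is to index the running maximum of the timestamps, which stays non-decreasing even when client timestamps arrive out of order. This is an illustrative sketch only, with these assumptions:

- The function names are hypothetical.
- Offsets start at 0 and equal list positions.
- A dense in-memory list stands in for a real index.

```python
import bisect
from itertools import accumulate


def running_max(timestamps):
    """prefix[i] == max(timestamps[0..i]); non-decreasing even when the
    client-supplied timestamps are out of order."""
    return list(accumulate(timestamps, max))


def largest_offset_all_le(prefix, x):
    """Largest offset o such that every record at offsets 0..o has a
    timestamp <= x, or -1 if the very first record is already newer than x.
    Offsets are assumed to start at 0 and match list positions."""
    return bisect.bisect_right(prefix, x) - 1


# Out-of-order timestamps, e.g. from mirroring several source partitions.
timestamps = [100, 300, 200, 250, 400, 350]
prefix = running_max(timestamps)           # [100, 300, 300, 300, 400, 400]
print(largest_offset_all_le(prefix, 300))  # 3
print(largest_offset_all_le(prefix, 299))  # 0
```

A consequence of this choice is that a single record with a far-future timestamp raises the running maximum for every later offset. Lookups for any earlier time then stop at that record.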
>> >> >> > > >>>> >> > > >> >> > > > For retention, I agree with the problem you point out, but I think
>> >> >> > > >>>> >> > > >> >> > > > what you are saying in that case is that you want a size limit too.
>> >> >> > > >>>> >> > > >> >> > > > If you use system time you actually hit the same problem: say you do
>> >> >> > > >>>> >> > > >> >> > > > a full dump of a DB table with a 7-day retention setting; your
>> >> >> > > >>>> >> > > >> >> > > > retention will actually not get enforced for the first 7 days because
>> >> >> > > >>>> >> > > >> >> > > > the data is "new to Kafka".
>> >> >> > > >>>> >> > > >> >> > > >
>> >> >> > > >>>> >> > > >> >> > > > -Jay
>> >> >> > > >>>> >> > > >> >> > > >
>> >> >> > > >>>> >> > > >> >> > > >
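This toy segment check compares two retention concerns from the thread: Jay's DB-dump example and the misbehaving-producer concern raised later. It runs once on arrival time and once on create time. It assumes the following:

- `Segment` and `deletable` are hypothetical names.
- Each segment tracks the newest timestamp of each kind.
- Real log-cleanup details are ignored.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Segment:
    base_offset: int
    max_create_ms: int  # newest client-supplied (create) timestamp in the segment
    max_append_ms: int  # newest broker arrival (append) time in the segment


def deletable(segments, now_ms, retention_ms, use_create_time):
    """Base offsets of segments whose newest record is past retention.

    Ignores details such as the active segment or deleting only from the
    head of the log; it just compares the two notions of time.
    """
    result = []
    for seg in segments:
        newest = seg.max_create_ms if use_create_time else seg.max_append_ms
        if now_ms - newest > retention_ms:
            result.append(seg.base_offset)
    return result


now = 400 * DAY_MS
# A full DB dump of year-old rows, loaded into Kafka yesterday.
dump = Segment(0, max_create_ms=now - 365 * DAY_MS, max_append_ms=now - DAY_MS)
# A segment written 30 days ago by a producer whose clock is ten years ahead.
skewed = Segment(1000, max_create_ms=now + 3650 * DAY_MS, max_append_ms=now - 30 * DAY_MS)

week = 7 * DAY_MS
print(deletable([dump, skewed], now, week, use_create_time=False))  # [1000]
print(deletable([dump, skewed], now, week, use_create_time=True))   # [0]
```

With arrival time, the year-old dump survives the 7-day window, while the skewed segment is removed on schedule. With create time, the reverse happens.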
>> >> >> > > >>>> >> > > >> >> > > > On Mon, Sep 7, 2015 at 10:44 AM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
>> >> >> > > >>>> >> > > >> >> > > >
>> >> >> > > >>>> >> > > >> >> > > > > Jay,
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > > Thanks for the comments. Yes, there are actually three proposals as
>> >> >> > > >>>> >> > > >> >> > > > > you pointed out.
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > > We will have a separate proposal for (1), the version control
>> >> >> > > >>>> >> > > >> >> > > > > mechanism. We actually thought internally about whether we want to
>> >> >> > > >>>> >> > > >> >> > > > > separate 2 and 3 before creating the KIP. The reason we put 2 and 3
>> >> >> > > >>>> >> > > >> >> > > > > together is that it will save us another across-the-board wire
>> >> >> > > >>>> >> > > >> >> > > > > protocol change. Like you said, we have to migrate all the clients in
>> >> >> > > >>>> >> > > >> >> > > > > all languages. To some extent, the effort spent on upgrading the
>> >> >> > > >>>> >> > > >> >> > > > > clients can be even bigger than implementing the new feature itself.
>> >> >> > > >>>> >> > > >> >> > > > > So there is some appeal in doing 2 and 3 together instead of
>> >> >> > > >>>> >> > > >> >> > > > > separately. Maybe after (1) is done it will be easier to do protocol
>> >> >> > > >>>> >> > > >> >> > > > > migrations. But if we are able to come to an agreement on the
>> >> >> > > >>>> >> > > >> >> > > > > timestamp solution, I would prefer to have it together with the
>> >> >> > > >>>> >> > > >> >> > > > > relative offset change in the interest of avoiding another wire
>> >> >> > > >>>> >> > > >> >> > > > > protocol change (the process of migrating to relative offsets is
>> >> >> > > >>>> >> > > >> >> > > > > exactly the same as migrating to messages with timestamps).
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > > In terms of the timestamp, I completely agree that having a client
>> >> >> > > >>>> >> > > >> >> > > > > timestamp is more useful if we can make sure the timestamp is good.
>> >> >> > > >>>> >> > > >> >> > > > > But in reality that can be a really big *IF*. I think the problem is
>> >> >> > > >>>> >> > > >> >> > > > > exactly as Ewen mentioned: if we let the client set the timestamp, it
>> >> >> > > >>>> >> > > >> >> > > > > would be very hard for the broker to utilize it. If the broker
>> >> >> > > >>>> >> > > >> >> > > > > applies the retention policy based on the client timestamp, one
>> >> >> > > >>>> >> > > >> >> > > > > misbehaving producer can potentially completely mess up the retention
>> >> >> > > >>>> >> > > >> >> > > > > policy on the broker. Although people don't care about the
>> >> >> > > >>>> >> > > >> >> > > > > server-side timestamp, they do care a lot when the timestamp breaks.
>> >> >> > > >>>> >> > > >> >> > > > > Searching by timestamp is a really important use case even though it
>> >> >> > > >>>> >> > > >> >> > > > > is not used as often as searching by offset. It has a significant
>> >> >> > > >>>> >> > > >> >> > > > > direct impact on RTO when there is a cross-cluster failover, as Todd
>> >> >> > > >>>> >> > > >> >> > > > > mentioned.
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > > The trick using max(lastAppendedTimestamp, currentTimeMillis) is to
>> >> >> > > >>>> >> > > >> >> > > > > guarantee that the timestamp increases monotonically. Many commercial
>> >> >> > > >>>> >> > > >> >> > > > > systems actually do something similar to handle time skew. About
>> >> >> > > >>>> >> > > >> >> > > > > changing the time, I am not sure people use NTP like a watch, just
>> >> >> > > >>>> >> > > >> >> > > > > setting it forward/backward by an hour or so. The time adjustments I
>> >> >> > > >>>> >> > > >> >> > > > > used to make were typically something like a minute per week. So each
>> >> >> > > >>>> >> > > >> >> > > > > second might run slightly slower/faster (roughly a hundred
>> >> >> > > >>>> >> > > >> >> > > > > microseconds), but the clock should never break completely, so that
>> >> >> > > >>>> >> > > >> >> > > > > time-based transactions are not affected. The one-minute change is
>> >> >> > > >>>> >> > > >> >> > > > > spread over a week rather than applied instantly.
>> >> >> > > >>>> >> > > >> >> > > > >
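The max(lastAppendedTimestamp, currentTimeMillis) rule can be sketched as a small timestamp assigner. `MonotonicAppendClock` is a hypothetical name. The wall clock is injected as a callable so the behavior can be shown deterministically. Seeding the assigner from the last timestamp in the log is one assumed way to keep timestamps monotonic across the leader migrations discussed earlier in the thread.

```python
from typing import Callable


class MonotonicAppendClock:
    """Assign append timestamps as max(last_appended_ms, current wall clock)."""

    def __init__(self, wall_clock_ms: Callable[[], int], last_appended_ms: int = -1):
        self._wall_clock_ms = wall_clock_ms
        # A new leader could seed this from the last timestamp in its log.
        self._last_appended_ms = last_appended_ms

    def next_timestamp(self) -> int:
        ts = max(self._last_appended_ms, self._wall_clock_ms())
        self._last_appended_ms = ts
        return ts


# Wall-clock readings where the clock briefly steps backwards.
readings = iter([1000, 1005, 995, 990, 1010])
clock = MonotonicAppendClock(lambda: next(readings))
stamps = [clock.next_timestamp() for _ in range(5)]
print(stamps)  # [1000, 1005, 1005, 1005, 1010]

# A new leader whose clock is 110 ms behind, seeded from the log tail.
new_leader = MonotonicAppendClock(lambda: 900, last_appended_ms=1010)
print(new_leader.next_timestamp())  # 1010
```

For scale, slewing one minute over a week (604,800 s) amounts to about 99 µs of correction per wall-clock second. The rule also has a cost: if the clock jumps far forward and is then corrected, timestamps stay pinned at the jumped value until the wall clock catches up.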
>> >> >> > > >>>> >> > > >> >> > > > > Personally, I think having a client-side timestamp will be useful if
>> >> >> > > >>>> >> > > >> >> > > > > we don't need to put the broker and data integrity at risk. If we have
>> >> >> > > >>>> >> > > >> >> > > > > to choose one of them but not both, I would prefer the server-side
>> >> >> > > >>>> >> > > >> >> > > > > timestamp, because for the client-side timestamp there is always a
>> >> >> > > >>>> >> > > >> >> > > > > plan B, which is putting the timestamp into the payload.
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > > Another reason I am reluctant to use the client-side timestamp is
>> >> >> > > >>>> >> > > >> >> > > > > that it is always dangerous to mix the control plane with the data
>> >> >> > > >>>> >> > > >> >> > > > > plane. IP did this and it has caused so many different breaches that
>> >> >> > > >>>> >> > > >> >> > > > > people are migrating to something like MPLS. An example in Kafka is
>> >> >> > > >>>> >> > > >> >> > > > > that any client can construct a
>> >> >> > > >>>> >> > > >> >> > > > > LeaderAndIsrRequest/UpdateMetadataRequest/ControlledShutdownRequest
>> >> >> > > >>>> >> > > >> >> > > > > (you name it) and send it to the broker to mess up the entire
>> >> >> > > >>>> >> > > >> >> > > > > cluster. Also, as we have already noticed, a busy cluster can respond
>> >> >> > > >>>> >> > > >> >> > > > > quite slowly to controller messages. So it would be really nice if we
>> >> >> > > >>>> >> > > >> >> > > > > could avoid giving clients the power to control log retention.
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > > Thanks,
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > > Jiangjie (Becket) Qin
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > > On Sun, Sep 6, 2015 at 9:54 PM, Todd Palino <tpalino@gmail.com> wrote:
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > So, with regard to why you want to search by timestamp, the biggest
>> >> >> > > >>>> >> > > >> >> > > > > > problem I've seen is with consumers who want to reset their offsets
>> >> >> > > >>>> >> > > >> >> > > > > > to a specific point in time, whether it is to replay a certain number
>> >> >> > > >>>> >> > > >> >> > > > > > of messages or to rewind to before some problem state existed. This
>> >> >> > > >>>> >> > > >> >> > > > > > happens more often than anyone would like.
>> >> >> > > >>>> >> > > >> >> > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > To handle this now we need to constantly export the broker's offset
>> >> >> > > >>>> >> > > >> >> > > > > > for every partition to a time-series database and then use external
>> >> >> > > >>>> >> > > >> >> > > > > > processes to query this. I know we're not the only ones doing this.
>> >> >> > > >>>> >> > > >> >> > > > > > The way the broker handles requests for offsets by timestamp is a
>> >> >> > > >>>> >> > > >> >> > > > > > little obtuse (explain it to anyone without intimate knowledge of
>> >> >> > > >>>> >> > > >> >> > > > > > the internal workings of the broker - every time I do I see this).
>> >> >> > > >>>> >> > > >> >> > > > > > In addition, as Becket pointed out, it causes problems specifically
>> >> >> > > >>>> >> > > >> >> > > > > > with retention of messages by time when you move partitions around.
>> >> >> > > >>>> >> > > >> >> > > > > >
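The workaround Todd describes (periodically exporting each partition's offset to a time-series store and querying it later) can be sketched as below. This is a minimal illustration, not LinkedIn's actual tooling. `OffsetHistory`, `record`, and `offset_at` are hypothetical names, and an in-memory list stands in for the time-series database. The lookup returns the last sampled log-end offset at or before the requested time. A consumer rewound to that offset replays everything produced after the sample, so precision is bounded by the sampling interval.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class OffsetHistory:
    """(wall-clock time, log-end offset) samples for one partition.

    Hypothetical in-memory stand-in for a time-series database; samples
    must be recorded in increasing time order.
    """
    times_ms: List[int] = field(default_factory=list)
    offsets: List[int] = field(default_factory=list)

    def record(self, time_ms: int, log_end_offset: int) -> None:
        if self.times_ms and time_ms < self.times_ms[-1]:
            raise ValueError("samples must be recorded in time order")
        self.times_ms.append(time_ms)
        self.offsets.append(log_end_offset)

    def offset_at(self, target_ms: int) -> int:
        """Return the last sampled log-end offset at or before target_ms."""
        i = bisect.bisect_right(self.times_ms, target_ms)
        if i == 0:
            raise LookupError("no sample at or before the requested time")
        return self.offsets[i - 1]


# Samples taken once a minute; rewind to just before 90 seconds in.
history = OffsetHistory()
history.record(1_000, 500)
history.record(61_000, 620)
history.record(121_000, 790)
print(history.offset_at(90_000))  # 620
```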
>> >> >> > > >>>> >> > > >> >> > > > > > I'm deliberately avoiding the discussion of what timestamp to use. I
>> >> >> > > >>>> >> > > >> >> > > > > > can see the argument either way, though I tend to lean towards the
>> >> >> > > >>>> >> > > >> >> > > > > > idea that the broker timestamp is the only viable source of truth in
>> >> >> > > >>>> >> > > >> >> > > > > > this situation.
>> >> >> > > >>>> >> > > >> >> > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > -Todd
>> >> >> > > >>>> >> > > >> >> > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > On Sun, Sep 6, 2015 at 7:08 PM, Ewen Cheslack-Postava <ewen@confluent.io> wrote:
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > On Sun, Sep 6, 2015 at 4:57 PM, Jay Kreps <jay@confluent.io> wrote:
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > > 2. Nobody cares what time it is on the server.
>> >> >> > > >>>> >> > > >> >> > > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > This is a good way of summarizing the issue I was trying to get at,
>> >> >> > > >>>> >> > > >> >> > > > > > > from an app's perspective. Of the 3 stated goals of the KIP, #2 (log
>> >> >> > > >>>> >> > > >> >> > > > > > > retention) is reasonably handled by a server-side timestamp. I really
>> >> >> > > >>>> >> > > >> >> > > > > > > just care that a message is there long enough that I have a chance to
>> >> >> > > >>>> >> > > >> >> > > > > > > process it. #3 (searching by timestamp) only seems useful if we can
>> >> >> > > >>>> >> > > >> >> > > > > > > guarantee the server-side timestamp is close enough to the original
>> >> >> > > >>>> >> > > >> >> > > > > > > client-side timestamp, and any mirror maker step seems to break that
>> >> >> > > >>>> >> > > >> >> > > > > > > (even ignoring any issues with broker availability).
>> >> >> > > >>>> >> > > >> >> > > > > > >
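Ewen's point that log retention only needs a server-side timestamp can be illustrated with a small retention check. The sketch assumes that each segment records the largest broker-assigned timestamp it contains. `Segment` and `deletable_segments` are hypothetical names. Keying retention on a value stored with the data, rather than on something like a file's modification time, is what would keep retention stable when partitions are moved, which is the problem Todd and Becket raise. The active (last) segment is never deleted. Because broker-assigned timestamps grow with offset, the scan can stop at the first segment that is still inside the window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    base_offset: int
    max_timestamp_ms: int  # largest broker-assigned timestamp in the segment


def deletable_segments(segments: List[Segment], now_ms: int,
                       retention_ms: int) -> List[Segment]:
    """Return the oldest segments whose newest message is past retention.

    `segments` is ordered by base offset; the last one is the active segment
    and is never returned.
    """
    cutoff_ms = now_ms - retention_ms
    expired = []
    for seg in segments[:-1]:
        if seg.max_timestamp_ms >= cutoff_ms:
            break  # broker timestamps grow with offset, so later ones are newer
        expired.append(seg)
    return expired


segments = [Segment(0, 1_000), Segment(100, 5_000), Segment(200, 9_000)]
print([s.base_offset for s in
       deletable_segments(segments, now_ms=10_000, retention_ms=4_000)])  # [0, 100]
```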
>> >> >> > > >>>> >> > > >> >> > > > > > > I'm also wondering whether optimizing for search-by-timestamp on the
>> >> >> > > >>>> >> > > >> >> > > > > > > broker is really something we want to do given that messages aren't
>> >> >> > > >>>> >> > > >> >> > > > > > > really guaranteed to be ordered by application-level timestamps on
>> >> >> > > >>>> >> > > >> >> > > > > > > the broker. Is part of the need for this just due to the current
>> >> >> > > >>>> >> > > >> >> > > > > > > consumer APIs being difficult to work with? For example, could you
>> >> >> > > >>>> >> > > >> >> > > > > > > implement this pretty easily client side just the way you would
>> >> >> > > >>>> >> > > >> >> > > > > > > broker-side? I'd imagine a couple of random seeks + reads on very
>> >> >> > > >>>> >> > > >> >> > > > > > > rare occasions (i.e. when the app starts up) wouldn't be a problem
>> >> >> > > >>>> >> > > >> >> > > > > > > performance-wise. Or is it also that you need the broker to enforce
>> >> >> > > >>>> >> > > >> >> > > > > > > things like monotonically increasing timestamps since you can't do
>> >> >> > > >>>> >> > > >> >> > > > > > > the query properly and efficiently without that guarantee, and
>> >> >> > > >>>> >> > > >> >> > > > > > > therefore what applications are actually looking for *is*
>> >> >> > > >>>> >> > > >> >> > > > > > > broker-side timestamps?
>> >> >> > > >>>> >> > > >> >> > > > > > >
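Ewen's question about doing the search client side with "a couple of random seeks + reads" amounts to a binary search over offsets. The following is a minimal sketch that assumes a hypothetical `timestamp_at(offset)` callback, which fetches the single message at an offset and returns its timestamp. The answer is only correct if timestamps never decrease as offsets increase, which is exactly the guarantee Ewen asks about. Each probe is one random read, so a partition with a million messages needs about 20 reads.

```python
from typing import Callable


def first_offset_at_or_after(target_ms: int, start: int, end: int,
                             timestamp_at: Callable[[int], int]) -> int:
    """Return the first offset in [start, end) whose timestamp is >= target_ms,
    or `end` if every message is older.

    Correct only if timestamps are non-decreasing in offset order. Each call
    to timestamp_at is one random read, about log2(end - start) in total.
    """
    lo, hi = start, end
    while lo < hi:
        mid = (lo + hi) // 2
        if timestamp_at(mid) < target_ms:
            lo = mid + 1  # answer lies strictly after mid
        else:
            hi = mid  # mid qualifies; look for an earlier one
    return lo


# A list stands in for a partition: index = offset, value = timestamp.
ts = [100, 200, 200, 300, 450]
print(first_offset_at_or_after(250, 0, len(ts), ts.__getitem__))  # 3
```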
>> >> >> > > >>>> >> > > >> >> > > > > > > -Ewen
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > > Consider cases where data is being copied from a database or from
>> >> >> > > >>>> >> > > >> >> > > > > > > > log files. In steady-state the server time is very close to the
>> >> >> > > >>>> >> > > >> >> > > > > > > > client time if their clocks are sync'd (see 1) but there will be
>> >> >> > > >>>> >> > > >> >> > > > > > > > times of large divergence when the copying process is stopped or
>> >> >> > > >>>> >> > > >> >> > > > > > > > falls behind. When this occurs it is clear that the time the data
>> >> >> > > >>>> >> > > >> >> > > > > > > > arrived on the server is irrelevant; it is the source timestamp
>> >> >> > > >>>> >> > > >> >> > > > > > > > that matters. This is the problem you are trying to fix by
>> >> >> > > >>>> >> > > >> >> > > > > > > > retaining the mirror maker timestamp, but really the client should
>> >> >> > > >>>> >> > > >> >> > > > > > > > always set the time, with server-side time used as a fallback. It
>> >> >> > > >>>> >> > > >> >> > > > > > > > would be worth talking to the Samza folks and reading through this
>> >> >> > > >>>> >> > > >> >> > > > > > > > blog post
>> >> >> > > >>>> >> > > >> >> > > > > > > > (http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html)
>> >> >> > > >>>> >> > > >> >> > > > > > > > on this subject since we went through similar learnings on the
>> >> >> > > >>>> >> > > >> >> > > > > > > > stream processing side.
>> >> >> > > >>>> >> > > >> >> > > > > > > >
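The policy Jay proposes could look like the following. Under that policy, the client always sets the timestamp and the broker's clock serves only as a fallback. This is a sketch only. The function name `effective_timestamp` is hypothetical, and so is the choice to represent "no timestamp set" as `None`; neither is part of the KIP. A mirror maker following this policy would pass the source record's timestamp through as `client_ts_ms` instead of letting the target cluster stamp its own time.

```python
import time
from typing import Optional


def effective_timestamp(client_ts_ms: Optional[int],
                        broker_now_ms: Optional[int] = None) -> int:
    """Prefer the source timestamp set by the producer; use the broker's
    clock only when the client left it unset (None in this sketch)."""
    if client_ts_ms is not None:
        return client_ts_ms
    if broker_now_ms is None:
        broker_now_ms = int(time.time() * 1000)
    return broker_now_ms


# A copy job that fell behind still reports when the data was created.
print(effective_timestamp(1441000000000, broker_now_ms=1441324020000))  # 1441000000000
print(effective_timestamp(None, broker_now_ms=1441324020000))  # 1441324020000
```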
>> >> >> > > >>>> >> > > >> >> > > > > > > > I think the implication of these two is that we need a proposal
>> >> >> > > >>>> >> > > >> >> > > > > > > > that handles potentially very out-of-order timestamps in some kind
>> >> >> > > >>>> >> > > >> >> > > > > > > > of sane-ish way (buggy clients will set something totally wrong as
>> >> >> > > >>>> >> > > >> >> > > > > > > > the time).
>> >> >> > > >>>> >> > > >> >> > > > > > > >
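One possible "sane-ish" way to search a log whose timestamps are out of order is to index the running maximum timestamp rather than raw timestamps. It is offered here only as an illustration, not as what the KIP proposes. The index then stays sorted. Seeking to the first offset where the running maximum reaches the target never skips a message with a timestamp at or after the target, at the cost of replaying some older out-of-order messages. `build_time_index` and `seek_offset` are hypothetical names, and offsets are numbered from 0 for simplicity. This alone does not address Jay's point about buggy clients: a far-future timestamp would push the running maximum up, and every later seek would then land too early.

```python
import bisect
from typing import List, Tuple


def build_time_index(timestamps: List[int]) -> List[Tuple[int, int]]:
    """Build (max_timestamp_so_far, offset) entries from per-offset timestamps.

    An entry is added only when the running maximum grows, so the index stays
    sorted even when the raw timestamps are out of order.
    """
    index: List[Tuple[int, int]] = []
    running_max = None
    for offset, ts in enumerate(timestamps):
        if running_max is None or ts > running_max:
            running_max = ts
            index.append((ts, offset))
    return index


def seek_offset(index: List[Tuple[int, int]], target_ms: int,
                log_end_offset: int) -> int:
    """Return the first offset at which the running maximum reaches target_ms.

    Every message before it is older than target_ms, so consuming from here
    misses nothing newer (it may replay some older, out-of-order messages).
    """
    keys = [ts for ts, _ in index]
    i = bisect.bisect_left(keys, target_ms)
    return index[i][1] if i < len(index) else log_end_offset


raw = [100, 300, 200, 250, 400, 350, 500]  # out-of-order timestamps
index = build_time_index(raw)  # [(100, 0), (300, 1), (400, 4), (500, 6)]
print(seek_offset(index, 350, len(raw)))  # 4
```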
>> >> >> > > >>>> >> > > >> >> > > > > > > > -Jay
>> >> >> > > >>>> >> > > >> >> > > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > > On Sun, Sep 6, 2015 at 4:22 PM, Jay Kreps <jay@confluent.io> wrote:
>> >> >> > > >>>> >> > > >> >> > > > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > > > The magic byte is used to version the message format, so we'll need
>> >> >> > > >>>> >> > > >> >> > > > > > > > > to make sure that check is in place--I actually don't see it in the
>> >> >> > > >>>> >> > > >> >> > > > > > > > > current consumer code, which I think is a bug we should fix for the
>> >> >> > > >>>> >> > > >> >> > > > > > > > > next release (filed KAFKA-2523). The purpose of that field is so
>> >> >> > > >>>> >> > > >> >> > > > > > > > > there is a clear check on the format rather than the scrambled
>> >> >> > > >>>> >> > > >> >> > > > > > > > > scenarios Becket describes.
>> >> >> > > >>>> >> > > >> >> > > > > > > > >
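The check Jay describes, which reads the magic byte before interpreting any later field, can be sketched as follows. The byte layout is an assumption for illustration: the existing v0 fields, plus an 8-byte timestamp after the attributes byte when magic is 1. Consult the KIP page for the actual proposal. `encode_message`, `parse_message`, and `UnsupportedMessageVersion` are hypothetical names. The CRC is treated as a CRC32 over every byte after the CRC field.

```python
import struct
import zlib

SUPPORTED_MAGIC = {0, 1}  # versions this sketch knows how to parse


class UnsupportedMessageVersion(Exception):
    pass


def encode_message(magic: int, key, value, timestamp_ms: int = 0,
                   attributes: int = 0) -> bytes:
    """Encode a message in the assumed layout (used for the demo below)."""
    body = struct.pack(">bb", magic, attributes)
    if magic >= 1:
        body += struct.pack(">q", timestamp_ms)
    for part in (key, value):
        if part is None:
            body += struct.pack(">i", -1)  # -1 length means null
        else:
            body += struct.pack(">i", len(part)) + part
    return struct.pack(">I", zlib.crc32(body)) + body


def parse_message(buf: bytes) -> dict:
    """Parse one message, checking CRC and magic byte before anything else.

    Assumed layout (big-endian): crc int32 | magic int8 | attributes int8 |
    [timestamp int64, magic >= 1 only] | key length int32 | key |
    value length int32 | value.
    """
    (crc,) = struct.unpack_from(">I", buf, 0)
    if zlib.crc32(buf[4:]) != crc:
        raise ValueError("CRC mismatch")
    magic, attributes = struct.unpack_from(">bb", buf, 4)
    if magic not in SUPPORTED_MAGIC:
        # Fail with a clear error instead of misreading the later fields.
        raise UnsupportedMessageVersion(f"unsupported magic byte {magic}")
    pos = 6
    timestamp = None
    if magic >= 1:
        (timestamp,) = struct.unpack_from(">q", buf, pos)
        pos += 8
    parts = []
    for _ in range(2):  # key, then value
        (length,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        if length < 0:
            parts.append(None)
            continue
        if pos + length > len(buf):
            raise ValueError("field length exceeds message size")
        parts.append(buf[pos:pos + length])
        pos += length
    return {"magic": magic, "attributes": attributes,
            "timestamp": timestamp, "key": parts[0], "value": parts[1]}


print(parse_message(encode_message(1, b"k", b"v", 1441324020000))["timestamp"])
try:
    parse_message(encode_message(2, b"k", b"v"))
except UnsupportedMessageVersion as e:
    print(e)  # unsupported magic byte 2
```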
>> >> >> > > >>>> >> > > >> >> > > > > > > > > Also, Becket, I don't think just fixing the Java client is
>> >> >> > > >>>> >> > > >> >> > > > > > > > > sufficient, as that would break other clients--i.e. if anyone writes
>> >> >> > > >>>> >> > > >> >> > > > > > > > > a v1 message, even by accident, any non-v1-capable consumer will
>> >> >> > > >>>> >> > > >> >> > > > > > > > > break. I think we probably need a way to have the server ensure a
>> >> >> > > >>>> >> > > >> >> > > > > > > > > particular message format either at read or write time.
>> >> >> > > >>>> >> > > >> >> > > > > > > > >
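Jay's suggestion that the server enforce a particular format at read or write time implies some form of conversion. Below is a minimal sketch of read-time down-conversion for a single uncompressed message. It assumes that v1 is the v0 layout plus an 8-byte timestamp after the attributes byte, and that the low three bits of the attributes byte hold the compression codec, as in the existing format. The conversion drops the timestamp, sets the magic byte to 0, and recomputes the CRC. `down_convert_v1_to_v0` is a hypothetical name. Compressed wrapper messages are rejected here. Converting them would require decompressing, rewriting, and recompressing the inner messages, which is the cost this KIP is trying to avoid.

```python
import struct
import zlib

CODEC_MASK = 0x07  # low attribute bits: compression codec (0 = none)


def down_convert_v1_to_v0(buf: bytes) -> bytes:
    """Rewrite an uncompressed magic-1 message as magic 0 (assumed layouts)."""
    (crc,) = struct.unpack_from(">I", buf, 0)
    if zlib.crc32(buf[4:]) != crc:
        raise ValueError("CRC mismatch")
    magic, attributes = struct.unpack_from(">bb", buf, 4)
    if magic == 0:
        return buf  # already readable by old consumers
    if magic != 1:
        raise ValueError(f"cannot down-convert magic {magic}")
    if attributes & CODEC_MASK:
        raise NotImplementedError("compressed wrappers need recompression")
    # Skip crc (4) + magic (1) + attributes (1) + timestamp (8) = 14 bytes.
    body = struct.pack(">bb", 0, attributes) + buf[14:]
    return struct.pack(">I", zlib.crc32(body)) + body


# Build a v1 message by hand: magic=1, attributes=0, timestamp, key, value.
v1_body = (struct.pack(">bbq", 1, 0, 1441324020000)
           + struct.pack(">i", 1) + b"k" + struct.pack(">i", 1) + b"v")
v1 = struct.pack(">I", zlib.crc32(v1_body)) + v1_body
v0 = down_convert_v1_to_v0(v1)
print(len(v1), len(v0), v0[4])  # 24 16 0
```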
>> >> >> > > >>>> >> > > >> >> > > > > > > > > -Jay
>> >> >> > > >>>> >> > > >> >> > > > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > > > On Thu, Sep 3, 2015 at 3:47 PM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
>> >> >> > > >>>> >> > > >> >> > > > > > > > >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> Hi Guozhang,
>> >> >> > > >>>> >> > > >> >> > > > > > > > >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> I checked the code again. Actually, the CRC check probably won't
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> fail. The newly added timestamp field might be treated as keyLength
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> instead, so we are likely to receive an IllegalArgumentException
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> when trying to read the key. I'll update the KIP.
>> >> >> > > >>>> >> > > >> >> > > > > > > > >>
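Becket's explanation can be reproduced with a small simulation. The sketch assumes the same illustrative layout as the KIP discussion: v1 adds an 8-byte timestamp right after the attributes byte. `encode_v1` and `naive_v0_read` are hypothetical names, not Kafka code. The CRC is computed over whatever bytes follow it, so a v0-only reader that ignores the magic byte still sees a valid CRC. It then reads the first four timestamp bytes as the key length. For millisecond timestamps around September 2015, those high 32 bits come to about 335, so the key read runs past the end of a small message. That matches the IllegalArgumentException Becket expects from the Java consumer.

```python
import struct
import zlib


def encode_v1(timestamp_ms: int, key: bytes, value: bytes) -> bytes:
    """Encode an uncompressed message with the assumed v1 layout:
    crc | magic=1 | attributes=0 | timestamp int64 | key | value."""
    body = (struct.pack(">bbq", 1, 0, timestamp_ms)
            + struct.pack(">i", len(key)) + key
            + struct.pack(">i", len(value)) + value)
    return struct.pack(">I", zlib.crc32(body)) + body


def naive_v0_read(buf: bytes):
    """Read a message the way a v0-only consumer that skips the magic check would."""
    (crc,) = struct.unpack_from(">I", buf, 0)
    crc_ok = zlib.crc32(buf[4:]) == crc  # CRC covers whatever bytes follow it
    (key_length,) = struct.unpack_from(">i", buf, 6)  # v0 puts key length at byte 6
    fits = 10 + key_length <= len(buf)  # would reading the key stay in bounds?
    return crc_ok, key_length, fits


msg = encode_v1(1441324020000, b"k", b"v")  # a timestamp from Sep 2015
print(naive_v0_read(msg))  # (True, 335, False)
```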
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> Thanks,
>> >> >> > > >>>> >> > > >> >> > > > > > > > >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> Jiangjie (Becket) Qin
>> >> >> > > >>>> >> > > >> >> > > > > > > > >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> On Thu, Sep 3, 2015 at 12:48 PM, Jiangjie Qin <jqin@linkedin.com> wrote:
>> >> >> > > >>>> >> > > >> >> > > > > > > > >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > Hi, Guozhang,
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > Thanks for reading the KIP. By "old consumer", I meant the
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > ZookeeperConsumerConnector in trunk now, i.e. without this bug
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > fixed. If we fix the ZookeeperConsumerConnector, then it will throw
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > an exception complaining about the unsupported version when it sees
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > message format V1. What I was trying to say is that if we have some
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > ZookeeperConsumerConnector running without the fix, the consumer
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > will complain about a CRC mismatch instead of an unsupported
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > version.
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > Thanks,
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > Jiangjie (Becket) Qin
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> > On Thu, Sep 3, 2015 at 12:15 PM, Guozhang Wang <wangguoz@gmail.com> wrote:
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> Thanks for the write-up, Jiangjie.
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> One comment about the migration plan: "For old consumers, if they
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> see the new protocol the CRC check will fail."
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> Do you mean this bug in the old consumer cannot be fixed in a
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> backward-compatible way?
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> Guozhang
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> On Thu, Sep 3, 2015 at 8:35 AM, Jiangjie Qin <jqin@linkedin.com.invalid> wrote:
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > Hi,
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > We just created KIP-31 to propose a message format change in Kafka.
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-31+-+Message+format+change+proposal
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > As a summary, the motivations are:
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > 1. Avoid server side message re-compression
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > 2. Honor time-based log roll and retention
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > 3. Enable offset search by timestamp at a finer granularity.
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> >
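The first motivation, avoiding server-side re-compression, comes from how offsets are assigned. The broker currently has to write an absolute offset into every message inside a compressed wrapper, which means decompressing and recompressing it. The following is a minimal sketch of the relative-offset alternative discussed in this thread. It assumes that inner messages carry offsets relative to the start of their batch, and that the wrapper carries the absolute offset of the last inner message. Both function names are hypothetical. In this sketch the broker only sets the wrapper's offset, and the consumer reconstructs absolute offsets itself.

```python
from typing import List


def assign_wrapper_offset(log_end_offset: int, inner_count: int) -> int:
    """Broker side: give the wrapper the absolute offset of its last inner
    message. The compressed payload itself is left untouched."""
    return log_end_offset + inner_count - 1


def absolute_offsets(wrapper_offset: int, relative_offsets: List[int]) -> List[int]:
    """Consumer side: rebuild absolute offsets from the wrapper offset and
    the inner messages' relative offsets (0, 1, 2, ...)."""
    if not relative_offsets:
        return []
    base = wrapper_offset - relative_offsets[-1]
    return [base + rel for rel in relative_offsets]


wrapper = assign_wrapper_offset(1000, 3)
print(wrapper, absolute_offsets(wrapper, [0, 1, 2]))  # 1002 [1000, 1001, 1002]
```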
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > Feedback and comments are welcome!
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > Thanks,
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> > Jiangjie (Becket) Qin
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> --
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >> -- Guozhang
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >> >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >>
>> >> >> > > >>>> >> > > >> >> > > > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > > > --
>> >> >> > > >>>> >> > > >> >> > > > > > > Thanks,
>> >> >> > > >>>> >> > > >> >> > > > > > > Ewen
>> >> >> > > >>>> >> > > >> >> > > > > > >
>> >> >> > > >>>> >> > > >> >> > > > > >
>> >> >> > > >>>> >> > > >> >> > > > >
>> >> >> > > >>>> >> > > >> >> > > >
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> > > --
>> >> >> > > >>>> >> > > >> >> > > Thanks,
>> >> >> > > >>>> >> > > >> >> > > Neha
>> >> >> > > >>>> >> > > >> >> > >
>> >> >> > > >>>> >> > > >> >> >
>> >> >> > > >>>> >> > > >> >>
>> >> >> > > >>>> >> > > >>
>> >> >> > > >>>> >> > >
>> >> >> > > >>>> >> >
>> >> >> > > >>>> >>
>> >> >> > > >>>> >>
>> >> >> > > >>>> >>
>> >> >> > > >>>> >> --
>> >> >> > > >>>> >> Thanks,
>> >> >> > > >>>> >> Ewen
>> >> >> > > >>>> >>
>> >> >> > > >>>>
>> >> >> > > >>>>
>> >> >> > > >>>
>> >> >> > > >>
>> >> >> > > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>>

