kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax" <matth...@confluent.io>
Subject Re: [DISCUSS] KIP-138: Change punctuate semantics
Date Mon, 24 Apr 2017 17:22:34 GMT
>> Would a dashboard need perfect regularity? Wouldn't an upper bound suffice?

If you go with stream-time and don't have any input records for all
partitions, punctuate would not be called at all, and thus your
dashboard would "freeze".

>> I thought about cron-type things, but aren't they better triggered by an
>> external scheduler (they're more flexible anyway), which then feeds
>> "commands" into the topology?

I guess it depends what kind of periodic action you want to trigger. The
"cron job" was just an analogy. Maybe it's just a heartbeat to some
other service, that signals that your Streams app is still running.


-Matthias


On 4/24/17 10:02 AM, Michal Borowiecki wrote:
> Thanks!
> 
> Would a dashboard need perfect regularity? Wouldn't an upper bound suffice?
> 
> Unless too frequent messages on replay could overpower it?
> 
> 
> I thought about cron-type things, but aren't they better triggered by an
> external scheduler (they're more flexible anyway), which then feeds
> "commands" into the topology?
> 
> Just my 2c.
> 
> Cheers,
> 
> Michal
> 
> 
> On 24/04/17 17:57, Matthias J. Sax wrote:
>> A simple example would be some dashboard app, that needs to get
>> "current" status in regular time intervals (ie, and real-time app).
>>
>> Or something like a "scheduler" -- think "cron job" application.
>>
>>
>> -Matthias
>>
>> On 4/24/17 2:23 AM, Michal Borowiecki wrote:
>>> Hi Matthias,
>>>
>>> I agree it's difficult to reason about the hybrid approach, I certainly
>>> found it hard and I'm totally on board with the mantra.
>>>
>>> I'd be happy to limit the scope of this KIP to add system-time
>>> punctuation semantics (in addition to existing stream-time semantics)
>>> and leave more complex schemes for users to implement on top of that.
>>>
>>> Further additional PunctuationTypes, could then be added by future KIPs,
>>> possibly including the hybrid approach once it has been given more thought.
>>>
>>>> There are real-time applications, that want to get
>>>> callbacks in regular system-time intervals (completely independent from
>>>> stream-time).
>>> Can you please describe what they are, so that I can put them on the
>>> wiki for later reference?
>>>
>>> Thanks,
>>>
>>> Michal
>>>
>>>
>>> On 23/04/17 21:27, Matthias J. Sax wrote:
>>>> Hi,
>>>>
>>>> I do like Damian's API proposal about the punctuation callback function.
>>>>
>>>> I also did reread the KIP and thought about the semantics we want to
>>>> provide.
>>>>
>>>>> Given the above, I don't see a reason any more for a separate system-time based punctuation.
>>>> I disagree here. There are real-time applications, that want to get
>>>> callbacks in regular system-time intervals (completely independent from
>>>> stream-time). Thus we should allow this -- if we really follow the
>>>> "hybrid" approach, this could be configured with stream-time interval
>>>> infinite and delay whatever system-time punctuation interval you want to
>>>> have. However, I would like to add a proper API for this and do this
>>>> configuration under the hood (that would allow one implementation within
>>>> all kind of branching for different cases).
>>>>
>>>> Thus, we definitely should have PunctutionType#StreamTime and
>>>> #SystemTime -- and additionally, we _could_ have #Hybrid. Thus, I am not
>>>> a fan of your latest API proposal.
>>>>
>>>>
>>>> About the hybrid approach in general. On the one hand I like it, on the
>>>> other hand, it seems to be rather (1) complicated (not necessarily from
>>>> an implementation point of view, but for people to understand it) and
>>>> (2) mixes two semantics together in a "weird" way". Thus, I disagree with:
>>>>
>>>>> It may appear complicated at first but I do think these semantics will
>>>>> still be more understandable to users than having 2 separate punctuation
>>>>> schedules/callbacks with different PunctuationTypes.
>>>> This statement only holds if you apply strong assumptions that I don't
>>>> believe hold in general -- see (2) for details -- and I think it is
>>>> harder than you assume to reason about the hybrid approach in general.
>>>> IMHO, the hybrid approach is a "false friend" that seems to be easy to
>>>> reason about...
>>>>
>>>>
>>>> (1) Streams always embraced "easy to use" and we should really be
>>>> careful to keep it this way. On the other hand, as we are talking about
>>>> changes to PAPI, it won't affect DSL users (DSL does not use punctuation
>>>> at all at the moment), and thus, the "easy to use" mantra might not be
>>>> affected, while it will allow advanced users to express more complex stuff.
>>>>
>>>> I like the mantra: "make simple thing easy and complex things possible".
>>>>
>>>> (2) IMHO the major disadvantage (issue?) of the hybrid approach is the
>>>> implicit assumption that even-time progresses at the same "speed" as
>>>> system-time during regular processing. This implies the assumption that
>>>> a slower progress in stream-time indicates the absence of input events
>>>> (and that later arriving input events will have a larger event-time with
>>>> high probability). Even if this might be true for some use cases, I
>>>> doubt it holds in general. Assume that you get a spike in traffic and
>>>> for some reason stream-time does advance slowly because you have more
>>>> records to process. This might trigger a system-time based punctuation
>>>> call even if this seems not to be intended. I strongly believe that it
>>>> is not easy to reason about the semantics of the hybrid approach (even
>>>> if the intentional semantics would be super useful -- but I doubt that
>>>> we get want we ask for).
>>>>
>>>> Thus, I also believe that one might need different "configuration"
>>>> values for the hybrid approach if you run the same code for different
>>>> scenarios: regular processing, re-processing, catching up scenario. And
>>>> as the term "configuration" implies, we might be better off to not mix
>>>> configuration with business logic that is expressed via code.
>>>>
>>>>
>>>> One more comment: I also don't think that the hybrid approach is
>>>> deterministic as claimed in the use-case subpage. I understand the
>>>> reasoning and agree, that it is deterministic if certain assumptions
>>>> hold -- compare above -- and if configured correctly. But strictly
>>>> speaking it's not because there is a dependency on system-time (and
>>>> IMHO, if system-time is involved it cannot be deterministic by definition).
>>>>
>>>>
>>>>> I see how in theory this could be implemented on top of the 2 punctuate
>>>>> callbacks with the 2 different PunctuationTypes (one stream-time based,
>>>>> the other system-time based) but it would be a much more complicated
>>>>> scheme and I don't want to suggest that.
>>>> I agree that expressing the intended hybrid semantics is harder if we
>>>> offer only #StreamTime and #SystemTime punctuation. However, I also
>>>> believe that the hybrid approach is a "false friend" with regard to
>>>> reasoning about the semantics (it indicates that it more easy as it is
>>>> in reality). Therefore, we might be better off to not offer the hybrid
>>>> approach and make it clear to a developed, that it is hard to mix
>>>> #StreamTime and #SystemTime in a semantically sound way.
>>>>
>>>>
>>>> Looking forward to your feedback. :)
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>>
>>>> On 4/22/17 11:43 AM, Michal Borowiecki wrote:
>>>>> Hi all,
>>>>>
>>>>> Looking for feedback on the functional interface approach Damian
>>>>> proposed. What do people think?
>>>>>
>>>>> Further on the semantics of triggering punctuate though:
>>>>>
>>>>> I ran through the 2 use cases that Arun had kindly put on the wiki
>>>>> (https://cwiki.apache.org/confluence/display/KAFKA/Punctuate+Use+Cases)
>>>>> in my head and on a whiteboard and I can't find a better solution than
>>>>> the "hybrid" approach he had proposed.
>>>>>
>>>>> I see how in theory this could be implemented on top of the 2 punctuate
>>>>> callbacks with the 2 different PunctuationTypes (one stream-time based,
>>>>> the other system-time based) but it would be a much more complicated
>>>>> scheme and I don't want to suggest that.
>>>>>
>>>>> However, to add to the hybrid algorithm proposed, I suggest one
>>>>> parameter to that: a tolerance period, expressed in milliseconds
>>>>> system-time, after which the punctuation will be invoked in case the
>>>>> stream-time advance hasn't triggered it within the requested interval
>>>>> since the last invocation of punctuate (as recorded in system-time)
>>>>>
>>>>> This would allow a user-defined tolerance for late arriving events. The
>>>>> trade off would be left for the user to decide: regular punctuation in
>>>>> the case of absence of events vs allowing for records arriving late or
>>>>> some build-up due to processing not catching up with the event rate.
>>>>> In the one extreme, this tolerance could be set to infinity, turning
>>>>> hybrid into simply stream-time based punctuate, like we have now. In the
>>>>> other extreme, the tolerance could be set to 0, resulting in a
>>>>> system-time upper bound on the effective punctuation interval.
>>>>>
>>>>> Given the above, I don't see a reason any more for a separate
>>>>> system-time based punctuation. The "hybrid" approach with 0ms tolerance
>>>>> would under normal operation trigger at regular intervals wrt the
>>>>> system-time, except in cases of re-play/catch-up, where the stream time
>>>>> advances faster than system time. In these cases punctuate would happen
>>>>> more often than the specified interval wrt system time. However, the
>>>>> use-cases that need system-time punctuations (that I've seen at least)
>>>>> really only have a need for an upper bound on punctuation delay but
>>>>> don't need a lower bound.
>>>>>
>>>>> To that effect I'd propose the api to be as follows, on ProcessorContext:
>>>>>
>>>>> schedule(Punctuator callback, long interval, long toleranceIterval); // schedules punctuate at stream-time intervals with a system-time upper bound of (interval+toleranceInterval)
>>>>>
>>>>> schedule(Punctuator callback, long interval); // schedules punctuate at stream-time intervals without an system-time upper bound - this is equivalent to current stream-time based punctuate
>>>>>
>>>>> Punctuation is triggered when either:
>>>>> - the stream time advances past the (stream time of the previous
>>>>> punctuation) + interval;
>>>>> - or (iff the toleranceInterval is set) when the system time advances
>>>>> past the (system time of the previous punctuation) + interval +
>>>>> toleranceInterval
>>>>>
>>>>> In either case:
>>>>> - we trigger punctuate passing as the argument the stream time at which
>>>>> the current punctuation was meant to happen
>>>>> - next punctuate is scheduled at (stream time at which the current
>>>>> punctuation was meant to happen) + interval
>>>>>
>>>>> It may appear complicated at first but I do think these semantics will
>>>>> still be more understandable to users than having 2 separate punctuation
>>>>> schedules/callbacks with different PunctuationTypes.
>>>>>
>>>>>
>>>>>
>>>>> PS. Having re-read this, maybe the following alternative would be easier
>>>>> to understand (WDYT?):
>>>>>
>>>>> schedule(Punctuator callback, long streamTimeInterval, long systemTimeUpperBound); // schedules punctuate at stream-time intervals with a system-time upper bound - systemTimeUpperBound must be no less than streamTimeInterval
>>>>>
>>>>> schedule(Punctuator callback, long streamTimeInterval); // schedules punctuate at stream-time intervals without a system-time upper bound - this is equivalent to current stream-time based punctuate
>>>>>
>>>>> Punctuation is triggered when either:
>>>>> - the stream time advances past the (stream time of the previous
>>>>> punctuation) + streamTimeInterval;
>>>>> - or (iff systemTimeUpperBound is set) when the system time advances
>>>>> past the (system time of the previous punctuation) + systemTimeUpperBound
>>>>>
>>>>> Awaiting comments.
>>>>>
>>>>> Thanks,
>>>>> Michal
>>>>>
>>>>> On 21/04/17 16:56, Michal Borowiecki wrote:
>>>>>> Yes, that's what I meant. Just wanted to highlight we'd deprecate it
>>>>>> in favour of something that doesn't return a record. Not a problem though.
>>>>>>
>>>>>>
>>>>>> On 21/04/17 16:32, Damian Guy wrote:
>>>>>>> Thanks Michal,
>>>>>>> I agree Transformer.punctuate should also be void, but we can deprecate
>>>>>>> that too in favor of the new interface.
>>>>>>>
>>>>>>> Thanks for the javadoc PR!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Damian
>>>>>>>
>>>>>>> On Fri, 21 Apr 2017 at 09:31 Michal Borowiecki <
>>>>>>> michal.borowiecki@openbet.com> wrote:
>>>>>>>
>>>>>>>> Yes, that looks better to me.
>>>>>>>>
>>>>>>>> Note that punctuate on Transformer is currently returning a record, but I
>>>>>>>> think it's ok to have all output records be sent via
>>>>>>>> ProcessorContext.forward, which has to be used anyway if you want to send
>>>>>>>> multiple records from one invocation of punctuate.
>>>>>>>>
>>>>>>>> This way it's consistent between Processor and Transformer.
>>>>>>>>
>>>>>>>>
>>>>>>>> BTW, looking at this I found a glitch in the javadoc and put a comment
>>>>>>>> there:
>>>>>>>>
>>>>>>>> https://github.com/apache/kafka/pull/2413/files#r112634612
>>>>>>>>
>>>>>>>> and PR: https://github.com/apache/kafka/pull/2884
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Michal
>>>>>>>> On 20/04/17 18:55, Damian Guy wrote:
>>>>>>>>
>>>>>>>> Hi Michal,
>>>>>>>>
>>>>>>>> Thanks for the KIP. I'd like to propose a bit more of a radical change to
>>>>>>>> the API.
>>>>>>>> 1. deprecate the punctuate method on Processor
>>>>>>>> 2. create a new Functional Interface just for Punctuation, something like:
>>>>>>>> interface Punctuator {
>>>>>>>>     void punctuate(long timestamp)
>>>>>>>> }
>>>>>>>> 3. add a new schedule function to ProcessorContext: schedule(long
>>>>>>>> interval, PunctuationType type, Punctuator callback)
>>>>>>>> 4. deprecate the existing schedule function
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Damian
>>>>>>>>
>>>>>>>> On Sun, 16 Apr 2017 at 21:55 Michal Borowiecki <
>>>>>>>> michal.borowiecki@openbet.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Thomas,
>>>>>>>>>
>>>>>>>>> I would say our use cases fall in the same category as yours.
>>>>>>>>>
>>>>>>>>> 1) One is expiry of old records, it's virtually identical to yours.
>>>>>>>>>
>>>>>>>>> 2) Second one is somewhat more convoluted but boils down to the same type
>>>>>>>>> of design:
>>>>>>>>>
>>>>>>>>> Incoming messages carry a number of fields, including a timestamp.
>>>>>>>>>
>>>>>>>>> Outgoing messages contain derived fields, one of them (X) is depended on
>>>>>>>>> by the timestamp input field (Y) and some other input field (Z).
>>>>>>>>>
>>>>>>>>> Since the output field X is derived in some non-trivial way, we don't
>>>>>>>>> want to force the logic onto downstream apps. Instead we want to calculate
>>>>>>>>> it in the Kafka Streams app, which means we re-calculate X as soon as the
>>>>>>>>> timestamp in Y is reached (wall clock time) and send a message if it
>>>>>>>>> changed (I say "if" because the derived field (X) is also conditional on
>>>>>>>>> another input field Z).
>>>>>>>>>
>>>>>>>>> So we have kv stores with the records and an additional kv store with
>>>>>>>>> timestamp->id mapping which act like an index where we periodically do a
>>>>>>>>> ranged query.
>>>>>>>>>
>>>>>>>>> Initially we naively tried doing it in punctuate which of course didn't
>>>>>>>>> work when there were no regular msgs on the input topic.
>>>>>>>>> Since this was before 0.10.1 and state stores weren't query-able from
>>>>>>>>> outside we created a "ticker" that produced msgs once per second onto
>>>>>>>>> another topic and fed it into the same topology to trigger punctuate.
>>>>>>>>> This didn't work either, which was much more surprising to us at the
>>>>>>>>> time, because it was not obvious at all that punctuate is only triggered if
>>>>>>>>> *all* input partitions receive messages regularly.
>>>>>>>>> In the end we had to break this into 2 separate Kafka Streams. Main
>>>>>>>>> transformer doesn't use punctuate but sends values of timestamp field Y and
>>>>>>>>> the id to a "scheduler" topic where also the periodic ticks are sent. This
>>>>>>>>> is consumed by the second topology and is its only input topic. There's a
>>>>>>>>> transformer on that topic which populates and updates the time-based
>>>>>>>>> indexes and polls them from punctuate. If the time in the timestamp
>>>>>>>>> elapsed, the record id is sent to the main transformer, which
>>>>>>>>> updates/deletes the record from the main kv store and forwards the
>>>>>>>>> transformed record to the output topic.
>>>>>>>>>
>>>>>>>>> To me this setup feels horrendously complicated for what it does.
>>>>>>>>>
>>>>>>>>> We could incrementally improve on this since 0.10.1 to poll the
>>>>>>>>> timestamp->id "index" stores from some code outside the KafkaStreams
>>>>>>>>> topology so that at least we wouldn't need the extra topic for "ticks".
>>>>>>>>> However, the ticks don't feel so hacky when you realise they give you
>>>>>>>>> some hypothetical benefits in predictability. You can reprocess the
>>>>>>>>> messages in a reproducible manner, since the topologies use event-time,
>>>>>>>>> just that the event time is simply the wall-clock time fed into a topic by
>>>>>>>>> the ticks. (NB in our use case we haven't yet found a need for this kind of
>>>>>>>>> reprocessing).
>>>>>>>>> To make that work though, we would have to have the stream time advance
>>>>>>>>> based on the presence of msgs on the "tick" topic, regardless of the
>>>>>>>>> presence of messages on the other input topic.
>>>>>>>>>
>>>>>>>>> Same as in the expiry use case, both the wall-clock triggered punctuate
>>>>>>>>> and the hybrid would work to simplify this a lot.
>>>>>>>>>
>>>>>>>>> 3) Finally, I have a 3rd use case in the making but I'm still looking if
>>>>>>>>> we can achieve it using session windows instead. I'll keep you posted if we
>>>>>>>>> have to go with punctuate there too.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Michal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/04/17 20:52, Thomas Becker wrote:
>>>>>>>>>
>>>>>>>>> Here's an example that we currently have.  We have a streams processor
>>>>>>>>> that does a transform from one topic into another. One of the fields in
>>>>>>>>> the source topic record is an expiration time, and one of the functions
>>>>>>>>> of the processor is to ensure that expired records get deleted promptly
>>>>>>>>> after that time passes (typically days or weeks after the message was
>>>>>>>>> originally produced). To do that, the processor keeps a state store of
>>>>>>>>> keys and expiration times, iterates that store in punctuate(), and
>>>>>>>>> emits delete (null) records for expired items. This needs to happen at
>>>>>>>>> some minimum interval regardless of the incoming message rate of the
>>>>>>>>> source topic.
>>>>>>>>>
>>>>>>>>> In this scenario, the expiration of records is the primary function of
>>>>>>>>> punctuate, and therefore the key requirement is that the wall-clock
>>>>>>>>> measured time between punctuate calls have some upper-bound. So a pure
>>>>>>>>> wall-clock based schedule would be fine for our needs. But the proposed
>>>>>>>>> "hybrid" system would also be acceptable if that satisfies a broader
>>>>>>>>> range of use-cases.
>>>>>>>>>
>>>>>>>>> On Tue, 2017-04-11 at 14:41 +0200, Michael Noll wrote:
>>>>>>>>>
>>>>>>>>> I apologize for the longer email below.  To my defense, it started
>>>>>>>>> out much
>>>>>>>>> shorter. :-)  Also, to be super-clear, I am intentionally playing
>>>>>>>>> devil's
>>>>>>>>> advocate for a number of arguments brought forth in order to help
>>>>>>>>> improve
>>>>>>>>> this KIP -- I am not implying I necessarily disagree with the
>>>>>>>>> arguments.
>>>>>>>>>
>>>>>>>>> That aside, here are some further thoughts.
>>>>>>>>>
>>>>>>>>> First, there are (at least?) two categories for actions/behavior you
>>>>>>>>> invoke
>>>>>>>>> via punctuate():
>>>>>>>>>
>>>>>>>>> 1. For internal housekeeping of your Processor or Transformer (e.g.,
>>>>>>>>> to
>>>>>>>>> periodically commit to a custom store, to do metrics/logging).  Here,
>>>>>>>>> the
>>>>>>>>> impact of punctuate is typically not observable by other processing
>>>>>>>>> nodes
>>>>>>>>> in the topology.
>>>>>>>>> 2. For controlling the emit frequency of downstream records.  Here,
>>>>>>>>> the
>>>>>>>>> punctuate is all about being observable by downstream processing
>>>>>>>>> nodes.
>>>>>>>>>
>>>>>>>>> A few releases back, we introduced record caches (DSL) and state
>>>>>>>>> store
>>>>>>>>> caches (Processor API) in KIP-63.  Here, we addressed a concern
>>>>>>>>> relating to
>>>>>>>>> (2) where some users needed to control -- here: limit -- the
>>>>>>>>> downstream
>>>>>>>>> output rate of Kafka Streams because the downstream systems/apps
>>>>>>>>> would not
>>>>>>>>> be able to keep up with the upstream output rate (Kafka scalability >
>>>>>>>>> their
>>>>>>>>> scalability).  The argument for KIP-63, which notably did not
>>>>>>>>> introduce a
>>>>>>>>> "trigger" API, was that such an interaction with downstream systems
>>>>>>>>> is an
>>>>>>>>> operational concern;  it should not impact the processing *logic* of
>>>>>>>>> your
>>>>>>>>> application, and thus we didn't want to complicate the Kafka Streams
>>>>>>>>> API,
>>>>>>>>> especially not the declarative DSL, with such operational concerns.
>>>>>>>>>
>>>>>>>>> This KIP's discussion on `punctuate()` takes us back in time (<--
>>>>>>>>> sorry, I
>>>>>>>>> couldn't resist to not make this pun :-P).  As a meta-comment, I am
>>>>>>>>> observing that our conversation is moving more and more into the
>>>>>>>>> direction
>>>>>>>>> of explicit "triggers" because, so far, I have seen only motivations
>>>>>>>>> for
>>>>>>>>> use cases in category (2), but none yet for (1)?  For example, some
>>>>>>>>> comments voiced here are about sth like "IF stream-time didn't
>>>>>>>>> trigger
>>>>>>>>> punctuate, THEN trigger punctuate based on processing-time".  Do we
>>>>>>>>> want
>>>>>>>>> this, and if so, for which use cases and benefits?  Also, on a
>>>>>>>>> related
>>>>>>>>> note, whatever we are discussing here will impact state store caches
>>>>>>>>> (Processor API) and perhaps also impact record caches (DSL), thus we
>>>>>>>>> should
>>>>>>>>> clarify any such impact here.
>>>>>>>>>
>>>>>>>>> Switching topics slightly.
>>>>>>>>>
>>>>>>>>> Jay wrote:
>>>>>>>>>
>>>>>>>>> One thing I've always found super important for this kind of design
>>>>>>>>> work
>>>>>>>>> is to do a really good job of cataloging the landscape of use cases
>>>>>>>>> and
>>>>>>>>> how prevalent each one is.
>>>>>>>>>
>>>>>>>>> +1 to this, as others have already said.
>>>>>>>>>
>>>>>>>>> Here, let me highlight -- just in case -- that when we talked about
>>>>>>>>> windowing use cases in the recent emails, the Processor API (where
>>>>>>>>> `punctuate` resides) does not have any notion of windowing at
>>>>>>>>> all.  If you
>>>>>>>>> want to do windowing *in the Processor API*, you must do so manually
>>>>>>>>> in
>>>>>>>>> combination with window stores.  For this reason I'd suggest to
>>>>>>>>> discuss use
>>>>>>>>> cases not just in general, but also in view of how you'd do so in the
>>>>>>>>> Processor API vs. in the DSL.  Right now, changing/improving
>>>>>>>>> `punctuate`
>>>>>>>>> does not impact the DSL at all, unless we add new functionality to
>>>>>>>>> it.
>>>>>>>>>
>>>>>>>>> Jay wrote in his strawman example:
>>>>>>>>>
>>>>>>>>> You aggregate click and impression data for a reddit like site.
>>>>>>>>> Every ten
>>>>>>>>> minutes you want to output a ranked list of the top 10 articles
>>>>>>>>> ranked by
>>>>>>>>> clicks/impressions for each geographical area. I want to be able
>>>>>>>>> run this
>>>>>>>>> in steady state as well as rerun to regenerate results (or catch up
>>>>>>>>> if it
>>>>>>>>> crashes).
>>>>>>>>>
>>>>>>>>> This is a good example for more than the obvious reason:  In KIP-63,
>>>>>>>>> we
>>>>>>>>> argued that the reason for saying "every ten minutes" above is not
>>>>>>>>> necessarily about because you want to output data *exactly* after ten
>>>>>>>>> minutes, but that you want to perform an aggregation based on 10-
>>>>>>>>> minute
>>>>>>>>> windows of input data; i.e., the point is about specifying the input
>>>>>>>>> for
>>>>>>>>> your aggregation, not or less about when the results of the
>>>>>>>>> aggregation
>>>>>>>>> should be send downstream.  To take an extreme example, you could
>>>>>>>>> disable
>>>>>>>>> record caches and let your app output a downstream update for every
>>>>>>>>> incoming input record.  If the last input record was from at minute 7
>>>>>>>>> of 10
>>>>>>>>> (for a 10-min window), then what your app would output at minute 10
>>>>>>>>> would
>>>>>>>>> be identical to what it had already emitted at minute 7 earlier
>>>>>>>>> anyways.
>>>>>>>>> This is particularly true when we take late-arriving data into
>>>>>>>>> account:  if
>>>>>>>>> a late record arrived at minute 13, your app would (by default) send
>>>>>>>>> a new
>>>>>>>>> update downstream, even though the "original" 10 minutes have already
>>>>>>>>> passed.
>>>>>>>>>
>>>>>>>>> Jay wrote...:
>>>>>>>>>
>>>>>>>>> There are a couple of tricky things that seem to make this hard
>>>>>>>>> with
>>>>>>>>>
>>>>>>>>> either
>>>>>>>>>
>>>>>>>>> of the options proposed:
>>>>>>>>> 1. If I emit this data using event time I have the problem
>>>>>>>>> described where
>>>>>>>>> a geographical region with no new clicks or impressions will fail
>>>>>>>>> to
>>>>>>>>>
>>>>>>>>> output
>>>>>>>>>
>>>>>>>>> results.
>>>>>>>>>
>>>>>>>>> ...and Arun Mathew wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We window by the event time, but trigger punctuate in <punctuate
>>>>>>>>> interval>
>>>>>>>>> duration of system time, in the absence of an event crossing the
>>>>>>>>> punctuate
>>>>>>>>> event time.
>>>>>>>>>
>>>>>>>>> So, given what I wrote above about the status quo and what you can
>>>>>>>>> already
>>>>>>>>> do with it, is the concern that the state store cache doesn't give
>>>>>>>>> you
>>>>>>>>> *direct* control over "forcing an output after no later than X
>>>>>>>>> seconds [of
>>>>>>>>> processing-time]" but only indirect control through a cache
>>>>>>>>> size?  (Note
>>>>>>>>> that I am not dismissing the claims why this might be helpful.)
>>>>>>>>>
>>>>>>>>> Arun Mathew wrote:
>>>>>>>>>
>>>>>>>>> We are using Kafka Stream for our Audit Trail, where we need to
>>>>>>>>> output the
>>>>>>>>> event counts on each topic on each cluster aggregated over a 1
>>>>>>>>> minute
>>>>>>>>> window. We have to use event time to be able to cross check the
>>>>>>>>> counts.
>>>>>>>>>
>>>>>>>>> But
>>>>>>>>>
>>>>>>>>> we need to trigger punctuate [aggregate event pushes] by system
>>>>>>>>> time in
>>>>>>>>>
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> absence of events. Otherwise the event counts for unexpired windows
>>>>>>>>> would
>>>>>>>>> be 0 which is bad.
>>>>>>>>>
>>>>>>>>> Isn't the latter -- "count would be 0" -- the problem between the
>>>>>>>>> absence
>>>>>>>>> of output vs. an output of 0, similar to the use of `Option[T]` in
>>>>>>>>> Scala
>>>>>>>>> and the difference between `None` and `Some(0)`?  That is, isn't the
>>>>>>>>> root
>>>>>>>>> cause that the downstream system interprets the absence of output in
>>>>>>>>> a
>>>>>>>>> particular way ("No output after 1 minute = I consider the output to
>>>>>>>>> be
>>>>>>>>> 0.")?  Arguably, you could also adapt the downstream system (if
>>>>>>>>> possible)
>>>>>>>>> to correctly handle the difference between absence of output vs.
>>>>>>>>> output of
>>>>>>>>> 0.  I am not implying that we shouldn't care about such a use case,
>>>>>>>>> but
>>>>>>>>> want to understand the motivation better. :-)
>>>>>>>>>
>>>>>>>>> Also, to add some perspective, in some related discussions we talked
>>>>>>>>> about
>>>>>>>>> how a Kafka Streams application should not worry or not be coupled
>>>>>>>>> unnecessarily with such interpretation specifics in a downstream
>>>>>>>>> system's
>>>>>>>>> behavior.  After all, tomorrow your app's output might be consumed by
>>>>>>>>> more
>>>>>>>>> than just this one downstream system.  Arguably, Kafka Connect rather
>>>>>>>>> than
>>>>>>>>> Kafka Streams might be the best tool to link the universes of Kafka
>>>>>>>>> and
>>>>>>>>> downstream systems, including helping to reconcile the differences in
>>>>>>>>> how
>>>>>>>>> these systems interpret changes, updates, late-arriving data,
>>>>>>>>> etc.  Kafka
>>>>>>>>> Connect would allow you to decouple the Kafka Streams app's logical
>>>>>>>>> processing from the specifics of downstream systems, thanks to
>>>>>>>>> specific
>>>>>>>>> sink connectors (e.g. for Elastic, Cassandra, MySQL, S3, etc).  Would
>>>>>>>>> this
>>>>>>>>> decoupling with Kafka Connect help here?  (And if the answer is "Yes,
>>>>>>>>> but
>>>>>>>>> it's currently awkward to use Connect for this", this might be a
>>>>>>>>> problem we
>>>>>>>>> can solve, too.)
>>>>>>>>>
>>>>>>>>> Switching topics slightly again.
>>>>>>>>>
>>>>>>>>> Thomas wrote:
>>>>>>>>>
>>>>>>>>> I'm not entirely convinced that a separate callback (option C)
>>>>>>>>> is that messy (it could just be a default method with an empty
>>>>>>>>> implementation), but if we wanted a single API to handle both
>>>>>>>>> cases,
>>>>>>>>> how about something like the following?
>>>>>>>>>
>>>>>>>>> enum Time {
>>>>>>>>>    STREAM,
>>>>>>>>>    CLOCK
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Yeah, I am on the fence here, too.  If we use the 1-method approach,
>>>>>>>>> then
>>>>>>>>> whatever the user is doing inside this method is a black box to Kafka
>>>>>>>>> Streams (similar to how we have no idea what the user does inside a
>>>>>>>>> `foreach` -- if the function passed to `foreach` writes to external
>>>>>>>>> systems, then Kafka Streams is totally unaware of the fact).  We
>>>>>>>>> won't
>>>>>>>>> know, for example, if the stream-time action has a smaller "trigger"
>>>>>>>>> frequency than the processing-time action.  Or, we won't know whether
>>>>>>>>> the
>>>>>>>>> user custom-codes a "not later than" trigger logic ("Do X every 1-
>>>>>>>>> minute of
>>>>>>>>> stream-time or 1-minute of processing-time, whichever comes
>>>>>>>>> first").  That
>>>>>>>>> said, I am not certain yet whether we would need such knowledge
>>>>>>>>> because,
>>>>>>>>> when using the Processor API, most of the work and decisions must be
>>>>>>>>> done
>>>>>>>>> by the user anyways.  It would matter though if the concept of
>>>>>>>>> "triggers"
>>>>>>>>> were to bubble up into the DSL because in the DSL the management of
>>>>>>>>> windowing, window stores, etc. must be done automatically by Kafka
>>>>>>>>> Streams.
>>>>>>>>>
>>>>>>>>> [In any case, btw, we have the corner case where the user configured
>>>>>>>>> the
>>>>>>>>> stream-time to be processing-time (e.g. via wall-clock timestamp
>>>>>>>>> extractor), at which point both punctuate variants are based on the
>>>>>>>>> same
>>>>>>>>> time semantics / timeline.]
>>>>>>>>>
>>>>>>>>> Again, I apologize for the wall of text.  Congratulations if you made
>>>>>>>>> it
>>>>>>>>> this far. :-)
>>>>>>>>>
>>>>>>>>> More than happy to hear your thoughts!
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>> On Tue, Apr 11, 2017 at 1:12 AM, Arun Mathew <arunmathew88@gmail.com> <arunmathew88@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks Matthias.
>>>>>>>>> Sure, will correct it right away.
>>>>>>>>>
>>>>>>>>> On 11-Apr-2017 8:07 AM, "Matthias J. Sax" <matthias@confluent.io> <matthias@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Thanks for preparing this page!
>>>>>>>>>
>>>>>>>>> About terminology:
>>>>>>>>>
>>>>>>>>> You introduce the term "event time" -- but we should call this
>>>>>>>>> "stream
>>>>>>>>> time" -- "stream time" is whatever TimestampExtractor returns and
>>>>>>>>> this
>>>>>>>>> could be event time, ingestion time, or processing/wall-clock time.
>>>>>>>>>
>>>>>>>>> Does this make sense to you?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/10/17 4:58 AM, Arun Mathew wrote:
>>>>>>>>>
>>>>>>>>> Thanks Ewen.
>>>>>>>>>
>>>>>>>>> @Michal, @all, I have created a child page to start the Use Cases
>>>>>>>>>
>>>>>>>>> discussion [https://cwiki.apache.org/confluence/display/KAFKA/
>>>>>>>>> Punctuate+Use+Cases]. Please go through it and give your comments.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> @Tianji, Sorry for the delay. I am trying to make the patch
>>>>>>>>> public.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Arun Mathew
>>>>>>>>>
>>>>>>>>> On 4/8/17, 02:00, "Ewen Cheslack-Postava" <ewen@confluent.io> <ewen@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>     Arun,
>>>>>>>>>
>>>>>>>>>     I've given you permission to edit the wiki. Let me know if
>>>>>>>>> you run
>>>>>>>>>
>>>>>>>>> into any
>>>>>>>>>
>>>>>>>>>     issues.
>>>>>>>>>
>>>>>>>>>     -Ewen
>>>>>>>>>
>>>>>>>>>     On Fri, Apr 7, 2017 at 1:21 AM, Arun Mathew <amathew@yahoo-co rp.jp> <amathew@yahoo-corp.jp>
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     > Thanks Michal. I don’t have the access yet [arunmathew88].
>>>>>>>>> Should I
>>>>>>>>>
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>>     > sending a separate mail for this?
>>>>>>>>>     >
>>>>>>>>>     > I thought one of the person following this thread would be
>>>>>>>>> able to
>>>>>>>>>
>>>>>>>>> give me
>>>>>>>>>
>>>>>>>>>     > access.
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > *From: *Michal Borowiecki <michal.borowiecki@openbet.com> <michal.borowiecki@openbet.com>
>>>>>>>>>     > *Reply-To: *"dev@kafka.apache.org" <dev@kafka.apache.org> <dev@kafka.apache.org> <dev@kafka.apache.org>
>>>>>>>>>     > *Date: *Friday, April 7, 2017 at 17:16
>>>>>>>>>     > *To: *"dev@kafka.apache.org" <dev@kafka.apache.org> <dev@kafka.apache.org> <dev@kafka.apache.org>
>>>>>>>>>     > *Subject: *Re: [DISCUSS] KIP-138: Change punctuate
>>>>>>>>> semantics
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > Hi Arun,
>>>>>>>>>     >
>>>>>>>>>     > I was thinking along the same lines as you, listing the use
>>>>>>>>> cases
>>>>>>>>>
>>>>>>>>> on the
>>>>>>>>>
>>>>>>>>>     > wiki, but didn't find time to get around doing that yet.
>>>>>>>>>     > Don't mind if you do it if you have access now.
>>>>>>>>>     > I was thinking it would be nice if, once we have the use
>>>>>>>>> cases
>>>>>>>>>
>>>>>>>>> listed,
>>>>>>>>>
>>>>>>>>>     > people could use likes to up-vote the use cases similar to
>>>>>>>>> what
>>>>>>>>>
>>>>>>>>> they're
>>>>>>>>>
>>>>>>>>>     > working on.
>>>>>>>>>     >
>>>>>>>>>     > I should have a bit more time to action this in the next
>>>>>>>>> few days,
>>>>>>>>>
>>>>>>>>> but
>>>>>>>>>
>>>>>>>>>     > happy for you to do it if you can beat me to it ;-)
>>>>>>>>>     >
>>>>>>>>>     > Cheers,
>>>>>>>>>     > Michal
>>>>>>>>>     >
>>>>>>>>>     > On 07/04/17 04:39, Arun Mathew wrote:
>>>>>>>>>     >
>>>>>>>>>     > Sure, Thanks Matthias. My id is [arunmathew88].
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > Of course. I was thinking of a subpage where people can
>>>>>>>>>
>>>>>>>>> collaborate.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > Will do as per Michael’s suggestion.
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > Regards,
>>>>>>>>>     >
>>>>>>>>>     > Arun Mathew
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > On 4/7/17, 12:30, "Matthias J. Sax" <matthias@confluent.io> <matthias@confluent.io>
>>>>>>>>> <
>>>>>>>>>
>>>>>>>>> matthias@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     Please share your Wiki-ID and a committer can give you
>>>>>>>>> write
>>>>>>>>>
>>>>>>>>> access.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     Btw: as you did not initiate the KIP, you should not
>>>>>>>>> change the
>>>>>>>>>
>>>>>>>>> KIP
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     without the permission of the original author -- in
>>>>>>>>> this case
>>>>>>>>>
>>>>>>>>> Michael.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     So you might also just share your thought over the
>>>>>>>>> mailing list
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     Michael can update the KIP page. Or, as an alternative,
>>>>>>>>> just
>>>>>>>>>
>>>>>>>>> create a
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     subpage for the KIP page.
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     @Michael: WDYT?
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     -Matthias
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     On 4/6/17 8:05 PM, Arun Mathew wrote:
>>>>>>>>>     >
>>>>>>>>>     >     > Hi Jay,
>>>>>>>>>     >
>>>>>>>>>     >     >           Thanks for the advise, I would like to list
>>>>>>>>> down
>>>>>>>>>
>>>>>>>>> the use cases as
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > per your suggestion. But it seems I don't have write
>>>>>>>>>
>>>>>>>>> permission to the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > Apache Kafka Confluent Space. Whom shall I request
>>>>>>>>> for it?
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > Regarding your last question. We are using a patch in
>>>>>>>>> our
>>>>>>>>>
>>>>>>>>> production system
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > which does exactly this.
>>>>>>>>>     >
>>>>>>>>>     >     > We window by the event time, but trigger punctuate in
>>>>>>>>>
>>>>>>>>> <punctuate interval>
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > duration of system time, in the absence of an event
>>>>>>>>> crossing
>>>>>>>>>
>>>>>>>>> the punctuate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > event time.
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > We are using Kafka Stream for our Audit Trail, where
>>>>>>>>> we need
>>>>>>>>>
>>>>>>>>> to output the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > event counts on each topic on each cluster aggregated
>>>>>>>>> over a
>>>>>>>>>
>>>>>>>>> 1 minute
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > window. We have to use event time to be able to cross
>>>>>>>>> check
>>>>>>>>>
>>>>>>>>> the counts. But
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > we need to trigger punctuate [aggregate event pushes]
>>>>>>>>> by
>>>>>>>>>
>>>>>>>>> system time in the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > absence of events. Otherwise the event counts for
>>>>>>>>> unexpired
>>>>>>>>>
>>>>>>>>> windows would
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > be 0 which is bad.
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > "Maybe a hybrid solution works: I window by event
>>>>>>>>> time but
>>>>>>>>>
>>>>>>>>> trigger results
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > by system time for windows that have updated? Not
>>>>>>>>> really sure
>>>>>>>>>
>>>>>>>>> the details
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > of making that work. Does that work? Are there
>>>>>>>>> concrete
>>>>>>>>>
>>>>>>>>> examples where you
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > actually want the current behavior?"
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > --
>>>>>>>>>     >
>>>>>>>>>     >     > With Regards,
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > Arun Mathew
>>>>>>>>>     >
>>>>>>>>>     >     > Yahoo! JAPAN Corporation
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > On Wed, Apr 5, 2017 at 8:48 PM, Tianji Li <
>>>>>>>>>
>>>>>>>>> skyahead@gmail.com><skyahead@gmail.com> <skyahead@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     >> Hi Jay,
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >> The hybrid solution is exactly what I expect and
>>>>>>>>> need for
>>>>>>>>>
>>>>>>>>> our use cases
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> when dealing with telecom data.
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >> Thanks
>>>>>>>>>     >
>>>>>>>>>     >     >> Tianji
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >> On Wed, Apr 5, 2017 at 12:01 AM, Jay Kreps <
>>>>>>>>>
>>>>>>>>> jay@confluent.io><jay@confluent.io> <jay@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >>> Hey guys,
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> One thing I've always found super important for
>>>>>>>>> this kind
>>>>>>>>>
>>>>>>>>> of design work
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> is
>>>>>>>>>     >
>>>>>>>>>     >     >>> to do a really good job of cataloging the landscape
>>>>>>>>> of use
>>>>>>>>>
>>>>>>>>> cases and how
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> prevalent each one is. By that I mean not just
>>>>>>>>> listing lots
>>>>>>>>>
>>>>>>>>> of uses, but
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> also grouping them into categories that
>>>>>>>>> functionally need
>>>>>>>>>
>>>>>>>>> the same thing.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> In the absence of this it is very hard to reason
>>>>>>>>> about
>>>>>>>>>
>>>>>>>>> design proposals.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> From the proposals so far I think we have a lot of
>>>>>>>>>
>>>>>>>>> discussion around
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> possible apis, but less around what the user needs
>>>>>>>>> for
>>>>>>>>>
>>>>>>>>> different use
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> cases
>>>>>>>>>     >
>>>>>>>>>     >     >>> and how they would implement that using the api.
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> Here is an example:
>>>>>>>>>     >
>>>>>>>>>     >     >>> You aggregate click and impression data for a
>>>>>>>>> reddit like
>>>>>>>>>
>>>>>>>>> site. Every ten
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> minutes you want to output a ranked list of the top
>>>>>>>>> 10
>>>>>>>>>
>>>>>>>>> articles ranked by
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> clicks/impressions for each geographical area. I
>>>>>>>>> want to be
>>>>>>>>>
>>>>>>>>> able run this
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> in steady state as well as rerun to regenerate
>>>>>>>>> results (or
>>>>>>>>>
>>>>>>>>> catch up if it
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> crashes).
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> There are a couple of tricky things that seem to
>>>>>>>>> make this
>>>>>>>>>
>>>>>>>>> hard with
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> either
>>>>>>>>>     >
>>>>>>>>>     >     >>> of the options proposed:
>>>>>>>>>     >
>>>>>>>>>     >     >>> 1. If I emit this data using event time I have the
>>>>>>>>> problem
>>>>>>>>>
>>>>>>>>> described
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> where
>>>>>>>>>     >
>>>>>>>>>     >     >>> a geographical region with no new clicks or
>>>>>>>>> impressions
>>>>>>>>>
>>>>>>>>> will fail to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> output
>>>>>>>>>     >
>>>>>>>>>     >     >>> results.
>>>>>>>>>     >
>>>>>>>>>     >     >>> 2. If I emit this data using system time I have the
>>>>>>>>> problem
>>>>>>>>>
>>>>>>>>> that when
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> reprocessing data my window may not be ten minutes
>>>>>>>>> but 10
>>>>>>>>>
>>>>>>>>> hours if my
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> processing is very fast so it dramatically changes
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> output.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> Maybe a hybrid solution works: I window by event
>>>>>>>>> time but
>>>>>>>>>
>>>>>>>>> trigger results
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> by system time for windows that have updated? Not
>>>>>>>>> really
>>>>>>>>>
>>>>>>>>> sure the details
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> of making that work. Does that work? Are there
>>>>>>>>> concrete
>>>>>>>>>
>>>>>>>>> examples where
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> you
>>>>>>>>>     >
>>>>>>>>>     >     >>> actually want the current behavior?
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> -Jay
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> On Tue, Apr 4, 2017 at 8:32 PM, Arun Mathew <
>>>>>>>>>
>>>>>>>>> arunmathew88@gmail.com> <arunmathew88@gmail.com> <arunmathew88@gmail.com>
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Hi All,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Thanks for the KIP. We were also in need of a
>>>>>>>>> mechanism to
>>>>>>>>>
>>>>>>>>> trigger
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> punctuate in the absence of events.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> As I described in [
>>>>>>>>>     >
>>>>>>>>>     >     >>>> https://issues.apache.org/jira/browse/KAFKA-3514?
>>>>>>>>>     >
>>>>>>>>>     >     >>>> focusedCommentId=15926036&page=com.atlassian.jira.
>>>>>>>>>     >
>>>>>>>>>     >     >>>> plugin.system.issuetabpanels:comment-
>>>>>>>>> tabpanel#comment-
>>>>>>>>>
>>>>>>>>> 15926036
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> ],
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - Our approached involved using the event time
>>>>>>>>> by
>>>>>>>>>
>>>>>>>>> default.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - The method to check if there is any punctuate
>>>>>>>>> ready
>>>>>>>>>
>>>>>>>>> in the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    PunctuationQueue is triggered via the any event
>>>>>>>>>
>>>>>>>>> received by the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> stream
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    tread, or at the polling intervals in the
>>>>>>>>> absence of
>>>>>>>>>
>>>>>>>>> any events.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - When we create Punctuate objects (which
>>>>>>>>> contains the
>>>>>>>>>
>>>>>>>>> next event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    for punctuation and interval), we also record
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> creation time
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> (system
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    time).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - While checking for maturity of Punctuate
>>>>>>>>> Schedule by
>>>>>>>>>     >
>>>>>>>>>     >     >> mayBePunctuate
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    method, we also check if the system clock has
>>>>>>>>> elapsed
>>>>>>>>>
>>>>>>>>> the punctuate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    interval since the schedule creation time.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - In the absence of any event, or in the
>>>>>>>>> absence of any
>>>>>>>>>
>>>>>>>>> event for
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> one
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    topic in the partition group assigned to the
>>>>>>>>> stream
>>>>>>>>>
>>>>>>>>> task, the system
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    will elapse the interval and we trigger a
>>>>>>>>> punctuate
>>>>>>>>>
>>>>>>>>> using the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> expected
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    punctuation event time.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - we then create the next punctuation schedule
>>>>>>>>> as
>>>>>>>>>
>>>>>>>>> punctuation event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    + punctuation interval, [again recording the
>>>>>>>>> system
>>>>>>>>>
>>>>>>>>> time of creation
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> of
>>>>>>>>>     >
>>>>>>>>>     >     >>>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    schedule].
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> We call this a Hybrid Punctuate. Of course, this
>>>>>>>>> approach
>>>>>>>>>
>>>>>>>>> has pros and
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> cons.
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Pros
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - Punctuates will happen in <punctuate
>>>>>>>>> interval> time
>>>>>>>>>
>>>>>>>>> duration at
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> max
>>>>>>>>>     >
>>>>>>>>>     >     >>> in
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    terms of system time.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - The semantics as a whole continues to revolve
>>>>>>>>> around
>>>>>>>>>
>>>>>>>>> event time.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - We can use the old data [old timestamps] to
>>>>>>>>> rerun any
>>>>>>>>>
>>>>>>>>> experiments
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> or
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    tests.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Cons
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - In case the  <punctuate interval> is not a
>>>>>>>>> time
>>>>>>>>>
>>>>>>>>> duration [say
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> logical
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    time/event count], then the approach might not
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>> meaningful.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - In case there is a case where we have to wait
>>>>>>>>> for an
>>>>>>>>>
>>>>>>>>> actual event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> from
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    a low event rate partition in the partition
>>>>>>>>> group, this
>>>>>>>>>
>>>>>>>>> approach
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> will
>>>>>>>>>     >
>>>>>>>>>     >     >>>> jump
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    the gun.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - in case the event processing cannot catch up
>>>>>>>>> with the
>>>>>>>>>
>>>>>>>>> event rate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> and
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    the expected timestamp events gets queued for
>>>>>>>>> long
>>>>>>>>>
>>>>>>>>> time, this
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> approach
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    might jump the gun.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> I believe the above approach and discussion goes
>>>>>>>>> close to
>>>>>>>>>
>>>>>>>>> the approach
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> A.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> -----------
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> I like the idea of having an even count based
>>>>>>>>> punctuate.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> -----------
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> I agree with the discussion around approach C,
>>>>>>>>> that we
>>>>>>>>>
>>>>>>>>> should provide
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>> user with the option to choose system time or
>>>>>>>>> event time
>>>>>>>>>
>>>>>>>>> based
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> punctuates.
>>>>>>>>>     >
>>>>>>>>>     >     >>>> But I believe that the user predominantly wants to
>>>>>>>>> use
>>>>>>>>>
>>>>>>>>> event time while
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> not
>>>>>>>>>     >
>>>>>>>>>     >     >>>> missing out on regular punctuates due to event
>>>>>>>>> delays or
>>>>>>>>>
>>>>>>>>> event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> absences.
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Hence a complex punctuate option as Matthias
>>>>>>>>> mentioned
>>>>>>>>>
>>>>>>>>> (quoted below)
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> would
>>>>>>>>>     >
>>>>>>>>>     >     >>>> be most apt.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> "- We might want to add "complex" schedules later
>>>>>>>>> on
>>>>>>>>>
>>>>>>>>> (like, punctuate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> on
>>>>>>>>>     >
>>>>>>>>>     >     >>>> every 10 seconds event-time or 60 seconds system-
>>>>>>>>> time
>>>>>>>>>
>>>>>>>>> whatever comes
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> first)."
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> -----------
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> I think I read somewhere that Kafka Streams
>>>>>>>>> started with
>>>>>>>>>
>>>>>>>>> System Time as
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>> punctuation standard, but was later changed to
>>>>>>>>> Event Time.
>>>>>>>>>
>>>>>>>>> I guess
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> there
>>>>>>>>>     >
>>>>>>>>>     >     >>>> would be some good reason behind it. As Kafka
>>>>>>>>> Streams want
>>>>>>>>>
>>>>>>>>> to evolve
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> more
>>>>>>>>>     >
>>>>>>>>>     >     >>>> on the Stream Processing front, I believe the
>>>>>>>>> emphasis on
>>>>>>>>>
>>>>>>>>> event time
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> would
>>>>>>>>>     >
>>>>>>>>>     >     >>>> remain quite strong.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> With Regards,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Arun Mathew
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Yahoo! JAPAN Corporation, Tokyo
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> On Wed, Apr 5, 2017 at 3:53 AM, Thomas Becker <
>>>>>>>>>
>>>>>>>>> tobecker@tivo.com> <tobecker@tivo.com> <tobecker@tivo.com>
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> Yeah I like PuncutationType much better; I just
>>>>>>>>> threw
>>>>>>>>>
>>>>>>>>> Time out there
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> more as a strawman than an actual suggestion ;) I
>>>>>>>>> still
>>>>>>>>>
>>>>>>>>> think it's
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> worth considering what this buys us over an
>>>>>>>>> additional
>>>>>>>>>
>>>>>>>>> callback. I
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> foresee a number of punctuate implementations
>>>>>>>>> following
>>>>>>>>>
>>>>>>>>> this pattern:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> public void punctuate(PunctuationType type) {
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     switch (type) {
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>         case EVENT_TIME:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>             methodA();
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>             break;
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>         case SYSTEM_TIME:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>             methodB();
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>             break;
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     }
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> }
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> I guess one advantage of this approach is we
>>>>>>>>> could add
>>>>>>>>>
>>>>>>>>> additional
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> punctuation types later in a backwards compatible
>>>>>>>>> way
>>>>>>>>>
>>>>>>>>> (like event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> count
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> as you mentioned).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> -Tommy
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> On Tue, 2017-04-04 at 11:10 -0700, Matthias J.
>>>>>>>>> Sax wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> That sounds promising.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> I am just wondering if `Time` is the best name.
>>>>>>>>> Maybe we
>>>>>>>>>
>>>>>>>>> want to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> add
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> other non-time based punctuations at some point
>>>>>>>>> later. I
>>>>>>>>>
>>>>>>>>> would
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> suggest
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> enum PunctuationType {
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>   EVENT_TIME,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>   SYSTEM_TIME,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> }
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> or similar. Just to keep the door open -- it's
>>>>>>>>> easier to
>>>>>>>>>
>>>>>>>>> add new
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> stuff
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> if the name is more generic.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> -Matthias
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> On 4/4/17 5:30 AM, Thomas Becker wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> I agree that the framework providing and
>>>>>>>>> managing the
>>>>>>>>>
>>>>>>>>> notion of
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> stream
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> time is valuable and not something we would
>>>>>>>>> want to
>>>>>>>>>
>>>>>>>>> delegate to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> tasks. I'm not entirely convinced that a
>>>>>>>>> separate
>>>>>>>>>
>>>>>>>>> callback
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> (option
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> C)
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> is that messy (it could just be a default
>>>>>>>>> method with
>>>>>>>>>
>>>>>>>>> an empty
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> implementation), but if we wanted a single API
>>>>>>>>> to
>>>>>>>>>
>>>>>>>>> handle both
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> cases,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> how about something like the following?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> enum Time {
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>    STREAM,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>    CLOCK
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> }
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> Then on ProcessorContext:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> context.schedule(Time time, long interval)  //
>>>>>>>>> We could
>>>>>>>>>
>>>>>>>>> allow
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> this
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> to
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> be called once for each value of time to mix
>>>>>>>>>
>>>>>>>>> approaches.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> Then the Processor API becomes:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> punctuate(Time time) // time here denotes which
>>>>>>>>>
>>>>>>>>> schedule resulted
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> in
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> this call.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> Thoughts?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> On Mon, 2017-04-03 at 22:44 -0700, Matthias J.
>>>>>>>>> Sax
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> Thanks a lot for the KIP Michal,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> I was thinking about the four options you
>>>>>>>>> proposed in
>>>>>>>>>
>>>>>>>>> more
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> details
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> and
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> this are my thoughts:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (A) You argue, that users can still
>>>>>>>>> "punctuate" on
>>>>>>>>>
>>>>>>>>> event-time
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> via
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> process(), but I am not sure if this is
>>>>>>>>> possible.
>>>>>>>>>
>>>>>>>>> Note, that
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> users
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> only
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> get record timestamps via context.timestamp().
>>>>>>>>> Thus,
>>>>>>>>>
>>>>>>>>> users
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> would
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> need
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> to
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> track the time progress per partition (based
>>>>>>>>> on the
>>>>>>>>>
>>>>>>>>> partitions
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> they
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> obverse via context.partition(). (This alone
>>>>>>>>> puts a
>>>>>>>>>
>>>>>>>>> huge burden
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> on
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> user by itself.) However, users are not
>>>>>>>>> notified at
>>>>>>>>>
>>>>>>>>> startup
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> what
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> partitions are assigned, and user are not
>>>>>>>>> notified
>>>>>>>>>
>>>>>>>>> when
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> partitions
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> get
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> revoked. Because this information is not
>>>>>>>>> available,
>>>>>>>>>
>>>>>>>>> it's not
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> possible
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> to
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> "manually advance" stream-time, and thus
>>>>>>>>> event-time
>>>>>>>>>
>>>>>>>>> punctuation
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> within
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> process() seems not to be possible -- or do
>>>>>>>>> you see a
>>>>>>>>>
>>>>>>>>> way to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> get
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> it
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> done? And even if, it might still be too
>>>>>>>>> clumsy to
>>>>>>>>>
>>>>>>>>> use.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (B) This does not allow to mix both
>>>>>>>>> approaches, thus
>>>>>>>>>
>>>>>>>>> limiting
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> what
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> users
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> can do.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (C) This should give all flexibility we need.
>>>>>>>>> However,
>>>>>>>>>
>>>>>>>>> just
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> adding
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> one
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> more method seems to be a solution that is too
>>>>>>>>> simple
>>>>>>>>>
>>>>>>>>> (cf my
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> comments
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> below).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (D) This might be hard to use. Also, I am not
>>>>>>>>> sure how
>>>>>>>>>
>>>>>>>>> a user
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> could
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> enable system-time and event-time punctuation
>>>>>>>>> in
>>>>>>>>>
>>>>>>>>> parallel.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> Overall options (C) seems to be the most
>>>>>>>>> promising
>>>>>>>>>
>>>>>>>>> approach to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> me.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> Because I also favor a clean API, we might
>>>>>>>>> keep
>>>>>>>>>
>>>>>>>>> current
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> punctuate()
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> as-is, but deprecate it -- so we can remove it
>>>>>>>>> at some
>>>>>>>>>
>>>>>>>>> later
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> point
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> when
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> people use the "new punctuate API".
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> Couple of follow up questions:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - I am wondering, if we should have two
>>>>>>>>> callback
>>>>>>>>>
>>>>>>>>> methods or
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> just
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> one
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (ie, a unified for system and event time
>>>>>>>>> punctuation
>>>>>>>>>
>>>>>>>>> or one for
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> each?).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - If we have one, how can the user figure out,
>>>>>>>>> which
>>>>>>>>>
>>>>>>>>> condition
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> did
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> trigger?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - How would the API look like, for registering
>>>>>>>>>
>>>>>>>>> different
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> punctuate
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> schedules? The "type" must be somehow defined?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - We might want to add "complex" schedules
>>>>>>>>> later on
>>>>>>>>>
>>>>>>>>> (like,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> punctuate
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> on
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> every 10 seconds event-time or 60 seconds
>>>>>>>>> system-time
>>>>>>>>>
>>>>>>>>> whatever
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> comes
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> first). I don't say we should add this right
>>>>>>>>> away, but
>>>>>>>>>
>>>>>>>>> we might
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> want
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> to
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> define the API in a way, that it allows
>>>>>>>>> extensions
>>>>>>>>>
>>>>>>>>> like this
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> later
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> on,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> without redesigning the API (ie, the API
>>>>>>>>> should be
>>>>>>>>>
>>>>>>>>> designed
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> extensible)
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - Did you ever consider count-based
>>>>>>>>> punctuation?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> I understand, that you would like to solve a
>>>>>>>>> simple
>>>>>>>>>
>>>>>>>>> problem,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> but
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> we
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> learned from the past, that just "adding some
>>>>>>>>> API"
>>>>>>>>>
>>>>>>>>> quickly
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> leads
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> to a
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> not very well defined API that needs time
>>>>>>>>> consuming
>>>>>>>>>
>>>>>>>>> clean up
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> later on
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> via other KIPs. Thus, I would prefer to get a
>>>>>>>>> holistic
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> punctuation
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> KIP
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> with this from the beginning on to avoid later
>>>>>>>>> painful
>>>>>>>>>     >
>>>>>>>>>     >     >> redesign.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> -Matthias
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> On 4/3/17 11:58 AM, Michal Borowiecki wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> Thanks Thomas,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> I'm also wary of changing the existing
>>>>>>>>> semantics of
>>>>>>>>>     >
>>>>>>>>>     >     >> punctuate,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> for
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> backward compatibility reasons, although I
>>>>>>>>> like the
>>>>>>>>>     >
>>>>>>>>>     >     >> conceptual
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> simplicity of that option.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> Adding a new method to me feels safer but, in
>>>>>>>>> a way,
>>>>>>>>>
>>>>>>>>> uglier.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> I
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> added
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> this to the KIP now as option (C).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> The TimestampExtractor mechanism is actually
>>>>>>>>> more
>>>>>>>>>
>>>>>>>>> flexible,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> as
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> it
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> allows
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> you to return any value, you're not limited
>>>>>>>>> to event
>>>>>>>>>
>>>>>>>>> time or
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> system
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> (although I don't see an actual use case
>>>>>>>>> where you
>>>>>>>>>
>>>>>>>>> might need
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> anything
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> else then those two). Hence I also proposed
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> option to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> allow
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> users
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> to, effectively, decide what "stream time" is
>>>>>>>>> for
>>>>>>>>>
>>>>>>>>> them given
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> presence or absence of messages, much like
>>>>>>>>> they can
>>>>>>>>>
>>>>>>>>> decide
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> what
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> msg
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> means for them using the TimestampExtractor.
>>>>>>>>> What do
>>>>>>>>>
>>>>>>>>> you
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> think
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> about
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> that? This is probably most flexible but also
>>>>>>>>> most
>>>>>>>>>     >
>>>>>>>>>     >     >> complicated.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> All comments appreciated.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> Cheers,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> Michal
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> On 03/04/17 19:23, Thomas Becker wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> Although I fully agree we need a way to
>>>>>>>>> trigger
>>>>>>>>>
>>>>>>>>> periodic
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> processing
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> that is independent from whether and when
>>>>>>>>> messages
>>>>>>>>>
>>>>>>>>> arrive,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> I'm
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> not sure
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> I like the idea of changing the existing
>>>>>>>>> semantics
>>>>>>>>>
>>>>>>>>> across
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> board.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> What if we added an additional callback to
>>>>>>>>> Processor
>>>>>>>>>
>>>>>>>>> that
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> can
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> be
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> scheduled similarly to punctuate() but was
>>>>>>>>> always
>>>>>>>>>
>>>>>>>>> called at
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> fixed, wall
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> clock based intervals? This way you wouldn't
>>>>>>>>> have to
>>>>>>>>>
>>>>>>>>> give
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> up
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> notion
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> of stream time to be able to do periodic
>>>>>>>>> processing.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> On Mon, 2017-04-03 at 10:34 +0100, Michal
>>>>>>>>> Borowiecki
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> Hi all,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> I have created a draft for KIP-138: Change
>>>>>>>>>
>>>>>>>>> punctuate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> semantics
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> <https://cwiki.apache.org/
>>>>>>>>>
>>>>>>>>> confluence/display/KAFKA/KIP- <https://cwiki.apache.org/ confluence/display/KAFKA/KIP-> <https://cwiki.apache.org/confluence/display/KAFKA/KIP->
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > <https://cwiki.apache.org/confluence/display/KAFKA/KI P-> <https://cwiki.apache.org/confluence/display/KAFKA/KIP->>>
>>>>>>>>>
>>>>>>>>> 138%
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> 3A+C
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> hange+
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> punctuate+semantics>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> .
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> Appreciating there can be different views
>>>>>>>>> on
>>>>>>>>>
>>>>>>>>> system-time
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> vs
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> event-
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> semantics for punctuation depending on use-
>>>>>>>>> case and
>>>>>>>>>
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> importance of
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> backwards compatibility of any such change,
>>>>>>>>> I've
>>>>>>>>>
>>>>>>>>> left it
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> quite
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> open
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> and
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> hope to fill in more info as the discussion
>>>>>>>>>
>>>>>>>>> progresses.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> Thanks,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> Michal
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> --
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>     Tommy Becker
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>     Senior Software Engineer
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>     O +1 919.460.4747 <%28919%29%20460-4747> <(919)%20460-4747>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>     tivo.com
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> ________________________________
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> This email and any attachments may contain
>>>>>>>>> confidential
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> privileged material for the sole use of the
>>>>>>>>> intended
>>>>>>>>>
>>>>>>>>> recipient.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> Any
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> review, copying, or distribution of this email
>>>>>>>>> (or any
>>>>>>>>>     >
>>>>>>>>>     >     >> attachments)
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> by others is prohibited. If you are not the
>>>>>>>>> intended
>>>>>>>>>
>>>>>>>>> recipient,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> please contact the sender immediately and
>>>>>>>>> permanently
>>>>>>>>>
>>>>>>>>> delete this
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> email and any attachments. No employee or agent
>>>>>>>>> of TiVo
>>>>>>>>>
>>>>>>>>> Inc. is
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> authorized to conclude any binding agreement on
>>>>>>>>> behalf
>>>>>>>>>
>>>>>>>>> of TiVo
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> Inc.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> by email. Binding agreements with TiVo Inc. may
>>>>>>>>> only be
>>>>>>>>>
>>>>>>>>> made by a
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> signed written agreement.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> --
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     Tommy Becker
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     Senior Software Engineer
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     O +1 919.460.4747 <%28919%29%20460-4747> <(919)%20460-4747>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     tivo.com
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> ________________________________
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> This email and any attachments may contain
>>>>>>>>> confidential
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> privileged
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> material for the sole use of the intended
>>>>>>>>> recipient. Any
>>>>>>>>>
>>>>>>>>> review,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> copying,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> or distribution of this email (or any
>>>>>>>>> attachments) by
>>>>>>>>>
>>>>>>>>> others is
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> prohibited.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> If you are not the intended recipient, please
>>>>>>>>> contact the
>>>>>>>>>
>>>>>>>>> sender
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> immediately and permanently delete this email and
>>>>>>>>> any
>>>>>>>>>
>>>>>>>>> attachments. No
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> employee or agent of TiVo Inc. is authorized to
>>>>>>>>> conclude
>>>>>>>>>
>>>>>>>>> any binding
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> agreement on behalf of TiVo Inc. by email.
>>>>>>>>> Binding
>>>>>>>>>
>>>>>>>>> agreements with
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> TiVo
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> Inc. may only be made by a signed written
>>>>>>>>> agreement.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > --
>>>>>>>>>     >
>>>>>>>>>     > <http://www.openbet.com/> <http://www.openbet.com/>
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     > *Michal Borowiecki*
>>>>>>>>>     >
>>>>>>>>>     > *Senior Software Engineer L4*
>>>>>>>>>     >
>>>>>>>>>     > *T: *
>>>>>>>>>     >
>>>>>>>>>     > +44 208 742 1600 <+44%2020%208742%201600> <+44%2020%208742%201600>
>>>>>>>>>     >
>>>>>>>>>     > +44 203 249 8448 <+44%2020%203249%208448> <+44%2020%203249%208448>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > *E: *
>>>>>>>>>     >
>>>>>>>>>     > michal.borowiecki@openbet.com
>>>>>>>>>     >
>>>>>>>>>     > *W: *
>>>>>>>>>     >
>>>>>>>>>     > www.openbet.com
>>>>>>>>>     >
>>>>>>>>>     > *OpenBet Ltd*
>>>>>>>>>     >
>>>>>>>>>     > Chiswick Park Building 9
>>>>>>>>>     >
>>>>>>>>>     > 566 Chiswick High Rd
>>>>>>>>>     >
>>>>>>>>>     > London
>>>>>>>>>     >
>>>>>>>>>     > W4 5XT
>>>>>>>>>     >
>>>>>>>>>     > UK
>>>>>>>>>     >
>>>>>>>>>     > <https://www.openbet.com/email_promo> <https://www.openbet.com/email_promo>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > This message is confidential and intended only for the
>>>>>>>>> addressee.
>>>>>>>>>
>>>>>>>>> If you
>>>>>>>>>
>>>>>>>>>     > have received this message in error, please immediately
>>>>>>>>> notify the
>>>>>>>>>     > postmaster@openbet.com and delete it from your system as
>>>>>>>>> well as
>>>>>>>>>
>>>>>>>>> any
>>>>>>>>>
>>>>>>>>>     > copies. The content of e-mails as well as traffic data may
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>> monitored by
>>>>>>>>>
>>>>>>>>>     > OpenBet for employment and security purposes. To protect
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> environment
>>>>>>>>>
>>>>>>>>>     > please do not print this e-mail unless necessary. OpenBet
>>>>>>>>> Ltd.
>>>>>>>>>
>>>>>>>>> Registered
>>>>>>>>>
>>>>>>>>>     > Office: Chiswick Park Building 9, 566 Chiswick High Road,
>>>>>>>>> London,
>>>>>>>>>
>>>>>>>>> W4 5XT,
>>>>>>>>>
>>>>>>>>>     > United Kingdom. A company registered in England and Wales.
>>>>>>>>>
>>>>>>>>> Registered no.
>>>>>>>>>
>>>>>>>>>     > 3134634. VAT no. GB927523612
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     Tommy Becker
>>>>>>>>>
>>>>>>>>>     Senior Software Engineer
>>>>>>>>>
>>>>>>>>>     O +1 919.460.4747 <%28919%29%20460-4747>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     tivo.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________
>>>>>>>>>
>>>>>>>>> This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> <http://www.openbet.com/> Michal Borowiecki
>>>>>>>>> Senior Software Engineer L4
>>>>>>>>> T: +44 208 742 1600 <+44%2020%208742%201600>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>> -- 
>>>>>> Signature
>>>>>> <http://www.openbet.com/> 	Michal Borowiecki
>>>>>> Senior Software Engineer L4
>>>>>> 	T: 	+44 208 742 1600
>>>>>>
>>>>>> 	
>>>>>> 	+44 203 249 8448
>>>>>>
>>>>>> 	
>>>>>> 	 
>>>>>> 	E: 	michal.borowiecki@openbet.com
>>>>>> 	W: 	www.openbet.com <http://www.openbet.com/>
>>>>>>
>>>>>> 	
>>>>>> 	OpenBet Ltd
>>>>>>
>>>>>> 	Chiswick Park Building 9
>>>>>>
>>>>>> 	566 Chiswick High Rd
>>>>>>
>>>>>> 	London
>>>>>>
>>>>>> 	W4 5XT
>>>>>>
>>>>>> 	UK
>>>>>>
>>>>>> 	
>>>>>> <https://www.openbet.com/email_promo>
>>>>>>
>>>>>> This message is confidential and intended only for the addressee. If
>>>>>> you have received this message in error, please immediately notify the
>>>>>> postmaster@openbet.com <mailto:postmaster@openbet.com> and delete it
>>>>>> from your system as well as any copies. The content of e-mails as well
>>>>>> as traffic data may be monitored by OpenBet for employment and
>>>>>> security purposes. To protect the environment please do not print this
>>>>>> e-mail unless necessary. OpenBet Ltd. Registered Office: Chiswick Park
>>>>>> Building 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A
>>>>>> company registered in England and Wales. Registered no. 3134634. VAT
>>>>>> no. GB927523612
>>>>>>
>>>>> -- 
>>>>> Signature
>>>>> <http://www.openbet.com/> 	Michal Borowiecki
>>>>> Senior Software Engineer L4
>>>>> 	T: 	+44 208 742 1600
>>>>>
>>>>> 	
>>>>> 	+44 203 249 8448
>>>>>
>>>>> 	
>>>>> 	 
>>>>> 	E: 	michal.borowiecki@openbet.com
>>>>> 	W: 	www.openbet.com <http://www.openbet.com/>
>>>>>
>>>>> 	
>>>>> 	OpenBet Ltd
>>>>>
>>>>> 	Chiswick Park Building 9
>>>>>
>>>>> 	566 Chiswick High Rd
>>>>>
>>>>> 	London
>>>>>
>>>>> 	W4 5XT
>>>>>
>>>>> 	UK
>>>>>
>>>>> 	
>>>>> <https://www.openbet.com/email_promo>
>>>>>
>>>>> This message is confidential and intended only for the addressee. If you
>>>>> have received this message in error, please immediately notify the
>>>>> postmaster@openbet.com <mailto:postmaster@openbet.com> and delete it
>>>>> from your system as well as any copies. The content of e-mails as well
>>>>> as traffic data may be monitored by OpenBet for employment and security
>>>>> purposes. To protect the environment please do not print this e-mail
>>>>> unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
>>>>> 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
>>>>> registered in England and Wales. Registered no. 3134634. VAT no.
>>>>> GB927523612
>>>>>
>>> -- 
>>> Signature
>>> <http://www.openbet.com/> 	Michal Borowiecki
>>> Senior Software Engineer L4
>>> 	T: 	+44 208 742 1600
>>>
>>> 	
>>> 	+44 203 249 8448
>>>
>>> 	
>>> 	 
>>> 	E: 	michal.borowiecki@openbet.com
>>> 	W: 	www.openbet.com <http://www.openbet.com/>
>>>
>>> 	
>>> 	OpenBet Ltd
>>>
>>> 	Chiswick Park Building 9
>>>
>>> 	566 Chiswick High Rd
>>>
>>> 	London
>>>
>>> 	W4 5XT
>>>
>>> 	UK
>>>
>>> 	
>>> <https://www.openbet.com/email_promo>
>>>
>>> This message is confidential and intended only for the addressee. If you
>>> have received this message in error, please immediately notify the
>>> postmaster@openbet.com <mailto:postmaster@openbet.com> and delete it
>>> from your system as well as any copies. The content of e-mails as well
>>> as traffic data may be monitored by OpenBet for employment and security
>>> purposes. To protect the environment please do not print this e-mail
>>> unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
>>> 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
>>> registered in England and Wales. Registered no. 3134634. VAT no.
>>> GB927523612
>>>
> 
> -- 
> Signature
> <http://www.openbet.com/> 	Michal Borowiecki
> Senior Software Engineer L4
> 	T: 	+44 208 742 1600
> 
> 	
> 	+44 203 249 8448
> 
> 	
> 	 
> 	E: 	michal.borowiecki@openbet.com
> 	W: 	www.openbet.com <http://www.openbet.com/>
> 
> 	
> 	OpenBet Ltd
> 
> 	Chiswick Park Building 9
> 
> 	566 Chiswick High Rd
> 
> 	London
> 
> 	W4 5XT
> 
> 	UK
> 
> 	
> <https://www.openbet.com/email_promo>
> 
> This message is confidential and intended only for the addressee. If you
> have received this message in error, please immediately notify the
> postmaster@openbet.com <mailto:postmaster@openbet.com> and delete it
> from your system as well as any copies. The content of e-mails as well
> as traffic data may be monitored by OpenBet for employment and security
> purposes. To protect the environment please do not print this e-mail
> unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
> 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
> registered in England and Wales. Registered no. 3134634. VAT no.
> GB927523612
> 


Mime
View raw message