apex-dev mailing list archives

From AJAY GUPTA <ajaygit...@gmail.com>
Subject Re: Watermark tuples in Apex
Date Fri, 10 Mar 2017 12:21:20 GMT
Hi Bhupesh,

For point 1, can't we make use of implicitWatermarkGenerator?


On Wed, Mar 8, 2017 at 12:16 PM, Bhupesh Chawda <bhupesh@apache.org> wrote:

> Hi All,
> Watermark tuples in Apex are very tightly coupled to event time processing.
> For this reason, they are usually modeled as having a timestamp:
> public interface WatermarkTuple
> {
>   long getTimestamp();
> }
> Even though watermarks are meant for such time-related processing, I think
> we should expand the concept of watermarks to cover the following types:
> 1. Labelled watermarks
> This could be useful in scenarios where instead of a timestamp (which is an
> ordered field), we have categorical values. For example, consider tuples
> which are labeled by city names. For each city, we want to have separate
> windows and isolate the processing. If the watermark returns a different
> city name, we end the previous window and start a new one. Or, in this case
> we could make use of both high and low watermarks indicating the start and
> end of a city's data. This could mean having multiple windows' data
> incoming at the same time.
> 2. Ordered watermarks
> Instead of restricting the ordered field to time, why not consider something
> like an Ordered Watermark? Time-based watermarks could extend from that.
> An ordered watermark could be used when we have a sequence of data
> tuples and need to demarcate every n tuples. Even though we can say that
> the window is definitely closed after every n tuples, the decision is made
> only when the upstream sends the watermark tuple. The windowed operator
> does not have any clue about it. It blindly opens and closes windows based
> on watermarks received from upstream. This could mean different windows may
> have different values of n.
> Please let me know your thoughts on this.
> ~ Bhupesh
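
To make the two proposals in the quoted message concrete, here is a minimal Java sketch, not the Apex or Malhar API. Every type name in it (Watermark, OrderedWatermark, SequenceWatermark, LabelledWatermark, WatermarkWindowedOperator) is hypothetical and chosen only for illustration. It assumes that a watermark carries an arbitrary value, that an ordered watermark is one whose value is Comparable, and that the windowed operator closes its open window whenever the watermark value changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generalization: a watermark carries some value, not necessarily a time.
interface Watermark<V> {
  V getValue();
}

// Hypothetical ordered watermark: the value has a natural order (time, sequence number).
interface OrderedWatermark<V extends Comparable<V>> extends Watermark<V> {
}

// Ordered watermark that demarcates a position in a sequence of tuples.
final class SequenceWatermark implements OrderedWatermark<Long> {
  private final long sequenceNumber;

  SequenceWatermark(long sequenceNumber) {
    this.sequenceNumber = sequenceNumber;
  }

  @Override
  public Long getValue() {
    return sequenceNumber;
  }
}

// Labelled watermark: the value is categorical, for example a city name.
final class LabelledWatermark implements Watermark<String> {
  private final String label;

  LabelledWatermark(String label) {
    this.label = label;
  }

  @Override
  public String getValue() {
    return label;
  }
}

// Operator that knows nothing about n or about labels in advance. It only
// reacts to watermarks from upstream: a changed value closes the open window.
final class WatermarkWindowedOperator<T, V> {
  private final List<List<T>> closedWindows = new ArrayList<>();
  private List<T> openWindow = new ArrayList<>();
  private V lastValue; // null until the first watermark arrives

  void onTuple(T tuple) {
    openWindow.add(tuple);
  }

  void onWatermark(Watermark<V> watermark) {
    if (watermark.getValue().equals(lastValue)) {
      return; // same value as before: the current window stays open
    }
    lastValue = watermark.getValue();
    if (!openWindow.isEmpty()) { // simplification: empty windows are not recorded
      closedWindows.add(openWindow);
    }
    openWindow = new ArrayList<>();
  }

  List<List<T>> closedWindows() {
    return closedWindows;
  }
}
```

With this shape, feeding three tuples, a SequenceWatermark, two more tuples and another SequenceWatermark yields windows of size 3 and 2, which is the "different values of n" case. A LabelledWatermark for the same city leaves the window open, and one for a different city closes it. The high/low watermark variant, where several cities' windows are open at once, is not covered by this sketch.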
