flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] Improving Trigger/Window API and Semantics
Date Tue, 22 Mar 2016 08:52:39 GMT
Yes, I have some thoughts about Evictors as well, but I haven’t written them down yet. The basic
idea is this:

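import org.apache.flink.streaming.api.windowing.windows.Window;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

interface Evictor<T, W extends Window> {
    Predicate<T> getPredicate(Iterable<StreamRecord<T>> elements, int size, W window);
}

interface Predicate<T> {
    boolean evict(StreamRecord<T> element); // true = remove the element from the buffer
}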

The evictor returns a predicate that is evaluated on every element in the buffer to decide
whether to keep it. The predicate can keep internal state, so with the size it receives in
getPredicate() it can do count-based eviction (just evict elements until you reach your desired
quota). We can also do eviction based on event time, which was not possible before because you
could only evict from the start of the buffer. What do you think?
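A minimal, self-contained sketch of the predicate idea above, written without Flink dependencies. All names here (BufferedElement, EvictionPredicate, EvictionSketch, keepLast, olderThan) are hypothetical and invented for illustration; BufferedElement stands in for StreamRecord, and the sketch assumes the predicate is evaluated once per element, in buffer order. It shows a stateful count-based predicate and an event-time predicate that can remove elements from anywhere in the buffer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for StreamRecord: a value plus its event-time timestamp.
final class BufferedElement<T> {
    final T value;
    final long timestamp;

    BufferedElement(T value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

// Mirrors the proposed Predicate: returning true means "evict this element".
interface EvictionPredicate<T> {
    boolean evict(BufferedElement<T> element);
}

final class EvictionSketch {
    // Count-based eviction: knowing the buffer size up front, evict the first
    // (size - maxCount) elements. The predicate keeps internal state (a counter).
    static <T> EvictionPredicate<T> keepLast(int size, int maxCount) {
        final int toEvict = Math.max(0, size - maxCount);
        return new EvictionPredicate<T>() {
            private int evicted = 0;

            @Override
            public boolean evict(BufferedElement<T> element) {
                if (evicted < toEvict) {
                    evicted++;
                    return true;
                }
                return false;
            }
        };
    }

    // Event-time eviction: evict every element older than the cutoff,
    // wherever it sits in the buffer (not only at the start).
    static <T> EvictionPredicate<T> olderThan(long cutoff) {
        return element -> element.timestamp < cutoff;
    }

    // Evaluates the predicate on each element, in buffer order, and removes those it selects.
    static <T> void apply(List<BufferedElement<T>> buffer, EvictionPredicate<T> predicate) {
        Iterator<BufferedElement<T>> it = buffer.iterator();
        while (it.hasNext()) {
            if (predicate.evict(it.next())) {
                it.remove();
            }
        }
    }

    // The values remaining in the buffer, in order.
    static <T> List<T> values(List<BufferedElement<T>> buffer) {
        List<T> out = new ArrayList<>();
        for (BufferedElement<T> e : buffer) {
            out.add(e.value);
        }
        return out;
    }

    // A small buffer whose timestamps are out of order.
    static List<BufferedElement<String>> sampleBuffer() {
        List<BufferedElement<String>> buffer = new ArrayList<>();
        buffer.add(new BufferedElement<>("a", 30L));
        buffer.add(new BufferedElement<>("b", 10L));
        buffer.add(new BufferedElement<>("c", 20L));
        buffer.add(new BufferedElement<>("d", 5L));
        return buffer;
    }

    public static void main(String[] args) {
        List<BufferedElement<String>> byCount = sampleBuffer();
        apply(byCount, keepLast(byCount.size(), 2));
        System.out.println(values(byCount)); // [c, d]

        List<BufferedElement<String>> byTime = sampleBuffer();
        apply(byTime, olderThan(15L));
        System.out.println(values(byTime)); // [a, c]
    }
}
```

With the out-of-order sample buffer, count-based eviction keeps only the last two elements, while the event-time predicate with cutoff 15 removes "b" and "d" from the middle and the end of the buffer, which an evictor that can only trim the start of the buffer could not do.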

> On 22 Mar 2016, at 09:24, Fabian Hueske <fhueske@gmail.com> wrote:
> Thanks for the write-up Aljoscha.
> I think it is a really good idea to separate the different aspects (firing, purging, lateness) a bit. At the moment, all of these need to be handled in the Trigger, and a custom trigger is necessary whenever you want any of these aspects handled slightly differently. This makes the Trigger interface and its implementations really hard to understand.
> +1 for the suggested changes. 
> Are there plans to touch the Evictor interface as well? IMO, this needs a redesign as
> Fabian
> 2016-03-21 19:21 GMT+01:00 Aljoscha Krettek <aljoscha@apache.org>:
> Hi,
> my previous message might be a bit hard to parse for people who are not deeply familiar with the Trigger implementation, so I’ll try to give a bit more explanation right in the mail.
> The basic idea is that we have observed some problems that keep coming up for people on the mailing lists, and I want to try to address them.
> The first problem is with the Trigger semantics and the confusion between triggers that purge the window contents and those that don’t. (For example, using a ContinuousEventTimeTrigger with an EventTimeWindows assigner is a bad idea because state will be kept indefinitely.) While working on this we should also tackle the issue of providing composite triggers such as Repeatedly (fires a child trigger repeatedly), Any (fires when any child trigger fires) and All (fires when all child triggers fire).
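The intended behaviour of the three composite triggers can be sketched as a toy model. This is not Flink's Trigger interface: SimpleTrigger, NthElementTrigger, AnyTrigger, AllTrigger and RepeatedlyTrigger are hypothetical names, a trigger here only sees elements (no timers, no purge decision), and the sketch assumes All fires once every child has fired at least once since the last clear().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified trigger model (NOT Flink's Trigger interface):
// a trigger sees elements one at a time and reports whether the window fires now.
interface SimpleTrigger {
    boolean onElement(long timestamp);
    void clear(); // reset internal state so the trigger can fire again
}

// Leaf trigger: fires exactly once, on the n-th element since the last clear().
final class NthElementTrigger implements SimpleTrigger {
    private final long n;
    private long count = 0;
    NthElementTrigger(long n) { this.n = n; }
    @Override public boolean onElement(long timestamp) { return ++count == n; }
    @Override public void clear() { count = 0; }
}

// Any: fires whenever at least one child fires.
final class AnyTrigger implements SimpleTrigger {
    private final SimpleTrigger[] children;
    AnyTrigger(SimpleTrigger... children) { this.children = children; }

    @Override public boolean onElement(long timestamp) {
        boolean fire = false;
        for (SimpleTrigger child : children) {
            fire |= child.onElement(timestamp); // non-short-circuit: every child sees every element
        }
        return fire;
    }

    @Override public void clear() {
        for (SimpleTrigger child : children) child.clear();
    }
}

// All: fires once every child has fired at least once since the last clear().
final class AllTrigger implements SimpleTrigger {
    private final SimpleTrigger[] children;
    private final boolean[] fired;

    AllTrigger(SimpleTrigger... children) {
        this.children = children;
        this.fired = new boolean[children.length];
    }

    @Override public boolean onElement(long timestamp) {
        boolean wasComplete = allFired();
        for (int i = 0; i < children.length; i++) {
            if (children[i].onElement(timestamp)) fired[i] = true;
        }
        return !wasComplete && allFired(); // fire only on the transition to "all fired"
    }

    @Override public void clear() {
        for (int i = 0; i < children.length; i++) {
            children[i].clear();
            fired[i] = false;
        }
    }

    private boolean allFired() {
        for (boolean f : fired) if (!f) return false;
        return true;
    }
}

// Repeatedly: each time the child fires, clear it so that it can fire again.
final class RepeatedlyTrigger implements SimpleTrigger {
    private final SimpleTrigger child;
    RepeatedlyTrigger(SimpleTrigger child) { this.child = child; }

    @Override public boolean onElement(long timestamp) {
        if (!child.onElement(timestamp)) return false;
        child.clear();
        return true;
    }

    @Override public void clear() { child.clear(); }
}

final class CompositeTriggerDemo {
    // Feeds elements 1..n and returns the positions at which the trigger fired.
    static List<Integer> firings(SimpleTrigger trigger, int n) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            if (trigger.onElement(i)) positions.add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(firings(new RepeatedlyTrigger(new NthElementTrigger(3)), 7)); // [3, 6]
        System.out.println(firings(new AnyTrigger(new NthElementTrigger(2), new NthElementTrigger(3)), 5)); // [2, 3]
        System.out.println(firings(new AllTrigger(new NthElementTrigger(2), new NthElementTrigger(3)), 5)); // [3]
        System.out.println(firings(new RepeatedlyTrigger(
                new AllTrigger(new NthElementTrigger(2), new NthElementTrigger(3))), 9)); // [3, 6, 9]
    }
}
```

In this model the leaf triggers fire only once, so wrapping All in Repeatedly is what makes it fire again after each completed round.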
> The second issue is lateness. Right now, it is possible to write custom triggers that deal with late elements and even behave differently based on the amount of lateness. There is, however, no API for dealing with lateness, and we should address this.
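One way a dedicated lateness API could classify an incoming element, sketched under stated assumptions: the element is checked against the window it is assigned to, a window counts as complete once the watermark reaches the window's maximum timestamp, and window state is kept for a configurable allowedLateness afterwards. Lateness, LatenessSketch, classify and latenessMillis are hypothetical names for illustration only.

```java
// Hypothetical sketch of an explicit lateness API; all names are illustrative.
enum Lateness { ON_TIME, LATE_BUT_ALLOWED, DROPPED }

final class LatenessSketch {
    // windowMaxTimestamp: the largest timestamp that still belongs to the element's window.
    // watermark: the current event-time watermark.
    // allowedLateness: how long (in event time) window state is kept after the window completes.
    static Lateness classify(long windowMaxTimestamp, long watermark, long allowedLateness) {
        if (watermark < windowMaxTimestamp) {
            return Lateness.ON_TIME; // the watermark has not completed the window yet
        }
        if (watermark < windowMaxTimestamp + allowedLateness) {
            return Lateness.LATE_BUT_ALLOWED; // window already fired, but its state is still kept
        }
        return Lateness.DROPPED; // state has been cleaned up; the element is discarded
    }

    // The amount of lateness, so a trigger could treat slightly and very late data differently.
    static long latenessMillis(long windowMaxTimestamp, long watermark) {
        return Math.max(0L, watermark - windowMaxTimestamp);
    }

    public static void main(String[] args) {
        System.out.println(classify(99L, 50L, 10L));   // ON_TIME
        System.out.println(classify(99L, 105L, 10L));  // LATE_BUT_ALLOWED
        System.out.println(classify(99L, 109L, 10L));  // DROPPED
        System.out.println(latenessMillis(99L, 105L)); // 6
    }
}
```

Separating this classification from the firing decision is one way to take lateness handling out of individual custom triggers.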
> The third issue is Trigger testability. We should introduce a testing harness for triggers and change the processing-time triggers to use a clock provider instead of calling System.currentTimeMillis() directly. This will allow testing them deterministically.
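A minimal sketch of the clock-provider idea, assuming a hypothetical Clock interface that is injected into processing-time triggers. Clock, SystemClock, ManualClock and ProcessingTimeDeadline are illustrative names for this sketch and are not meant to match any particular Flink class. A manually advanced clock makes the firing decision deterministic in tests.

```java
// Hypothetical clock abstraction for this sketch.
interface Clock {
    long currentTimeMillis();
}

// Production clock: delegates to the system wall clock.
final class SystemClock implements Clock {
    @Override public long currentTimeMillis() { return System.currentTimeMillis(); }
}

// Test clock: time only moves when the test advances it.
final class ManualClock implements Clock {
    private long now;
    ManualClock(long start) { this.now = start; }
    void advance(long millis) { now += millis; }
    @Override public long currentTimeMillis() { return now; }
}

// Toy processing-time trigger that asks the injected clock
// instead of calling System.currentTimeMillis() directly.
final class ProcessingTimeDeadline {
    private final Clock clock;
    ProcessingTimeDeadline(Clock clock) { this.clock = clock; }

    // Fires once processing time has reached the given deadline.
    boolean shouldFire(long deadline) { return clock.currentTimeMillis() >= deadline; }
}

final class ClockDemo {
    public static void main(String[] args) {
        ManualClock clock = new ManualClock(1_000L);
        ProcessingTimeDeadline trigger = new ProcessingTimeDeadline(clock);
        System.out.println(trigger.shouldFire(1_500L)); // false
        clock.advance(500L);
        System.out.println(trigger.shouldFire(1_500L)); // true
        // In production the same trigger would be built with new SystemClock().
    }
}
```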
> All of these are expanded upon in the document I linked to before: https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing
> I think all of this is very important for people working on event-time based pipelines.
> Feedback is very welcome, and I hope that we can expand the document together and come up with good solutions.
> Cheers,
> Aljoscha
> > On 21 Mar 2016, at 17:46, Aljoscha Krettek <aljoscha@apache.org> wrote:
> >
> > Hi,
> > I’m also sending this to @user because the Trigger API concerns users directly.
> >
> > There are some things in the Trigger API that I think need improvement. The issues are trigger testability, fire semantics and composite triggers, and lateness. I started a document to keep track of things (https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing). Please read it if you are interested and want to get involved. We’ll evolve the document together and come up with Jira issues for the subtasks.
> >
> > Cheers,
> > Aljoscha
