flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kostas Kloudas <k.klou...@data-artisans.com>
Subject Re: [DISCUSS] Allowed Lateness in Flink
Date Thu, 07 Jul 2016 07:14:27 GMT

In the effort to move the discussion to the mailing list, rather than the doc, 
there was a comment in the doc:

“It seems this proposal marries the allowed lateness of events and the discarding of window
state. In most use cases this should be sufficient, but there are instances where having independent
control of these may be useful.

For instance, you may have a job that computes some aggregate, like a sum. You may want to
keep the window state around for a while, but not too long. Yet you may want to continue processing
late events after you discarded the window state. It is possible that your stream sinks can
make use of this data. For instance, they may be writing to a data store that returns an error
if a row already exists, which allow the sink to read the existing row and update it with
the new data."

To which I would like to reply:

If I understand your use-case correctly, I believe that the proposed binding of the allowed
lateness to the state purging does not impose any problem. The lateness specifies the upper
time bound, after which the state will be discarded. Between the start of a window and its
(end + allowedLateness) you can write custom triggers that fire, purge the state, or do nothing.
Given this, I suppose that, at the most extreme case, you can specify an allowed lateness
of Long.MaxValue and do the purging of the state "manually". By doing this, you remove the
safeguard of letting the system purge the state at some point in time, and you can do your
own custom state management that fits your needs.


> On Jul 6, 2016, at 5:43 PM, Aljoscha Krettek <aljoscha@apache.org> wrote:
> @Vishnu Funny you should ask that because I have a design doc lying around.
> I'll open a new mail thread to not hijack this one.
> On Wed, 6 Jul 2016 at 17:17 Vishnu Viswanath <vishnu.viswanath25@gmail.com>
> wrote:
>> Hi,
>> I was going through the suggested improvements in window, and I have
>> few questions/suggestion on improvement regarding the Evictor.
>> 1) I am having a use case where I have to create a custom Evictor that will
>> evict elements from the window based on the value (e.g., if I have elements
>> are of case class Item(id: Int, type:String) then evict elements that has
>> type="a"). I believe this is not currently possible.
>> 2) this is somewhat related to 1) where there should be an option to evict
>> elements from anywhere in the window. not only from the beginning of the
>> window. (e.g., apply the delta function to all elements and remove all
>> those don't pass. I checked the code and evict method just returns the
>> number of elements to be removed and processTriggerResult just skips those
>> many elements from the beginning.
>> 3) Add an option to enables the user to decide if the eviction should
>> happen before the apply function or after the apply function. Currently it
>> is before the apply function, but I have a use case where I need to first
>> apply the function and evict afterward.
>> I am doing these for a POC so I think I can modify the flink code base to
>> make these changes and build, but I would appreciate any suggestion on
>> whether these are viable changes or will there any performance issue if
>> these are done. Also any pointer on where to start(e.g, do I create a new
>> class similar to EvictingWindowOperator that extends WindowOperator?)
>> Thanks and Regards,
>> Vishnu Viswanath,
>> On Wed, Jul 6, 2016 at 9:39 AM, Aljoscha Krettek <aljoscha@apache.org>
>> wrote:
>>> I did:
>> https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3cCANMXwW0AbTTjjg9EWdxRUGxkjM7jscBeNmVRZOHPt2qO3pQMwA@mail.gmail.com%3e
>>> ;-)
>>> On Wed, 6 Jul 2016 at 15:31 Ufuk Celebi <uce@apache.org> wrote:
>>>> On Wed, Jul 6, 2016 at 3:19 PM, Aljoscha Krettek <aljoscha@apache.org>
>>>> wrote:
>>>>> In the future, it might be good to to discussions directly on the ML
>>> and
>>>>> then change the document accordingly. This way everyone can follow
>> the
>>>>> discussion on the ML. I also feel that Google Doc comments often
>> don't
>>>> give
>>>>> enough space for expressing more complex opinions.
>>>> I agree! Would you mind raising this point as a separate discussion on
>>> dev@
>>>> ?

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message