flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] Allowed Lateness in Flink
Date Fri, 08 Jul 2016 08:00:37 GMT
@Chen I added a section at the end of the document regarding access to the
elements that are dropped as late. Right now, the section just mentions
that we have to do this but there is no real proposal yet for how to do it.
Only a rough sketch so that we don't forget about it.
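
As a rough illustration of the behaviour under discussion, here is a toy model in plain Java (not Flink code, and not taken from the design document). It assumes tumbling event-time windows, keeps each window's state until the watermark reaches (end + allowedLateness), and collects elements that arrive after that point in a list instead of silently discarding them. The class name LatenessSketch, its methods, and the exact boundary condition are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of tumbling event-time windows with an allowed lateness (hypothetical, not Flink API). */
final class LatenessSketch {
    final long windowSize;
    final long allowedLateness;
    long watermark = Long.MIN_VALUE;
    /** Window start -> running sum of the values seen so far for that window. */
    final Map<Long, Long> sums = new TreeMap<>();
    /** Elements that arrived after their window's state was purged, as {timestamp, value}. */
    final List<long[]> droppedAsLate = new ArrayList<>();

    LatenessSketch(long windowSize, long allowedLateness) {
        this.windowSize = windowSize;
        this.allowedLateness = allowedLateness;
    }

    /** Time at which a window's state is purged: end + allowedLateness, saturating on overflow. */
    long cleanupTime(long windowStart) {
        long end = windowStart + windowSize;
        long t = end + allowedLateness;
        return t < end ? Long.MAX_VALUE : t;
    }

    /** Adds the element to its window, or records it as late. Returns true if it was added. */
    boolean onElement(long timestamp, long value) {
        long start = timestamp - Math.floorMod(timestamp, windowSize);
        if (cleanupTime(start) <= watermark) {
            droppedAsLate.add(new long[] {timestamp, value});
            return false;
        }
        sums.merge(start, value, Long::sum);
        return true;
    }

    /** Advances the watermark and purges the state of every window past its cleanup time. */
    void onWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        sums.keySet().removeIf(start -> cleanupTime(start) <= watermark);
    }
}
```

Collecting the dropped elements in a plain list stands in for whatever mechanism would eventually expose them to the user; as said above, there is no real proposal for that yet.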

On Fri, 8 Jul 2016 at 07:47 Chen Qin <qinnchen@gmail.com> wrote:

> +1 for allowedLateness scenario.
>
> The rationale is that backfills or data issues can hold back in-window
> data until the watermark has passed the window's end time, which causes the
> sink to write partial output.
>
> Allowing a high allowedLateness threshold makes it easier to merge those
> results and overwrite the partial output with correct output at the sink. But
> yes, the pipeline author risks blowing up the state backend with huge state.
>
> Alternatively, in some cases, if the sink allows a read-check-merge operation,
> the window can explicitly call out events ingested after allowedLateness. This
> asks the pipeline author to mitigate these events outside of the Flink
> ecosystem. The catch is that, since any part of a Flink job can replay and
> checkpoint, the notification should somehow include this information as well.
>
> Thanks
> Chen
>
> On Thu, Jul 7, 2016 at 12:14 AM, Kostas Kloudas <k.kloudas@data-artisans.com> wrote:
>
> > Hi,
> >
> > In the effort to move the discussion to the mailing list, rather than the
> > doc, there was a comment in the doc:
> >
> > "It seems this proposal marries the allowed lateness of events and the
> > discarding of window state. In most use cases this should be sufficient,
> > but there are instances where having independent control of these may be
> > useful.
> >
> > For instance, you may have a job that computes some aggregate, like a sum.
> > You may want to keep the window state around for a while, but not too long.
> > Yet you may want to continue processing late events after you discarded the
> > window state. It is possible that your stream sinks can make use of this
> > data. For instance, they may be writing to a data store that returns an
> > error if a row already exists, which allows the sink to read the existing
> > row and update it with the new data."
> >
> > To which I would like to reply:
> >
> > If I understand your use case correctly, I believe that the proposed
> > binding of the allowed lateness to the state purging does not pose any
> > problem. The lateness specifies the upper time bound after which the state
> > will be discarded. Between the start of a window and its (end +
> > allowedLateness) you can write custom triggers that fire, purge the state,
> > or do nothing. Given this, I suppose that, in the most extreme case, you
> > can specify an allowed lateness of Long.MaxValue and do the purging of the
> > state "manually". By doing this, you remove the safeguard of letting the
> > system purge the state at some point in time, and you can do your own
> > custom state management that fits your needs.
> >
> > Thanks,
> > Kostas
> >
> > > On Jul 6, 2016, at 5:43 PM, Aljoscha Krettek <aljoscha@apache.org> wrote:
> > >
> > > @Vishnu Funny you should ask that because I have a design doc lying around.
> > > I'll open a new mail thread to not hijack this one.
> > >
> > > On Wed, 6 Jul 2016 at 17:17 Vishnu Viswanath <vishnu.viswanath25@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I was going through the suggested improvements to windows, and I have a
> > >> few questions and suggestions regarding the Evictor.
> > >>
> > >> 1) I have a use case where I have to create a custom Evictor that
> > >> evicts elements from the window based on their value (e.g., if my elements
> > >> are of case class Item(id: Int, type: String), then evict elements that
> > >> have type="a"). I believe this is not currently possible.
> > >> 2) This is somewhat related to 1): there should be an option to evict
> > >> elements from anywhere in the window, not only from the beginning of the
> > >> window (e.g., apply the delta function to all elements and remove all
> > >> those that don't pass). I checked the code: the evict method just returns
> > >> the number of elements to be removed, and processTriggerResult just skips
> > >> that many elements from the beginning.
> > >> 3) Add an option that enables the user to decide whether eviction should
> > >> happen before or after the apply function. Currently it is before the
> > >> apply function, but I have a use case where I need to first apply the
> > >> function and evict afterward.
> > >>
> > >> I am doing this for a POC, so I think I can modify the Flink code base to
> > >> make these changes and build, but I would appreciate any suggestions on
> > >> whether these are viable changes or whether there will be any performance
> > >> issues if these are done. Also, any pointers on where to start (e.g., do I
> > >> create a new class similar to EvictingWindowOperator that extends
> > >> WindowOperator?)
> > >>
> > >> Thanks and Regards,
> > >> Vishnu Viswanath,
> > >>
> > >> On Wed, Jul 6, 2016 at 9:39 AM, Aljoscha Krettek <aljoscha@apache.org> wrote:
> > >>
> > >>> I did:
> > >>>
> > >>> https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3cCANMXwW0AbTTjjg9EWdxRUGxkjM7jscBeNmVRZOHPt2qO3pQMwA@mail.gmail.com%3e
> > >>> ;-)
> > >>>
> > >>> On Wed, 6 Jul 2016 at 15:31 Ufuk Celebi <uce@apache.org> wrote:
> > >>>
> > >>>> On Wed, Jul 6, 2016 at 3:19 PM, Aljoscha Krettek <aljoscha@apache.org> wrote:
> > >>>>> In the future, it might be good to do discussions directly on the ML and
> > >>>>> then change the document accordingly. This way everyone can follow the
> > >>>>> discussion on the ML. I also feel that Google Doc comments often don't give
> > >>>>> enough space for expressing more complex opinions.
> > >>>>
> > >>>> I agree! Would you mind raising this point as a separate discussion on
> > >>>> dev@?
> > >>>>
> > >>>
> > >>
> >
> >
>
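
Vishnu's first two Evictor requests above (evict by value, and from anywhere in the window rather than only from the beginning) can be sketched in plain Java as follows. This is a toy model and not the Flink Evictor interface: ValueEvictorSketch, Item, and evict are hypothetical names, and the window contents are modelled as a mutable List.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Toy value-based evictor (hypothetical, not Flink API). */
final class ValueEvictorSketch {

    /** Hypothetical element type mirroring the Item(id, type) example in the thread. */
    static final class Item {
        final int id;
        final String type;

        Item(int id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /**
     * Removes every element matching shouldEvict, wherever it sits in the window,
     * and returns how many elements were removed.
     */
    static <T> int evict(List<T> windowContents, Predicate<T> shouldEvict) {
        int evicted = 0;
        for (Iterator<T> it = windowContents.iterator(); it.hasNext();) {
            if (shouldEvict.test(it.next())) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }
}
```

Calling evict(window, item -> item.type.equals("a")) before or after the window function would correspond to the before/after choice asked for in point 3.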
