flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gyula Fóra <gyf...@apache.org>
Subject Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534
Date Thu, 26 Mar 2015 15:23:30 GMT
I think it looks good for a start, we will have to work on the API a little
bit together to make it fit smoothly with what we currently have.

There is a few gaps in the timeline but that you have probably noticed :)

Otherwise +1 from me.

On Wed, Mar 25, 2015 at 11:35 PM, Akshay Dixit <akshaydixi@gmail.com> wrote:

> Hi,
> The link to the draft proposal that I've prepared is
> https://gist.github.com/akshaydixi/88f3fbcebab0119a6a31
> It would be great if I could get some feedback on it.
> Regards,
> Akshay Dixit
>
> On Wed, Mar 25, 2015 at 2:03 AM, Akshay Dixit <akshaydixi@gmail.com>
> wrote:
>
> > Thanks Gyula.
> >
> > I agree too that simple and working implementations are preferrable over
> > hacky complex solutions. I'll start sketching out an initial
> straighforward
> > API with only basic pattern matching features
> > and base it on the existing windowing API. I'll post a draft of the
> > proposal,  keeping the points you've said in mind, tomorrow, so you can
> > look it over to see if its all right.
> > Regards,
> > Akshay Dixit
> >
> > On Tue, Mar 24, 2015 at 6:30 PM, Gyula Fóra <gyfora@apache.org> wrote:
> >
> >> Hey Dixit,
> >>
> >> Sorry for the delay, I had to discuss this in more detail with some of
> our
> >> other core developers.
> >>
> >> The consensus seems to be that we would like push this project in a
> >> direction where the changes can be quickly included in the next
> releases.
> >> For this it is essential that we implement features that are complete
> (and
> >> clean) from the users perspective. This does not necessarily mean that
> we
> >> would like to have everything at once but rather that it is preferable
> to
> >> start with something clean and simple (for instance the naive chained
> >> filter approach) and progressively build more complex logic.
> >>
> >> This also mean that we would like to avoid "researchy" code in the
> >> codebase
> >> as much as possible. Of course once we have a stable api for this
> >> functionality we can work towards making the optimizations that you have
> >> mentioned like operator sharing and so on.
> >>
> >> The ideal proposal would give a clear sketch of the pattern matching API
> >> that you would like to implement, which might be some added operators at
> >> first to the current API and possible a DSL later with more advanced
> >> functionality (this would probably go in a separate library until it is
> >> very stable).
> >>
> >> So please in the proposal include a preview of what the pattern matching
> >> syntax would look like integrated with the current operators, how it
> would
> >> interact with other parts of the system etc.
> >>
> >> These are the thing we need to figure out before we consider the
> >> optimizations I think, because it usually turns out, that the API
> >> semantics
> >> you would like to provide can hugely affect (probably limit) the
> >> possibilities that you have afterwards in terms of optimizations.
> >>
> >> Let me know if you have further questions regarding this :)
> >>
> >> Gyula
> >>
> >> On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gyfora@apache.org> wrote:
> >>
> >> > Hey,
> >> >
> >> > Give me an hour or so as I am in a meeting currently, but I will get
> >> back
> >> > to you afterwards.
> >> >
> >> > Regards,
> >> > Gyula
> >> >
> >> > On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <akshaydixi@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >> It'd really help if I got a reply soon. It'll be helpful in writing
> the
> >> >> proposal since the deadline is on 27th. Thanks
> >> >> Regards,
> >> >> Akshay Dixit
> >> >>
> >> >> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <akshaydixi@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Thanks for the explanation Marton. I've decided to try out for
> >> >> FLINK-1534.
> >> >> >
> >> >> > After reading through the thesis[4] and a few other
> papers[1][2][3],
> >> I
> >> >> > believe I've gathered a little context to ask more questions.
But
> I'm
> >> >> still
> >> >> > not sure how Flink's internals work
> >> >> > so please bear with me. Although the ongoing effort to document
the
> >> >> > architecture and internal is really helpful for newbies like me
and
> >> >> would
> >> >> > greatly decrease the ramping up time.
> >> >> >
> >> >> > Detecting a pattern of events would comprise of a pipeline that
> >> accepts
> >> >> > the pattern query and
> >> >> > sources of DataStreams, and outputs detected matches of that
> pattern
> >> to
> >> >> a
> >> >> > sink or forwards it
> >> >> > along to another stream for further computation.
> >> >> >
> >> >> > As you said, a simple filter-join-aggregate query system could
be
> >> >> > developed implementing using the existing Streaming windowing
API.
> >> >> > But matching over complex events and decoding their pattern queries
> >> >> would
> >> >> > require implementing a DSL that transforms queries into an
> evaluation
> >> >> > model. For e.g,
> >> >> > in [1], the authors have implemented an NFA automaton with a shared
> >> >> > versioned buffer that models the queries. In [4], the authors
> >> >> > propose a new language that is much more expressive and compiles
> >> into a
> >> >> > topology graph for Storm.
> >> >> >
> >> >> > So in Flink's case, I believe the proposed DSL would generate
> >> operator
> >> >> > graphs for the Flink compiler to schedule Jobgraphs over
> >> TaskManagers.
> >> >> > If we don't depend on the Windowing API, would we need to create
> new
> >> >> > operators such as the Projection, Conjunction and Union operators
> >> >> defined
> >> >> > in [4] ?
> >> >> > Also I would like to hear your thoughts on how to approach scaling
> >> the
> >> >> > pattern matching query. Note all these techniques talk about
> scaling
> >> a
> >> >> > single query.
> >> >> > I've read various ways such as
> >> >> >
> >> >> > 1.  Merging equivalent runs[1] -: This seems a good way to squash
> >> >> multiple
> >> >> > instances of pattern matching forks into a single one if they
have
> >> the
> >> >> same
> >> >> > state.
> >> >> > But I'm not sure how we would implement this in Flink since this
> is a
> >> >> > runtime optimization.
> >> >> >
> >> >> > 2.  Implementing a matched version buffer[1] -: This would involve
> >> >> sharing
> >> >> > state of a buffer datastructure across multiple candidate match
> >> >> instances
> >> >> > for the pattern.
> >> >> >
> >> >> > 3.  Splitting complex composite patterns into simpler
> sub-patterns[4]
> >> >> and
> >> >> > executing separate queries to detect those sub-patterns. This
might
> >> >> > translate into different
> >> >> > tasks and duplicating the source datastreams to all the new
> generated
> >> >> > tasks.
> >> >> >
> >> >> > Also since I don't know how the Flink compiler behaves, would
some
> of
> >> >> the
> >> >> > optimizations involve making changes to it too?
> >> >> >
> >> >> > Regards,
> >> >> > Akshay Dixit
> >> >> >
> >> >> > [1] : Efficient Pattern Matching over Event Streams
> >> >> > <
> >> http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf
> >> >> >
> >> >> > [2] : On Supporting Kleene Closure over Event Streams
> >> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
> >> >> > [3] : Processing Flows of Information: From Data Stream to Complex
> >> Event
> >> >> > Processing
> >> >> > <
> >> >>
> >>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
> >> >> >
> >> >> > [4] : Distributing Complex Event Detection
> >> >> > <
> >> >>
> >> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >
> >> >> >
> >> >> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
> >> >> balassi.marton@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> Dear Akshay,
> >> >> >>
> >> >> >> Thanks again for your interest and for the recent contribution
to
> >> >> >> streaming.
> >> >> >>
> >> >> >> Both of the projects mentioned wold be largely appreciated
by the
> >> >> >> community, and you can also propose other project suggestions
here
> >> for
> >> >> >> discussion.
> >> >> >>
> >> >> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting
> >> point
> >> >> >> and
> >> >> >> indeed the basic solution can be implemented with filtering
and
> >> >> >> windowing/mapping with some state storing whether the cause
of an
> >> event
> >> >> >> has
> >> >> >> been already seen. Solely relying on the now existing windowing
> API
> >> >> this
> >> >> >> however might cause performance issues if the events also
have an
> >> >> >> expiration timeout - some optimization there would be included.
> The
> >> >> >> further
> >> >> >> challenge is to try to further exploit the parallel job execution
> of
> >> >> Flink
> >> >> >> to possibly scale a pattern matching query.
> >> >> >>
> >> >> >> Best,
> >> >> >>
> >> >> >> Marton
> >> >> >>
> >> >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <
> akshaydixi@gmail.com
> >> >
> >> >> >> wrote:
> >> >> >>
> >> >> >> > Hi,
> >> >> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore,
India.
> >> I'm
> >> >> >> > currently interested in distributed systems and stream
> processing
> >> >> and am
> >> >> >> > looking to delve deeper into the subject, and hope to
get some
> >> >> insight
> >> >> >> by
> >> >> >> > contributing to Apache Flink. I've gathered some idea
of the
> >> >> >> > flink-streaming codebase by recently working on a PR
for
> >> >> FLINK-1450[2].
> >> >> >> >
> >> >> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting
projects
> >> that I
> >> >> >> would
> >> >> >> > love to work on over the summer. I was wondering which
amongst
> >> these
> >> >> >> would
> >> >> >> > be more appreciated by the community, so I can start
working
> >> towards
> >> >> a
> >> >> >> > proposal for either one.
> >> >> >> >
> >> >> >> > Regarding FLINK-1534, I was wondering why would simply
merging
> and
> >> >> >> > filtering the existing streams for events we want to
detect not
> >> work?
> >> >> >> Also
> >> >> >> > on going through the document mentioned by @mbalassi
in the JIRA
> >> >> >> > comment[5], the authors specify some Runtime Event Detection
> >> >> concepts in
> >> >> >> > Section 5.2. I'm assuming the project entails on building
a
> >> similar
> >> >> >> analogy
> >> >> >> > using Flink and the deliverables would include working
pattern
> >> >> matching
> >> >> >> > operators over Flink DataStreams as described in the
report. If
> >> so,
> >> >> then
> >> >> >> > shouldn't it be trivial to implement the described the
Binary
> >> >> operator
> >> >> >> > using a WindowedStream and a Filter?
> >> >> >> > I hope my questions don't seem misplaced here and I would
> >> appreciate
> >> >> >> links
> >> >> >> > to literature where I can learn more on the topic.
> >> >> >> >
> >> >> >> > Regards,
> >> >> >> > Akshay Dixit
> >> >> >> >
> >> >> >> > [1] : http://akshaydixi.me
> >> >> >> > [2] : https://github.com/apache/flink/pull/481
> >> >> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
> >> >> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
> >> >> >> > [5] :
> >> >> >> >
> >> >>
> >> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message