flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gyula Fóra <gyf...@apache.org>
Subject Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534
Date Tue, 24 Mar 2015 13:00:32 GMT
Hey Dixit,

Sorry for the delay, I had to discuss this in more detail with some of our
other core developers.

The consensus seems to be that we would like push this project in a
direction where the changes can be quickly included in the next releases.
For this it is essential that we implement features that are complete (and
clean) from the users perspective. This does not necessarily mean that we
would like to have everything at once but rather that it is preferable to
start with something clean and simple (for instance the naive chained
filter approach) and progressively build more complex logic.

This also mean that we would like to avoid "researchy" code in the codebase
as much as possible. Of course once we have a stable api for this
functionality we can work towards making the optimizations that you have
mentioned like operator sharing and so on.

The ideal proposal would give a clear sketch of the pattern matching API
that you would like to implement, which might be some added operators at
first to the current API and possible a DSL later with more advanced
functionality (this would probably go in a separate library until it is
very stable).

So please in the proposal include a preview of what the pattern matching
syntax would look like integrated with the current operators, how it would
interact with other parts of the system etc.

These are the thing we need to figure out before we consider the
optimizations I think, because it usually turns out, that the API semantics
you would like to provide can hugely affect (probably limit) the
possibilities that you have afterwards in terms of optimizations.

Let me know if you have further questions regarding this :)

Gyula

On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gyfora@apache.org> wrote:

> Hey,
>
> Give me an hour or so as I am in a meeting currently, but I will get back
> to you afterwards.
>
> Regards,
> Gyula
>
> On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <akshaydixi@gmail.com>
> wrote:
>
>> Hi,
>> It'd really help if I got a reply soon. It'll be helpful in writing the
>> proposal since the deadline is on 27th. Thanks
>> Regards,
>> Akshay Dixit
>>
>> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <akshaydixi@gmail.com>
>> wrote:
>>
>> > Thanks for the explanation Marton. I've decided to try out for
>> FLINK-1534.
>> >
>> > After reading through the thesis[4] and a few other papers[1][2][3], I
>> > believe I've gathered a little context to ask more questions. But I'm
>> still
>> > not sure how Flink's internals work
>> > so please bear with me. Although the ongoing effort to document the
>> > architecture and internal is really helpful for newbies like me and
>> would
>> > greatly decrease the ramping up time.
>> >
>> > Detecting a pattern of events would comprise of a pipeline that accepts
>> > the pattern query and
>> > sources of DataStreams, and outputs detected matches of that pattern to
>> a
>> > sink or forwards it
>> > along to another stream for further computation.
>> >
>> > As you said, a simple filter-join-aggregate query system could be
>> > developed implementing using the existing Streaming windowing API.
>> > But matching over complex events and decoding their pattern queries
>> would
>> > require implementing a DSL that transforms queries into an evaluation
>> > model. For e.g,
>> > in [1], the authors have implemented an NFA automaton with a shared
>> > versioned buffer that models the queries. In [4], the authors
>> > propose a new language that is much more expressive and compiles into a
>> > topology graph for Storm.
>> >
>> > So in Flink's case, I believe the proposed DSL would generate operator
>> > graphs for the Flink compiler to schedule Jobgraphs over TaskManagers.
>> > If we don't depend on the Windowing API, would we need to create new
>> > operators such as the Projection, Conjunction and Union operators
>> defined
>> > in [4] ?
>> > Also I would like to hear your thoughts on how to approach scaling the
>> > pattern matching query. Note all these techniques talk about scaling a
>> > single query.
>> > I've read various ways such as
>> >
>> > 1.  Merging equivalent runs[1] -: This seems a good way to squash
>> multiple
>> > instances of pattern matching forks into a single one if they have the
>> same
>> > state.
>> > But I'm not sure how we would implement this in Flink since this is a
>> > runtime optimization.
>> >
>> > 2.  Implementing a matched version buffer[1] -: This would involve
>> sharing
>> > state of a buffer datastructure across multiple candidate match
>> instances
>> > for the pattern.
>> >
>> > 3.  Splitting complex composite patterns into simpler sub-patterns[4]
>> and
>> > executing separate queries to detect those sub-patterns. This might
>> > translate into different
>> > tasks and duplicating the source datastreams to all the new generated
>> > tasks.
>> >
>> > Also since I don't know how the Flink compiler behaves, would some of
>> the
>> > optimizations involve making changes to it too?
>> >
>> > Regards,
>> > Akshay Dixit
>> >
>> > [1] : Efficient Pattern Matching over Event Streams
>> > <http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf
>> >
>> > [2] : On Supporting Kleene Closure over Event Streams
>> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
>> > [3] : Processing Flows of Information: From Data Stream to Complex Event
>> > Processing
>> > <
>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
>> >
>> > [4] : Distributing Complex Event Detection
>> > <
>> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf>
>> >
>> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
>> balassi.marton@gmail.com>
>> > wrote:
>> >
>> >> Dear Akshay,
>> >>
>> >> Thanks again for your interest and for the recent contribution to
>> >> streaming.
>> >>
>> >> Both of the projects mentioned wold be largely appreciated by the
>> >> community, and you can also propose other project suggestions here for
>> >> discussion.
>> >>
>> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting point
>> >> and
>> >> indeed the basic solution can be implemented with filtering and
>> >> windowing/mapping with some state storing whether the cause of an event
>> >> has
>> >> been already seen. Solely relying on the now existing windowing API
>> this
>> >> however might cause performance issues if the events also have an
>> >> expiration timeout - some optimization there would be included. The
>> >> further
>> >> challenge is to try to further exploit the parallel job execution of
>> Flink
>> >> to possibly scale a pattern matching query.
>> >>
>> >> Best,
>> >>
>> >> Marton
>> >>
>> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <akshaydixi@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi,
>> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm
>> >> > currently interested in distributed systems and stream processing
>> and am
>> >> > looking to delve deeper into the subject, and hope to get some
>> insight
>> >> by
>> >> > contributing to Apache Flink. I've gathered some idea of the
>> >> > flink-streaming codebase by recently working on a PR for
>> FLINK-1450[2].
>> >> >
>> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that
I
>> >> would
>> >> > love to work on over the summer. I was wondering which amongst these
>> >> would
>> >> > be more appreciated by the community, so I can start working towards
>> a
>> >> > proposal for either one.
>> >> >
>> >> > Regarding FLINK-1534, I was wondering why would simply merging and
>> >> > filtering the existing streams for events we want to detect not work?
>> >> Also
>> >> > on going through the document mentioned by @mbalassi in the JIRA
>> >> > comment[5], the authors specify some Runtime Event Detection
>> concepts in
>> >> > Section 5.2. I'm assuming the project entails on building a similar
>> >> analogy
>> >> > using Flink and the deliverables would include working pattern
>> matching
>> >> > operators over Flink DataStreams as described in the report. If so,
>> then
>> >> > shouldn't it be trivial to implement the described the Binary
>> operator
>> >> > using a WindowedStream and a Filter?
>> >> > I hope my questions don't seem misplaced here and I would appreciate
>> >> links
>> >> > to literature where I can learn more on the topic.
>> >> >
>> >> > Regards,
>> >> > Akshay Dixit
>> >> >
>> >> > [1] : http://akshaydixi.me
>> >> > [2] : https://github.com/apache/flink/pull/481
>> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
>> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
>> >> > [5] :
>> >> >
>> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
>> >> >
>> >>
>> >
>> >
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message