flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Akshay Dixit <akshayd...@gmail.com>
Subject Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534
Date Thu, 26 Mar 2015 19:56:32 GMT
Thanks for going through it Gyula.
I've made the necessary amends to the timeline and submitted the proposal.

Regards,
Akshay Dixit


On Thu, Mar 26, 2015 at 8:53 PM, Gyula Fóra <gyfora@apache.org> wrote:

> I think it looks good for a start, we will have to work on the API a little
> bit together to make it fit smoothly with what we currently have.
>
> There is a few gaps in the timeline but that you have probably noticed :)
>
> Otherwise +1 from me.
>
> On Wed, Mar 25, 2015 at 11:35 PM, Akshay Dixit <akshaydixi@gmail.com>
> wrote:
>
> > Hi,
> > The link to the draft proposal that I've prepared is
> > https://gist.github.com/akshaydixi/88f3fbcebab0119a6a31
> > It would be great if I could get some feedback on it.
> > Regards,
> > Akshay Dixit
> >
> > On Wed, Mar 25, 2015 at 2:03 AM, Akshay Dixit <akshaydixi@gmail.com>
> > wrote:
> >
> > > Thanks Gyula.
> > >
> > > I agree too that simple and working implementations are preferrable
> over
> > > hacky complex solutions. I'll start sketching out an initial
> > straighforward
> > > API with only basic pattern matching features
> > > and base it on the existing windowing API. I'll post a draft of the
> > > proposal,  keeping the points you've said in mind, tomorrow, so you can
> > > look it over to see if its all right.
> > > Regards,
> > > Akshay Dixit
> > >
> > > On Tue, Mar 24, 2015 at 6:30 PM, Gyula Fóra <gyfora@apache.org> wrote:
> > >
> > >> Hey Dixit,
> > >>
> > >> Sorry for the delay, I had to discuss this in more detail with some of
> > our
> > >> other core developers.
> > >>
> > >> The consensus seems to be that we would like push this project in a
> > >> direction where the changes can be quickly included in the next
> > releases.
> > >> For this it is essential that we implement features that are complete
> > (and
> > >> clean) from the users perspective. This does not necessarily mean that
> > we
> > >> would like to have everything at once but rather that it is preferable
> > to
> > >> start with something clean and simple (for instance the naive chained
> > >> filter approach) and progressively build more complex logic.
> > >>
> > >> This also mean that we would like to avoid "researchy" code in the
> > >> codebase
> > >> as much as possible. Of course once we have a stable api for this
> > >> functionality we can work towards making the optimizations that you
> have
> > >> mentioned like operator sharing and so on.
> > >>
> > >> The ideal proposal would give a clear sketch of the pattern matching
> API
> > >> that you would like to implement, which might be some added operators
> at
> > >> first to the current API and possible a DSL later with more advanced
> > >> functionality (this would probably go in a separate library until it
> is
> > >> very stable).
> > >>
> > >> So please in the proposal include a preview of what the pattern
> matching
> > >> syntax would look like integrated with the current operators, how it
> > would
> > >> interact with other parts of the system etc.
> > >>
> > >> These are the thing we need to figure out before we consider the
> > >> optimizations I think, because it usually turns out, that the API
> > >> semantics
> > >> you would like to provide can hugely affect (probably limit) the
> > >> possibilities that you have afterwards in terms of optimizations.
> > >>
> > >> Let me know if you have further questions regarding this :)
> > >>
> > >> Gyula
> > >>
> > >> On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gyfora@apache.org>
> wrote:
> > >>
> > >> > Hey,
> > >> >
> > >> > Give me an hour or so as I am in a meeting currently, but I will get
> > >> back
> > >> > to you afterwards.
> > >> >
> > >> > Regards,
> > >> > Gyula
> > >> >
> > >> > On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <
> akshaydixi@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Hi,
> > >> >> It'd really help if I got a reply soon. It'll be helpful in writing
> > the
> > >> >> proposal since the deadline is on 27th. Thanks
> > >> >> Regards,
> > >> >> Akshay Dixit
> > >> >>
> > >> >> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <
> akshaydixi@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >> > Thanks for the explanation Marton. I've decided to try out
for
> > >> >> FLINK-1534.
> > >> >> >
> > >> >> > After reading through the thesis[4] and a few other
> > papers[1][2][3],
> > >> I
> > >> >> > believe I've gathered a little context to ask more questions.
But
> > I'm
> > >> >> still
> > >> >> > not sure how Flink's internals work
> > >> >> > so please bear with me. Although the ongoing effort to document
> the
> > >> >> > architecture and internal is really helpful for newbies like
me
> and
> > >> >> would
> > >> >> > greatly decrease the ramping up time.
> > >> >> >
> > >> >> > Detecting a pattern of events would comprise of a pipeline
that
> > >> accepts
> > >> >> > the pattern query and
> > >> >> > sources of DataStreams, and outputs detected matches of that
> > pattern
> > >> to
> > >> >> a
> > >> >> > sink or forwards it
> > >> >> > along to another stream for further computation.
> > >> >> >
> > >> >> > As you said, a simple filter-join-aggregate query system
could be
> > >> >> > developed implementing using the existing Streaming windowing
> API.
> > >> >> > But matching over complex events and decoding their pattern
> queries
> > >> >> would
> > >> >> > require implementing a DSL that transforms queries into an
> > evaluation
> > >> >> > model. For e.g,
> > >> >> > in [1], the authors have implemented an NFA automaton with
a
> shared
> > >> >> > versioned buffer that models the queries. In [4], the authors
> > >> >> > propose a new language that is much more expressive and compiles
> > >> into a
> > >> >> > topology graph for Storm.
> > >> >> >
> > >> >> > So in Flink's case, I believe the proposed DSL would generate
> > >> operator
> > >> >> > graphs for the Flink compiler to schedule Jobgraphs over
> > >> TaskManagers.
> > >> >> > If we don't depend on the Windowing API, would we need to
create
> > new
> > >> >> > operators such as the Projection, Conjunction and Union operators
> > >> >> defined
> > >> >> > in [4] ?
> > >> >> > Also I would like to hear your thoughts on how to approach
> scaling
> > >> the
> > >> >> > pattern matching query. Note all these techniques talk about
> > scaling
> > >> a
> > >> >> > single query.
> > >> >> > I've read various ways such as
> > >> >> >
> > >> >> > 1.  Merging equivalent runs[1] -: This seems a good way to
squash
> > >> >> multiple
> > >> >> > instances of pattern matching forks into a single one if
they
> have
> > >> the
> > >> >> same
> > >> >> > state.
> > >> >> > But I'm not sure how we would implement this in Flink since
this
> > is a
> > >> >> > runtime optimization.
> > >> >> >
> > >> >> > 2.  Implementing a matched version buffer[1] -: This would
> involve
> > >> >> sharing
> > >> >> > state of a buffer datastructure across multiple candidate
match
> > >> >> instances
> > >> >> > for the pattern.
> > >> >> >
> > >> >> > 3.  Splitting complex composite patterns into simpler
> > sub-patterns[4]
> > >> >> and
> > >> >> > executing separate queries to detect those sub-patterns.
This
> might
> > >> >> > translate into different
> > >> >> > tasks and duplicating the source datastreams to all the new
> > generated
> > >> >> > tasks.
> > >> >> >
> > >> >> > Also since I don't know how the Flink compiler behaves, would
> some
> > of
> > >> >> the
> > >> >> > optimizations involve making changes to it too?
> > >> >> >
> > >> >> > Regards,
> > >> >> > Akshay Dixit
> > >> >> >
> > >> >> > [1] : Efficient Pattern Matching over Event Streams
> > >> >> > <
> > >>
> http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf
> > >> >> >
> > >> >> > [2] : On Supporting Kleene Closure over Event Streams
> > >> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf
> >
> > >> >> > [3] : Processing Flows of Information: From Data Stream to
> Complex
> > >> Event
> > >> >> > Processing
> > >> >> > <
> > >> >>
> > >>
> >
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
> > >> >> >
> > >> >> > [4] : Distributing Complex Event Detection
> > >> >> > <
> > >> >>
> > >>
> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> > >
> > >> >> >
> > >> >> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
> > >> >> balassi.marton@gmail.com>
> > >> >> > wrote:
> > >> >> >
> > >> >> >> Dear Akshay,
> > >> >> >>
> > >> >> >> Thanks again for your interest and for the recent contribution
> to
> > >> >> >> streaming.
> > >> >> >>
> > >> >> >> Both of the projects mentioned wold be largely appreciated
by
> the
> > >> >> >> community, and you can also propose other project suggestions
> here
> > >> for
> > >> >> >> discussion.
> > >> >> >>
> > >> >> >> Regarding FLINK-1534, the thesis I mentioned serves as
a
> starting
> > >> point
> > >> >> >> and
> > >> >> >> indeed the basic solution can be implemented with filtering
and
> > >> >> >> windowing/mapping with some state storing whether the
cause of
> an
> > >> event
> > >> >> >> has
> > >> >> >> been already seen. Solely relying on the now existing
windowing
> > API
> > >> >> this
> > >> >> >> however might cause performance issues if the events
also have
> an
> > >> >> >> expiration timeout - some optimization there would be
included.
> > The
> > >> >> >> further
> > >> >> >> challenge is to try to further exploit the parallel job
> execution
> > of
> > >> >> Flink
> > >> >> >> to possibly scale a pattern matching query.
> > >> >> >>
> > >> >> >> Best,
> > >> >> >>
> > >> >> >> Marton
> > >> >> >>
> > >> >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <
> > akshaydixi@gmail.com
> > >> >
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >> > Hi,
> > >> >> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT
Vellore,
> India.
> > >> I'm
> > >> >> >> > currently interested in distributed systems and
stream
> > processing
> > >> >> and am
> > >> >> >> > looking to delve deeper into the subject, and hope
to get some
> > >> >> insight
> > >> >> >> by
> > >> >> >> > contributing to Apache Flink. I've gathered some
idea of the
> > >> >> >> > flink-streaming codebase by recently working on
a PR for
> > >> >> FLINK-1450[2].
> > >> >> >> >
> > >> >> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting
projects
> > >> that I
> > >> >> >> would
> > >> >> >> > love to work on over the summer. I was wondering
which amongst
> > >> these
> > >> >> >> would
> > >> >> >> > be more appreciated by the community, so I can start
working
> > >> towards
> > >> >> a
> > >> >> >> > proposal for either one.
> > >> >> >> >
> > >> >> >> > Regarding FLINK-1534, I was wondering why would
simply merging
> > and
> > >> >> >> > filtering the existing streams for events we want
to detect
> not
> > >> work?
> > >> >> >> Also
> > >> >> >> > on going through the document mentioned by @mbalassi
in the
> JIRA
> > >> >> >> > comment[5], the authors specify some Runtime Event
Detection
> > >> >> concepts in
> > >> >> >> > Section 5.2. I'm assuming the project entails on
building a
> > >> similar
> > >> >> >> analogy
> > >> >> >> > using Flink and the deliverables would include working
pattern
> > >> >> matching
> > >> >> >> > operators over Flink DataStreams as described in
the report.
> If
> > >> so,
> > >> >> then
> > >> >> >> > shouldn't it be trivial to implement the described
the Binary
> > >> >> operator
> > >> >> >> > using a WindowedStream and a Filter?
> > >> >> >> > I hope my questions don't seem misplaced here and
I would
> > >> appreciate
> > >> >> >> links
> > >> >> >> > to literature where I can learn more on the topic.
> > >> >> >> >
> > >> >> >> > Regards,
> > >> >> >> > Akshay Dixit
> > >> >> >> >
> > >> >> >> > [1] : http://akshaydixi.me
> > >> >> >> > [2] : https://github.com/apache/flink/pull/481
> > >> >> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
> > >> >> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
> > >> >> >> > [5] :
> > >> >> >> >
> > >> >>
> > >>
> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> > >> >> >> >
> > >> >> >>
> > >> >> >
> > >> >> >
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message