flink-user mailing list archives

From Wepngong Benaiah <bwepng...@gmail.com>
Subject Re: GSoC project proposal: Query optimisation layer for Flink Streaming
Date Wed, 25 Mar 2015 00:47:24 GMT
Thanks for your comments @rmetzger, @mbalassi. I will make the necessary
corrections and post it again for review.

On Tue, Mar 24, 2015 at 10:01 PM, Robert Metzger <rmetzger@apache.org>
wrote:

> Just a quick ping on this for the streaming folks: The deadline for the
> proposal submissions is Friday, so the GSoC applicants need to get our
> feedback asap.
> The student asked me today in the #flink channel whether we can review
> this proposal.
>
>
> I have the following comments regarding the proposal:
> - I don't exactly understand how you've chosen the dates for the
> milestones. According to
> https://www.google-melange.com/gsoc/events/google/gsoc2015 the coding
> phase begins on 25 May and ends on 21 August. It seems that you are
> suggesting to start the implementation before the official GSoC start
> date.
> I would suggest aligning the milestones with the official GSoC timeline
> (or at least justifying in the proposal why you're deviating from it).
> - Can you explain a bit more how you are planning to do operator
> reordering and how the "Rete algorithm" works? Some background on
> why you've chosen that algorithm would also be helpful.
>
>
>
> On Sun, Mar 22, 2015 at 4:44 AM, Wepngong Benaiah <bwepngong@gmail.com>
> wrote:
>
>> Hello,
>> I came up with the following proposal, which I believe needs a lot of
>> review. I would appreciate it if you could help me make appropriate
>> corrections before the deadline for submission.
>> Thanks @gyfora, @pariscarbone
>>
>>
>> *GSoC project: Query optimisation layer for Flink Streaming
>> <https://issues.apache.org/jira/browse/FLINK-1617>*
>>
>> NAME: Wepngong Ngeh Benaiah
>>
>> EMAIL: bwepngong@gmail.com
>>
>> *SYNOPSIS*
>>
>> I would very much like to participate in GSoC 2015 with Apache
>> <http://apache.org/>, working on Flink <http://flink.org/> streaming as
>> my way of contributing to open source.
>>
>> Flink streaming currently supports only a limited set of optimisations
>> for streaming programs, such as *operator chaining* and
>> several optimisations for *windowing computations*.
>>
>> Also, there is currently no optimizer as a separate module of its own.
>> Though *operator chaining* improves performance, a lot more has to be
>> done to further improve system performance.
>>
>> My project will be to implement a *Query Optimisation layer for Flink
>> Streaming*. This layer is meant to perform statistical graph analysis and
>> streaming graph optimization, which is expected to bring major system
>> performance improvements.
>>
>> *HOW WOULD THE COMMUNITY BENEFIT FROM THIS?*
>>
>> Much of “big data” is received in real time, and is most valuable at its
>> time of arrival. For example, a social network may want to identify
>> trending conversation topics within minutes, an ad provider may want to
>> train a model of which users click a new ad, and a service operator may
>> want to mine log files to detect failures within seconds.
>>
>> Big Data Analytics is rapidly gaining ground in all domains of industry
>> today, and Flink is a solution for it. By reducing overheads and system
>> bottlenecks, the throughput of companies using Flink will improve and
>> many more people will use and support the project.
>>
>> *ABOUT ME*
>>
>> I am an IT enthusiast and 3rd year Software Engineering student at the University
>> of Buea <http://ubuea.cm/>, Cameroon, pursuing a Bachelor of Engineering
>> in Computer Engineering. I have been programming in Java for over 2 years
>> and have experience with MySQL, PostgreSQL, web application development
>> in PHP (Laravel and Yii frameworks), 3 years of C programming, Linux
>> System Administration and, recently, Stream Processing. I'm currently in
>> the 2nd semester of my 3rd year and will be on an internship at Orange
>> Cameroon <http://www.orange.cm/en/>, a mobile telecommunications company,
>> by September 2015.
>>
>> I have contributed to https://github.com/ch3ck/sams where work on the
>> student attendance management system is still going on,
>> https://github.com/NetLogo/NetLogo and.
>>
>> Finally, this is my GitHub account: https://github.com/bwepngong and
>> Google Plus: https://plus.google.com/+WepngongBenaiahNgeh
>>
>> I am finishing my B.Eng at the University of Buea in Cameroon in December
>> 2016.
>>
>> *Milestones*
>>
>> *30th March – 27th April 2015*
>>
>> *1. Understand how Flink streaming works: look into the streamgraph and
>> the streamingjobgraphbuilder and start doing simpler things with Flink.*
>>
>> *2. Design and analysis of the entire system.*
>>
>> *3. Ask questions on the mailing lists for clarification.*
>>
>> *27th April – 26th June (mid-term)*
>>
>> *Implement*
>>
>>    1. OPERATOR REORDERING: change the order in which the operators
>>    appear in the stream graph to eliminate overheads.
>>
>>    2. Perform unit testing for this algorithm.
>>
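To make the operator reordering idea above a bit more concrete, here is a
minimal sketch in Java. It is not Flink code: FilterOp, ReorderSketch,
costPerRecord and selectivity are hypothetical names used only for
illustration. It assumes the filters commute with each other, are
independent, and that their per-record cost (greater than zero) and
selectivity have already been measured. It orders them by
(1 - selectivity) / cost, a common heuristic for ordering independent
predicates, so that cheap and highly selective filters run first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a filter operator; not part of the Flink API.
final class FilterOp {
    final String name;
    final double costPerRecord; // assumed measured cost of one record, > 0
    final double selectivity;   // assumed fraction of records that pass, in [0, 1]

    FilterOp(String name, double costPerRecord, double selectivity) {
        this.name = name;
        this.costPerRecord = costPerRecord;
        this.selectivity = selectivity;
    }
}

final class ReorderSketch {
    // Expected work per input record when the filters run in the given order:
    // each filter only sees the records that survived the filters before it.
    static double expectedCost(List<FilterOp> chain) {
        double cost = 0.0;
        double surviving = 1.0;
        for (FilterOp op : chain) {
            cost += surviving * op.costPerRecord;
            surviving *= op.selectivity;
        }
        return cost;
    }

    // Returns a reordered copy: highest (1 - selectivity) / cost first.
    // Only valid under the assumption that all filters commute.
    static List<FilterOp> reorder(List<FilterOp> chain) {
        List<FilterOp> sorted = new ArrayList<>(chain);
        sorted.sort(Comparator.comparingDouble(
                (FilterOp op) -> -(1.0 - op.selectivity) / op.costPerRecord));
        return sorted;
    }
}
```

For example, given an expensive filter (cost 10, selectivity 0.9) followed
by a cheap one (cost 1, selectivity 0.1), reorder moves the cheap filter to
the front, and expectedCost can be used to compare the two orders. A real
optimizer would also have to prove that the operators can be swapped safely.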
>> *27th June – 13th August*
>>
>> *Implement*
>>
>>    1. REDUNDANCY ELIMINATION: eliminate redundant computations by analysing
>>    the streaming graph using the *RETE algorithm* and remove duplicate
>>    operators which are not necessary. When several operators depend on
>>    the same operator, compute it only once and share its result among them.
>>
>>    2. Perform unit testing.
>>
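A rough sketch of the sharing idea above, again in Java. Note that this is
plain common-subexpression sharing (one node per distinct pair of operator
and inputs), not an implementation of the Rete algorithm itself.
SharedGraphSketch and addNode are hypothetical names, not Flink classes, and
the sketch assumes that the operator description string fully identifies a
deterministic, stateless operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical operator graph that never stores the same computation twice.
final class SharedGraphSketch {
    private final Map<String, Integer> idByKey = new HashMap<>();
    private final List<String> keys = new ArrayList<>();

    // Returns the id of the node that applies `operator` to `inputs`.
    // If an identical node already exists, its id is reused, so the
    // downstream operators share one computation instead of duplicating it.
    int addNode(String operator, int... inputs) {
        String key = operator + Arrays.toString(inputs);
        Integer existing = idByKey.get(key);
        if (existing != null) {
            return existing;
        }
        int id = keys.size();
        keys.add(key);
        idByKey.put(key, id);
        return id;
    }

    // Number of distinct nodes actually kept in the graph.
    int size() {
        return keys.size();
    }
}
```

With this, adding the same filter twice on the same source returns the same
node id, and two different maps placed after it both read from that single
shared filter.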
>> *13th August – 21st August*
>>
>>    1. Integrate modules and do system testing
>>
>> *22nd August – 28th August (final evaluation)*
>>
>> Polish testing and get the required code samples ready
>>
>> *29th August – 8th November*
>>
>>    1. More testing
>>
>>    2. Code documentation
>>
>>    3. Debugging
>>
>>
>>
>>
>


-- 
Wepngong Ngeh Benaiah

"The similarities of sysadmins and drug dealers: both measure stuff in Ks,
and both have users."
