Subject: Re: GSoC project proposal: Query optimisation layer for Flink Streaming
From: Wepngong Benaiah
To: "user@flink.apache.org"
Date: Wed, 25 Mar 2015 01:47:24 +0100

Thanks for your comments, @rmetzger and @mbalassi. I will make the necessary corrections and put it up again for review.

On Tue, Mar 24, 2015 at 10:01 PM, Robert Metzger <rmetzger@apache.org> wrote:
Just a quick ping on this for the streaming folks: The deadline for the proposal submissions is Friday, so the GSoC applicants need to get our feedback asap.
The student asked me today in the #flink channel whether we can review this proposal.


I have the following comments regarding the proposal:
- I don't exactly understand how you've chosen the dates for the milestones. According to https://www.google-melange.com/gsoc/events/google/gsoc2015 the coding phase begins on 25 May and ends on 21 August. It seems that you are suggesting to start with the implementation before the official GSoC start date.
I would suggest aligning the milestones with the official GSoC timeline (or at least justifying in the proposal why you're deviating from it).
- Can you explain a bit more how you are planning to do operator reordering and how the "rete algorithm" works? Some background on why you've chosen that algorithm would also be helpful.



On Sun, Mar 22, 2015 at 4:44 AM, Wepngong Benaiah <bwepngong@gmail.com> wrote:
Hello,
I came up with the following proposal, which I believe needs a lot of review. I would appreciate it if you could help me make the appropriate corrections before the submission deadline.
Thanks @gyfora, @pariscarbone



GSoC project: Query optimisation layer for Flink Streaming

NAME: Wepngong Ngeh Benaiah

EMAIL: bwepngong@gmail.com

SYNOPSIS

I would very much like to participate in GSoC 2015 with Apache, working on Flink Streaming as my way of contributing to open source.

Flink Streaming currently supports only a limited set of optimisations for streaming programs, such as operator chaining and several optimisations for windowing computations.

Also, there is currently no optimizer as a separate module of its own. Although operator chaining improves performance, a lot more has to be done to improve system performance further.

My project will be to implement a Query Optimisation layer for Flink Streaming. The layer is intended to perform statistical analysis of the stream graph and to optimise the graph accordingly, which should bring major system performance improvements.

HOW WOULD THE COMMUNITY BENEFIT FROM THIS?

Much of “big data” is received in real time, and is most valuable at its time of arrival. For example, a social network may want to identify trending conversation topics within minutes, an ad provider may want to train a model of which users click a new ad, and a service operator may want to mine log files to detect failures within seconds.

Big data analytics is gaining ground in all domains of industry today, and Flink addresses that need. By reducing overheads and system bottlenecks, the optimisation layer would improve throughput for the companies that run Flink, and many more people would be likely to use and support the project.

ABOUT ME

I am an IT enthusiast and third-year Software Engineering student at the University of Buea, Cameroon, pursuing a Bachelor of Engineering in Computer Engineering. I have been programming in Java for over two years, and I also have experience with MySQL, PostgreSQL, web application development in PHP (Laravel and Yii frameworks), three years of C programming, Linux system administration and, recently, stream processing. I am currently in the second semester of my third year and will be on an internship at Orange Cameroon, a mobile telecommunications company, by September 2015.

I have contributed to https://github.com/ch3ck/sams, where work on the student attendance management system is still ongoing, and to https://github.com/NetLogo/NetLogo.

Finally, this is my GitHub account: https://github.com/bwepngong and my Google Plus profile: https://plus.google.com/+WepngongBenaiahNgeh

I will finish my B.Eng at the University of Buea in Cameroon in December 2016.

Milestones

30th March – 27th April 2015

1. Understand how Flink Streaming works: look into the stream graph and the streaming job graph builder, and start with simpler tasks in Flink.

2. Design and analysis of the entire system.

3. Ask questions on the mailing lists for clarification.

27th April – 26 June (mid-term)

Implement

  1. OPERATOR REORDERING: change the order in which the operators appear in the stream graph to eliminate overheads.

  2. Perform unit testing for this algorithm.
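
To make the operator reordering item more concrete, here is a minimal sketch of one well-known case: a chain of independent filters, ordered by the rank cost / (1 - selectivity) so that cheap, highly selective filters run first. It is only an illustration, not the planned implementation. The class `FilterReorderingSketch`, the type `FilterOp` and the cost and selectivity figures are all hypothetical, and none of them are Flink classes or measured values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FilterReorderingSketch {

    /** Hypothetical description of a filter operator; not a Flink class. */
    static final class FilterOp {
        final String name;
        final double costPerRecord; // assumed cost of evaluating the filter on one record
        final double selectivity;   // assumed fraction of records that pass, in [0, 1]

        FilterOp(String name, double costPerRecord, double selectivity) {
            this.name = name;
            this.costPerRecord = costPerRecord;
            this.selectivity = selectivity;
        }

        /** Low rank means cheap and highly selective, so the filter should run early. */
        double rank() {
            return costPerRecord / (1.0 - selectivity);
        }
    }

    /** Expected cost of pushing one record through the filters in the given order. */
    static double expectedCost(List<FilterOp> chain) {
        double cost = 0.0;
        double passFraction = 1.0; // fraction of records that reach the current filter
        for (FilterOp op : chain) {
            cost += passFraction * op.costPerRecord;
            passFraction *= op.selectivity;
        }
        return cost;
    }

    /** Returns a reordered copy of the chain; the input list is left untouched. */
    static List<FilterOp> reorder(List<FilterOp> chain) {
        List<FilterOp> sorted = new ArrayList<>(chain);
        sorted.sort(Comparator.comparingDouble(FilterOp::rank));
        return sorted;
    }

    public static void main(String[] args) {
        List<FilterOp> chain = new ArrayList<>();
        chain.add(new FilterOp("expensiveRegex", 10.0, 0.9)); // made-up numbers
        chain.add(new FilterOp("cheapRangeCheck", 1.0, 0.1)); // made-up numbers

        List<FilterOp> optimised = reorder(chain);
        System.out.println("cost before: " + expectedCost(chain));
        System.out.println("cost after:  " + expectedCost(optimised));
        System.out.println("first operator after reordering: " + optimised.get(0).name);
    }
}
```

The sketch assumes that the filters are independent and free of side effects, so that swapping them does not change the result. A real optimiser would also have to obtain the cost and selectivity figures, for example from runtime statistics.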

27th June – 13 August

Implement

  1. REDUNDANCY ELIMINATION: Eliminate redundant computations by analysing the stream graph using the RETE algorithm and removing duplicate operators that are not necessary. When several operators depend on the same operator, compute that operator only once and share its result among them.

  2. Perform unit testing.
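
The sharing idea in the redundancy elimination item can be sketched as follows. This is not the RETE algorithm itself. It is only a simplified illustration of node sharing, in which two operators that apply the same operation to the same input are collapsed into one. The class `OperatorSharingSketch`, the type `Node` and the operation names are hypothetical, and the sketch assumes operators are deterministic, have a single input and are listed in topological order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OperatorSharingSketch {

    /** Hypothetical operator node: an operation applied to one upstream node (-1 for a source). */
    static final class Node {
        final int id;
        final String operation;
        final int input;

        Node(int id, String operation, int input) {
            this.id = id;
            this.operation = operation;
            this.input = input;
        }
    }

    /**
     * Maps every node id to the id of the node that survives after merging.
     * Two nodes are merged when they apply the same operation to the same
     * (already merged) input. Nodes must be given in topological order.
     */
    static Map<Integer, Integer> mergeDuplicates(List<Node> nodes) {
        Map<Integer, Integer> survivor = new HashMap<>();
        Map<String, Integer> firstSeen = new HashMap<>();
        for (Node n : nodes) {
            // Resolve the input to its surviving representative first.
            int mergedInput = n.input < 0 ? -1 : survivor.get(n.input);
            String key = n.operation + "@" + mergedInput;
            Integer existing = firstSeen.putIfAbsent(key, n.id);
            survivor.put(n.id, existing == null ? n.id : existing);
        }
        return survivor;
    }

    public static void main(String[] args) {
        List<Node> graph = new ArrayList<>();
        graph.add(new Node(0, "source(clicks)", -1));
        graph.add(new Node(1, "map(parse)", 0));
        graph.add(new Node(2, "map(parse)", 0));      // duplicate of node 1
        graph.add(new Node(3, "filter(valid)", 1));
        graph.add(new Node(4, "filter(valid)", 2));   // duplicate of node 3 once 2 is merged into 1
        graph.add(new Node(5, "filter(mobile)", 2));  // different operation, kept

        Map<Integer, Integer> survivor = mergeDuplicates(graph);
        for (Node n : graph) {
            System.out.println("node " + n.id + " -> node " + survivor.get(n.id));
        }
    }
}
```

Comparing operations by name is a simplification. Deciding whether two user-defined functions really compute the same thing is the hard part and is left open here.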

13th August - 21 August

  1. Integrate modules and do system testing

22nd August – 28th August (final evaluation)

Polish testing and get the required code samples ready

29th August – 8th November

  1. More testing

  2. Code documentation.

  3. Debugging.







--
Wepngong Ngeh Benaiah

"The similarities of sysadmins and drug dealers: both measure stuff in Ks, and both have users."
