Date: Wed, 2 Sep 2015 20:07:13 +0200
Subject: Re: Hardware requirements and learning resources
From: Juan Rodríguez Hortalá
To: user@flink.apache.org

Hi Kostas,

Thanks a lot for your answer. It's nice to know there are more training
videos on their way; they will be on my watch list. I guess you'll be
using the data Artisans channel for the new videos too.

Greetings,

Juan

2015-09-02 14:30 GMT+02:00 Kostas Tzoumas <ktzoumas@apache.org>:

> Hi Juan,
>
> Flink is quite nimble with hardware requirements; people have run it on
> old-ish laptops and also on the largest instances available from cloud
> providers. I will let others chime in with more details.
>
> I am not aware of something along the lines of the cheatsheet that you
> mention. If you actually try to do this, I would love to see it, and it
> might be useful to others as well.
> Both use similar abstractions at the API
> level (i.e., parallel collections), so if you stay true to the functional
> paradigm and do not try to "abuse" the system by exploiting knowledge of its
> internals, things should be straightforward. These points apply to the batch
> APIs; the streaming API in Flink follows a true streaming paradigm, where
> you get an unbounded stream of records and operators on these streams.
>
> Funny that you ask about a video for the DataStream slides. There is a
> Flink training happening as we speak, and a video is being recorded right
> now :-) Hopefully it will be made available soon.
>
> Best,
> Kostas
>
>
> On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá <
> juan.rodriguez.hortala@gmail.com> wrote:
>
>> Answering myself, I have found some nice training material at
>> http://dataartisans.github.io/flink-training. There are even videos on
>> YouTube for some of the slides:
>>
>>    - http://dataartisans.github.io/flink-training/overview/intro.html
>>    https://www.youtube.com/watch?v=XgC6c4Wiqvs
>>
>>    - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
>>    https://www.youtube.com/watch?v=0EARqW15dDk
>>
>> The third lecture
>> http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html
>> more or less corresponds to https://www.youtube.com/watch?v=1yWKZ26NQeU
>> but not exactly, and there are more lessons at
>> http://dataartisans.github.io/flink-training, for stream processing and
>> the table API, for which I haven't found a video. Does anyone have pointers
>> to the missing videos?
>>
>> Greetings,
>>
>> Juan
>>
>> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá <
>> juan.rodriguez.hortala@gmail.com>:
>>
>>> Hi list,
>>>
>>> I'm new to Flink, and I find this project very interesting.
>>> I have
>>> experience with Apache Spark, and from what I've seen so far, Flink
>>> provides an API at a similar abstraction level but based on single-record
>>> processing instead of batch processing. I've read on Quora that Flink
>>> extends stream processing to batch processing, while Spark extends batch
>>> processing to streaming. Therefore I find Flink especially attractive for
>>> low-latency stream processing. Anyway, I would appreciate it if someone
>>> could point me to a list of hardware
>>> requirements for the slave nodes in a Flink cluster, something along the
>>> lines of https://spark.apache.org/docs/latest/hardware-provisioning.html.
>>> Spark is known for having quite high minimum memory requirements (8GB RAM
>>> and 8 cores minimum), and I was wondering if that is also the case for Flink.
>>> Lower memory requirements would be very interesting for building small
>>> Flink clusters for educational purposes, or for small projects.
>>>
>>> Apart from that, I wonder if there is some blog post by the community
>>> about transitioning from Spark to Flink. I think it could be interesting,
>>> as there are some similarities in the APIs, but also deep differences in
>>> the underlying approaches. I was thinking of something like Breeze's
>>> cheatsheet comparing its matrix operations with those available in Matlab
>>> and Numpy,
>>> https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet, or
>>> like http://rosettacode.org/wiki/Factorial. Just an idea anyway. Also,
>>> any pointer to some online course, book or training for Flink besides the
>>> official programming guides would be much appreciated.
>>>
>>> Thanks in advance for your help.
>>>
>>> Greetings,
>>>
>>> Juan
>>>
>>>
>>
>