Date: Wed, 2 Sep 2015 08:56:35 -0400
Subject: Re: Hardware requirements and learning resources
From: jay vyas
To: user@flink.apache.org

We're also working on a BigPetStore implementation for Flink, which will
help onboard Spark/MapReduce folks.

I have prototype code that runs a simple job in memory here:
https://github.com/bigpetstore/bigpetstore-flink

Contributions are welcome. Right now there is a serialization error.

On Wed, Sep 2, 2015 at 8:50 AM, Robert Metzger <rmetzger@apache.org> wrote:

> Hi Juan,
>
> I think the recommendations in the Spark guide are quite good, and are
> similar to what I would recommend for Flink as well.
> Depending on the workloads you are interested in running, you can
> certainly use Flink with less than 8 GB per machine. I think you can
> start Flink TaskManagers with 500 MB of heap space and they'll still be
> able to process some GB of data.
>
> Everything above 2 GB is probably good enough for some initial
> experimentation (again depending on your workloads, network, disk speed,
> etc.)
>
> On Wed, Sep 2, 2015 at 2:30 PM, Kostas Tzoumas <ktzoumas@apache.org>
> wrote:
>
>> Hi Juan,
>>
>> Flink is quite nimble with hardware requirements; people have run it on
>> old-ish laptops and also on the largest instances available from cloud
>> providers. I will let others chime in with more details.
>>
>> I am not aware of anything along the lines of the cheatsheet that you
>> mention. If you actually try to do this, I would love to see it, and it
>> might be useful to others as well. Both use similar abstractions at the
>> API level (i.e., parallel collections), so if you stay true to the
>> functional paradigm and do not try to "abuse" the system by exploiting
>> knowledge of its internals, things should be straightforward. This
>> applies to the batch APIs; the streaming API in Flink follows a true
>> streaming paradigm, where you get an unbounded stream of records and
>> operators on these streams.
>>
>> Funny that you ask about a video for the DataStream slides. There is a
>> Flink training happening as we speak, and a video is being recorded
>> right now :-) Hopefully it will be made available soon.
>>
>> Best,
>> Kostas
>>
>> On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá
>> <juan.rodriguez.hortala@gmail.com> wrote:
>>
>>> Answering my own question, I have found some nice training material at
>>> http://dataartisans.github.io/flink-training. There are even videos on
>>> YouTube for some of the slides:
>>>
>>>   - http://dataartisans.github.io/flink-training/overview/intro.html
>>>     https://www.youtube.com/watch?v=XgC6c4Wiqvs
>>>
>>>   - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
>>>     https://www.youtube.com/watch?v=0EARqW15dDk
>>>
>>> The third lecture,
>>> http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html,
>>> more or less corresponds to
>>> https://www.youtube.com/watch?v=1yWKZ26NQeU, but not exactly. There are
>>> more lessons at http://dataartisans.github.io/flink-training, covering
>>> stream processing and the Table API, for which I haven't found a video.
>>> Does anyone have pointers to the missing videos?
>>>
>>> Greetings,
>>>
>>> Juan
>>>
>>> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá
>>> <juan.rodriguez.hortala@gmail.com>:
>>>
>>>> Hi list,
>>>>
>>>> I'm new to Flink, and I find this project very interesting. I have
>>>> experience with Apache Spark, and from what I've seen so far, Flink
>>>> provides an API at a similar abstraction level, but based on
>>>> single-record processing instead of batch processing. I've read on
>>>> Quora that Flink extends stream processing to batch processing, while
>>>> Spark extends batch processing to streaming. Therefore I find Flink
>>>> especially attractive for low-latency stream processing. Anyway, I
>>>> would appreciate it if someone could point me to a list of hardware
>>>> requirements for the slave nodes in a Flink cluster, something along
>>>> the lines of
>>>> https://spark.apache.org/docs/latest/hardware-provisioning.html. Spark
>>>> is known for having quite high minimum memory requirements (8 GB RAM
>>>> and 8 cores minimum), and I was wondering whether that is also the
>>>> case for Flink. Lower memory requirements would be very interesting
>>>> for building small Flink clusters for educational purposes, or for
>>>> small projects.
>>>>
>>>> Apart from that, I wonder if there is some blog post by the community
>>>> about transitioning from Spark to Flink. I think it could be
>>>> interesting, as there are some similarities in the APIs, but also deep
>>>> differences in the underlying approaches. I was thinking of something
>>>> like Breeze's cheatsheet comparing its matrix operations with those
>>>> available in Matlab and NumPy,
>>>> https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet, or
>>>> like http://rosettacode.org/wiki/Factorial. Just an idea anyway. Also,
>>>> any pointer to an online course, book, or training for Flink besides
>>>> the official programming guides would be much appreciated.
>>>>
>>>> Thanks in advance for your help.
>>>>
>>>> Greetings,
>>>>
>>>> Juan

--
jay vyas