Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flink.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAGr9p8Cxi6j87B1e_r_E5x3T3RS4K4m4b=AfwQtKD=BuOpWpHQ@mail.gmail.com>
References: 
 <CAO=2E=tfSkDtPP=F8P6dHppyHv7Q=o1DAyDEa_f4thY3_ALxXg@mail.gmail.com>
	<CAO=2E=s9hs_hqXaD-vmS5E_PZikA10hjasySwayzfmATcnac2w@mail.gmail.com>
	<CAGWx-_sFuj0J1J-5tDVPw9NAkvwN34m2gwU6Nu8nFbzVt-jpmA@mail.gmail.com>
	<CAGr9p8Cxi6j87B1e_r_E5x3T3RS4K4m4b=AfwQtKD=BuOpWpHQ@mail.gmail.com>
Date: Wed, 2 Sep 2015 08:56:35 -0400
Message-ID: 
 <CACVCA=eipJJzcKrQDs4n93+WXj7OTizYHYZuWv+9_FNzGyFuaQ@mail.gmail.com>
Subject: Re: Hardware requirements and learning resources
From: jay vyas <jayunit100.apache@gmail.com>
To: user@flink.apache.org
Content-Type: multipart/alternative; boundary=001a1147a1cc5b333d051ec3323a

--001a1147a1cc5b333d051ec3323a
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

We're also working on a BigPetStore implementation for Flink, which should
help onboard Spark/MapReduce folks.

I have prototype code here that runs a simple job in memory; contributions
are welcome:

https://github.com/bigpetstore/bigpetstore-flink

Right now there is a serialization error.

On Wed, Sep 2, 2015 at 8:50 AM, Robert Metzger <rmetzger@apache.org> wrote:

> Hi Juan,
>
> I think the recommendations in the Spark guide are quite good, and are
> similar to what I would recommend for Flink as well.
> Depending on the workloads you are interested in running, you can certainly
> use Flink with less than 8 GB per machine. I think you can start Flink
> TaskManagers with 500 MB of heap space and they'll still be able to process
> a few GB of data.
>
> Everything above 2 GB is probably good enough for some initial
> experimentation (again depending on your workloads, network, disk speed
> etc.)
>
>
>
>
> On Wed, Sep 2, 2015 at 2:30 PM, Kostas Tzoumas <ktzoumas@apache.org>
> wrote:
>
>> Hi Juan,
>>
>> Flink is quite nimble with hardware requirements; people have run it on
>> old-ish laptops and also on the largest instances available from cloud
>> providers. I will let others chime in with more details.
>>
>> I am not aware of anything along the lines of the cheatsheet that you
>> mention. If you actually try to do this, I would love to see it, and it
>> might be useful to others as well. Both systems use similar abstractions at
>> the API level (i.e., parallel collections), so if you stay true to the
>> functional paradigm and do not try to "abuse" the system by exploiting
>> knowledge of its internals, things should be straightforward. This applies
>> to the batch APIs; the streaming API in Flink follows a true streaming
>> paradigm, where you get an unbounded stream of records and operators on
>> these streams.
>>
>> Funny that you ask about a video for the DataStream slides. There is a
>> Flink training happening as we speak, and a video is being recorded right
>> now :-) Hopefully it will be made available soon.
>>
>> Best,
>> Kostas
>>
>>
>> On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá <
>> juan.rodriguez.hortala@gmail.com> wrote:
>>
>>> Answering my own question, I have found some nice training material at
>>> http://dataartisans.github.io/flink-training. There are even videos on
>>> YouTube for some of the slides:
>>>
>>>   - http://dataartisans.github.io/flink-training/overview/intro.html
>>>     https://www.youtube.com/watch?v=XgC6c4Wiqvs
>>>
>>>   - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
>>>     https://www.youtube.com/watch?v=0EARqW15dDk
>>>
>>> The third lecture
>>> http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html
>>> more or less corresponds to https://www.youtube.com/watch?v=1yWKZ26NQeU
>>> but not exactly, and there are more lessons at
>>> http://dataartisans.github.io/flink-training on stream processing and
>>> the Table API, for which I haven't found a video. Does anyone have
>>> pointers to the missing videos?
>>>
>>> Greetings,
>>>
>>> Juan
>>>
>>> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá <
>>> juan.rodriguez.hortala@gmail.com>:
>>>
>>>> Hi list,
>>>>
>>>> I'm new to Flink, and I find this project very interesting. I have
>>>> experience with Apache Spark, and from what I've seen so far, Flink
>>>> provides an API at a similar abstraction level but based on
>>>> single-record processing instead of batch processing. I've read on
>>>> Quora that Flink extends stream processing to batch processing, while
>>>> Spark extends batch processing to streaming. Therefore I find Flink
>>>> especially attractive for low-latency stream processing. Anyway, I would
>>>> appreciate it if someone could give some indication of where I could
>>>> find a list of hardware requirements for the slave nodes in a Flink
>>>> cluster, something along the lines of
>>>> https://spark.apache.org/docs/latest/hardware-provisioning.html. Spark
>>>> is known for having quite high minimum memory requirements (8GB RAM and
>>>> 8 cores minimum), and I was wondering if that is also the case for
>>>> Flink. Lower memory requirements would be very interesting for building
>>>> small Flink clusters for educational purposes, or for small projects.
>>>>
>>>> Apart from that, I wonder if there is some blog post by the community
>>>> about transitioning from Spark to Flink. I think it could be
>>>> interesting, as there are some similarities in the APIs, but also deep
>>>> differences in the underlying approaches. I was thinking of something
>>>> like Breeze's cheatsheet comparing its matrix operations with those
>>>> available in Matlab and Numpy
>>>> https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet, or
>>>> like http://rosettacode.org/wiki/Factorial. Just an idea anyway. Also,
>>>> any pointers to online courses, books or training for Flink besides the
>>>> official programming guides would be much appreciated.
>>>>
>>>> Thanks in advance for your help.
>>>>
>>>> Greetings,
>>>>
>>>> Juan
>>>>
>>>>
>>>
>>
>


-- 
jay vyas


--001a1147a1cc5b333d051ec3323a--