flink-user mailing list archives

From Jay Vyas <jayunit100.apa...@gmail.com>
Subject Re: Hardware requirements and learning resources
Date Wed, 02 Sep 2015 13:01:08 GMT
Just running the main class is sufficient

> On Sep 2, 2015, at 8:59 AM, Robert Metzger <rmetzger@apache.org> wrote:
> Hey jay,
> How can I reproduce the error?
>> On Wed, Sep 2, 2015 at 2:56 PM, jay vyas <jayunit100.apache@gmail.com> wrote:
>> We're also working on a bigpetstore implementation of flink which will help onboard
spark/mapreduce folks.
>> I have prototype code at https://github.com/bigpetstore/bigpetstore-flink that runs a simple job in memory; contributions welcome.
>> Right now it hits a serialization error.
>>> On Wed, Sep 2, 2015 at 8:50 AM, Robert Metzger <rmetzger@apache.org> wrote:
>>> Hi Juan,
>>> I think the recommendations in the Spark guide are quite good, and are similar
to what I would recommend for Flink as well. 
>>> Depending on the workloads you are interested in running, you can certainly use Flink
with less than 8 GB per machine. I think you can start Flink TaskManagers with 500 MB of heap
space and they'll still be able to process some GB of data.
>>> Everything above 2 GB is probably good enough for some initial experimentation
(again depending on your workloads, network, disk speed etc.)
>>>> On Wed, Sep 2, 2015 at 2:30 PM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
>>>> Hi Juan,
>>>> Flink is quite nimble with hardware requirements; people have run it on old-ish
laptops and also on the largest instances available from cloud providers. I will let others chime
in with more details.
>>>> I am not aware of anything along the lines of the cheatsheet that you mention.
If you actually try to do this, I would love to see it, and it might be useful to others as
well. Spark and Flink use similar abstractions at the API level (i.e., parallel collections), so if you
stay true to the functional paradigm and do not try to "abuse" the system by exploiting knowledge
of its internals, things should be straightforward. This applies to the batch APIs; the streaming
API in Flink follows a true streaming paradigm, where you get an unbounded stream of records
and operators on these streams.
>>>> Funny that you ask about a video for the DataStream slides. There is a Flink
training happening as we speak, and a video is being recorded right now :-) Hopefully it will
be made available soon.
>>>> Best,
>>>> Kostas
>>>>> On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá <juan.rodriguez.hortala@gmail.com> wrote:
>>>>> Answering my own question, I have found some nice training material at http://dataartisans.github.io/flink-training.
There are even videos on YouTube for some of the slides:
>>>>>   - http://dataartisans.github.io/flink-training/overview/intro.html
>>>>>     https://www.youtube.com/watch?v=XgC6c4Wiqvs
>>>>>   - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
>>>>>     https://www.youtube.com/watch?v=0EARqW15dDk
>>>>> The third lecture http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html
more or less corresponds to https://www.youtube.com/watch?v=1yWKZ26NQeU but not exactly, and
there are more lessons at http://dataartisans.github.io/flink-training, for stream processing
and the table API for which I haven't found a video. Does anyone have pointers to the missing
>>>>> Greetings, 
>>>>> Juan
>>>>> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá <juan.rodriguez.hortala@gmail.com>:
>>>>>> Hi list, 
>>>>>> I'm new to Flink, and I find this project very interesting. I have
experience with Apache Spark, and from what I've seen so far, Flink provides an API at
a similar abstraction level but based on single record processing instead of batch processing.
I've read on Quora that Flink extends stream processing to batch processing, while Spark extends
batch processing to streaming. Therefore I find Flink especially attractive for low latency
stream processing. Anyway, I would appreciate it if someone could give some indication about
where I could find a list of hardware requirements for the slave nodes in a Flink cluster.
Something along the lines of https://spark.apache.org/docs/latest/hardware-provisioning.html.
Spark is known for having quite high minimal memory requirements (8GB RAM and 8 cores minimum),
and I was wondering if it is also the case for Flink. Lower memory requirements would be very
interesting for building small Flink clusters for educational purposes, or for small projects.

>>>>>> Apart from that, I wonder if there is some blog post by the community
about transitioning from Spark to Flink. I think it could be interesting, as there are some
similarities in the APIs, but also deep differences in the underlying approaches. I was thinking
of something like Breeze's cheatsheet comparing its matrix operations with those available
in Matlab and Numpy https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet, or
like http://rosettacode.org/wiki/Factorial. Just an idea anyway. Also, any pointer to some
online course, book or training for Flink besides the official programming guides would be
much appreciated.
>>>>>> Thanks in advance for your help.
>>>>>> Greetings, 
>>>>>> Juan
>> -- 
>> jay vyas
