flink-user mailing list archives

From Samarth Mailinglist <mailinglistsama...@gmail.com>
Subject Re: Flink and Spark
Date Thu, 25 Dec 2014 10:58:13 GMT
Thank you for your answer. I have a couple of follow up questions:
1. Does it support the 'exactly-once semantics' that Spark and Storm support?
2. (Related to 1) What happens when an error occurs during processing?
3. Is there a plan for adding Machine Learning support on top of Flink, such as
Alternating Least Squares or basic Naive Bayes?
4. When you say Flink manages itself, does it mean I don't have to fiddle
with the number of partitions (Spark) or the number of reducers/mappers (Hadoop)
to optimize performance? (In some cases this might be needed.)
5. How far along is the Python API? I don't see the specs on the website.

On Thu, Dec 25, 2014 at 4:31 AM, Márton Balassi <mbalassi@apache.org> wrote:

> Dear Samarth,
> Besides the discussions you have mentioned [1] I can recommend one of our
> recent presentations [2], especially the distinguishing Flink section (from
> slide 16).
> It is generally a difficult question as both systems are rapidly
> evolving, so the answer can become outdated quite fast. However, there are
> fundamental design features that are highly unlikely to change, for example
> Spark uses "true" batch processing, meaning that intermediate results are
> materialized (mostly in memory) as RDDs. Flink's engine is internally more
> like streaming, forwarding the results to the next operator asap. The
> latter can yield performance benefits for more complex jobs. Flink also
> gives you a query optimizer, spills gracefully to disk when the system runs
> out of memory and has some cool features around serialization. For
> performance numbers and some more insight please check out the presentation
> [2] and do not hesitate to post a follow-up mail here if you come across
> something unclear or extraordinary.
> [1]
> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1&query=spark
> [2] http://www.slideshare.net/GyulaFra/flink-apachecon
> Best,
> Marton
> On Tue, Dec 23, 2014 at 6:19 PM, Samarth Mailinglist <
> mailinglistsamarth@gmail.com> wrote:
>> Hey folks, I have a noob question.
>> I already looked up the archives and saw a couple of discussions
>> <http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1&query=spark>
>> about Spark and Flink.
>> I am familiar with Spark (the Python API, especially MLlib), and I see many
>> similarities between Flink and Spark.
>> How does Flink distinguish itself from Spark?
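To make Marton's distinction concrete: Spark materializes intermediate results, while Flink's engine forwards each result to the next operator as soon as possible. Below is a toy model of those two execution styles in plain Python. It is a sketch under stated assumptions, not the Spark or Flink API, and not how either engine is actually implemented. The function names `materialized` and `pipelined` and the `trace` list are hypothetical. The trace records the order in which a map operator and a filter operator touch each record.

```python
from typing import Iterable, Iterator, List

# Hypothetical toy model only: neither function mirrors real Spark or Flink code.

def materialized(data: Iterable[int], trace: List[str]) -> List[int]:
    # Stage 1 (map) runs to completion; its whole output is held in memory.
    stage1: List[int] = []
    for x in data:
        trace.append(f"map {x}")
        stage1.append(x * 2)
    # Stage 2 (filter) starts only after stage 1 has finished.
    stage2: List[int] = []
    for y in stage1:
        trace.append(f"filter {y}")
        if y % 4 == 0:
            stage2.append(y)
    return stage2

def pipelined(data: Iterable[int], trace: List[str]) -> Iterator[int]:
    # Generators hand each mapped record straight to the filter,
    # so no full intermediate collection is ever built.
    def mapped() -> Iterator[int]:
        for x in data:
            trace.append(f"map {x}")
            yield x * 2
    for y in mapped():
        trace.append(f"filter {y}")
        if y % 4 == 0:
            yield y

t1: List[str] = []
print(materialized([1, 2, 3], t1), t1)
# [4] ['map 1', 'map 2', 'map 3', 'filter 2', 'filter 4', 'filter 6']
t2: List[str] = []
print(list(pipelined([1, 2, 3], t2)), t2)
# [4] ['map 1', 'filter 2', 'map 2', 'filter 4', 'map 3', 'filter 6']
```

Both styles produce the same result. The difference is that the pipelined version interleaves the operators record by record instead of keeping the full output of the first operator around.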
