flink-dev mailing list archives

From "Anirvan Basu" <Anirvan.B...@alumni.INSEAD.edu>
Subject RE: [stratosphere-dev] Spark comparison
Date Sat, 30 Aug 2014 18:14:30 GMT
Stephan and Kostas,

I agree that the study is a year old, which is old given the development pace of both projects.
Spark seems to have caught a good wind in its sails: Google, Facebook, Yahoo, IBM ... what
about you folks?
Are you also pitching to these giants? Let's assume that it is a fat-tail scenario.
It looks to me like MongoDB in the NoSQL world (compared to Raven or Couch)
:-) I still need to figure out whether it is hype or reality!

I tried Spark 1.0.2 this week:
- installation was fairly simple,
- the Python API made it easy to write some beyond-hello-world programs (I did not check their R
package, though),
- they also have a good streaming package,
- an advantage was a good series of tutorials and webinars (helps to get rid of the fear of
"jumping into the water" for dummies like me).

Some pertinent questions:
1. Would you be interested if we did a neutral comparison of Flink and Spark, baselined against
the Hadoop MapReduce framework? I was also thinking of adding Summingbird and would like to know your
viewpoints there.
If we did publish, we would naturally try to present it at some conference! So think of the
perils as well ;-)
Actually, Robert had asked me a similar question; he put the idea in my head!

2. Against what set of criteria would you want to compare Flink and Spark?

3. Where do you stand on graph-based algorithms? We are looking for a stable framework for graph-based
programs such as balanced graph partitioning, evolution, ... In that respect, Spark GraphX appeared
very interesting.
I know you have your own Spargel there, so how do you compare? Do you also do vertex-based
balanced partitioning (e.g. JA-BE-JA k-way partitioning)? Can you do edge-based partitioning?
I didn't come across any framework that implements the latter.
Attached is a simple paper presented by an Italian research group; they jumped on
the Spark bandwagon!
Let me know your opinions (perhaps you know the group already).

Best !
Anirvan


-----Original Message-----
From: Stephan Ewen [mailto:sewen@apache.org] 
Sent: Saturday, 30 August 2014 18:26
To: dev@flink.incubator.apache.org
Subject: Re: [stratosphere-dev] Spark comparison

Hi!

I agree with Kostas, the code base of Stratosphere that was used was quite old.

The current Flink version is different already, with the new APIs and different type handling.

Flink is taking a route that makes sure that the runtime is very robust, memory-wise. We currently
pay a few CPU cycles of overhead for that, but we have an effort going to bring that down.

It would be interesting to rerun the experiments then...

Greetings,
Stephan



On Sat, Aug 30, 2014 at 9:16 AM, Kostas Tzoumas <ktzoumas@apache.org> wrote:

> Hi Anirvan,
>
> Yes, I am familiar with this thesis. I think that this comparison is 
> by now quite old (>1 year if I am not mistaken), and both systems have 
> evolved substantially since then.
>
> Kostas
>
>
> On Fri, Aug 29, 2014 at 7:01 PM, Robert Metzger <rmetzger@apache.org>
> wrote:
>
> > Forwarding the message to the new mailing list ...
> >
> > ---------- Forwarded message ----------
> > From: Nirvanesque <nirvanesque.paris@gmail.com>
> > Date: Fri, Aug 29, 2014 at 1:57 PM
> > Subject: Re: [stratosphere-dev] Spark comparison
> > To: stratosphere-dev@googlegroups.com
> >
> >
> > Ufuk and the Flink team,
> >
> > You and your team are familiar by now with this comparison (Master's
> > thesis of Ze Ni at the KTH Institute)
> > http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf
> >
> > I would like to know your viewpoints on this.
> >
> > Thanks in advance,
> > Anirvan
> >
> >
> >
> > On Tuesday, December 3, 2013 6:19:57 PM UTC+1, Ufuk Celebi wrote:
> >
> > > Hey Ankur,
> > >
> > > > I like the idea of a comparison matrix. We tried to do something similar
> > > > with Hadoop already (parts of it are on the front page of our
> > > > website), which we used for a local summit here. Comparing
> > > > Stratosphere to Spark in this way would be a natural extension to this. ;-)
> > >
> > > > Internally, we ran some benchmarks against 0.7.3 (unfortunately
> > > > right before the 0.8 release). We didn't publish the results as
> > > > there are certain aspects that make the comparison unfair (for
> > > > example we have no fault tolerance right now whereas Spark does).
> > > > As soon as we (re-)introduce fault tolerance mechanisms, we will
> > > > re-run the benchmarks.
> > >
> > > > I can publish the code for the Stratosphere and Spark programs we looked
> > > > at on GitHub. If I add Scala versions of the Stratosphere programs, this
> > > > will also go in your proposed direction of having a direct comparison.
> > >
> > > > Is there any specific use case where you want to see numbers? Or
> > > > is it more like a general thing where you want to see how both
> > > > systems perform?
> > >
> > > Best,
> > >
> > > Ufuk
> > >
> > > On 03 Dec 2013, at 18:03, Ankur Chauhan <an...@malloc64.com> wrote:
> > >
> > > Hi all,
> > >
> > >
> > > > Sitting at Spark Summit 2013, I was interested in figuring out if anyone
> > > > has done a feature comparison and/or benchmarks against Spark/Storm/etc.
> > > > This may also serve as a "compatibility matrix" and would help a lot when
> > > > people want to compare the two projects and help us understand what
> > > > the strengths and weaknesses of each project are.
> > >
> > > -- Ankur
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups
> > > "stratosphere-dev" group.
> > > To unsubscribe from this group and stop receiving emails from it, 
> > > send
> an
> > > email to stratosphere-d...@googlegroups.com.
> > >
> > > Visit this group at http://groups.google.com/group/stratosphere-dev.
> > > For more options, visit https://groups.google.com/groups/opt_out.
> > >
> > >
> > >  --
> > You received this message because you are subscribed to the Google 
> > Groups "stratosphere-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, 
> > send an email to stratosphere-dev+unsubscribe@googlegroups.com.
> > Visit this group at http://groups.google.com/group/stratosphere-dev.
> > For more options, visit https://groups.google.com/d/optout.
> >
>
