flink-dev mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: Why does Flink copy code from Spark?
Date Sun, 27 Dec 2015 16:51:00 GMT
Flink and Spark are open source projects which both have similar
problem domains. In some parts, their methodologies are similar, e.g.
because they build on Hadoop, use the Akka library, or implement
machine learning algorithms. In other parts, they are very different,
e.g. pipelined (Flink) vs batch (Spark) data transfer, real-time
(Flink) vs mini-batched (Spark) streaming, RDD-based memory execution
(Spark) vs out-of-core algorithms and graceful out-of-memory
handling (Flink).

Some of these differences may seem subtle but they are backed by
different philosophies and origins. Both Flink and Spark are complex
systems which have their pros and cons. Whether people use Flink or
Spark depends on their use cases.

As a Flink committer, it hurts a lot to hear such claims. I know how
much dedication and proficiency we have in the Flink community. If we
included any code which is subject to copyright, I would like to
resolve this. However, I'm not aware of any violation. If you make
such strong accusations, please provide proper proof. Otherwise,
your message may be seen as an act of defamation or trolling.

Best regards,

On Sun, Dec 27, 2015 at 8:51 AM, Edward Lee <edward.da.lee@gmail.com> wrote:
> Lately I have been studying the source code to understand the internals.
> One thing that really surprised me was that a lot of code throughout Flink
> was very similar to Spark.
> Open source projects learn from each other and apply similar ideas.
> However, I am not talking about applying similar ideas. I am talking about
> literal copy of code. Many files seemed like they were created by
> copy-pasting code directly from Spark and then renaming the variable names
> to avoid looking identical.
> As I study more, I find "copy-pasted" code throughout Flink, from actors to
> machine learning to analyzer to code generation. A few files have
> attribution, but most of them do not.
> I thought Flink was more advanced. Why?
