flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Hello, Pipelining Question
Date Tue, 16 Feb 2016 09:30:49 GMT
Yes, Flink is a pipelined system because it can ship data over
the network while it is being produced (pipelined network communication).
In contrast, Spark fully produces a result before it is sent over
the network in a batch fashion.

However, Flink also supports batched data exchange, similar to Spark.
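To make the distinction concrete, the two exchange modes can be sketched with plain Python generators. This is only a conceptual illustration of pipelined vs. batched data exchange, not Flink or Spark code; the function names are made up for the example.

```python
# Conceptual sketch: pipelined vs. batched data exchange.
# A shared event log records the order of produce/consume steps.

def produce(log):
    """Upstream operator emitting three records, logging each emission."""
    for i in range(3):
        log.append(("produce", i))
        yield i

def pipelined_consume(log):
    # Pipelined exchange: each record is consumed as soon as it is produced,
    # so produce and consume events interleave.
    for rec in produce(log):
        log.append(("consume", rec))

def batch_consume(log):
    # Batched exchange: the full intermediate result is materialized first,
    # so the producer finishes before the consumer starts.
    records = list(produce(log))
    for rec in records:
        log.append(("consume", rec))

pipelined_log = []
pipelined_consume(pipelined_log)
# produce/consume events alternate record by record

batch_log = []
batch_consume(batch_log)
# all produce events precede all consume events
```

In the pipelined run the downstream work overlaps with the upstream work, which is exactly why operators can "overlap" in Flink's execution timeline; in the batched run the consumer only starts after the producer has finished, matching the separated stages observed for Spark below.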

Best, Fabian

2016-02-15 23:17 GMT+01:00 Philip Lee <philjjoon@gmail.com>:

> Hi,
> I found some interesting results from a comparison of Spark SQL and Flink.
> Just for your information, Spark SQL uses Hive QL on the Spark engine.
> As far as we know, when we run a Flink job, the functions can be
> overlapped via *pipelining*, as in this picture.
> [image: Inline image 1]
> Likewise, Spark supports *pipelining*, as I read in a Spark PPT. Its
> functions can be overlapped as well, but there seems to be some
> boundary.
> For example, in *Flink*, the functions reading multiple inputs can run
> together *with the join function*, as in the picture above. In *Spark*,
> the reads of multiple inputs can run together, but the join function is
> seemingly *separated* from the reading functions. (You can see from the
> start times and durations that the join step is separated.)
> [image: Inline image 2]
> Is this because Spark is batch processing in memory, whereas Flink is
> stream processing in memory?
> Best,
> Phil
