flink-user mailing list archives

From Piotr Nowojski <pi...@ververica.com>
Subject Re: Flink distributed runtime architucture
Date Mon, 25 Nov 2019 13:55:18 GMT
Hi,

I’m glad to hear that you are interested in Flink! :)

> In the picture, the keyBy, window and apply operators share the same circle. Is it because these operators are chained together?

It’s not so much about chaining: the chain of DataStream API invocations `someStream.keyBy(…).window(…).apply(…)` creates a single logical operation. None of those calls makes sense on its own, but together they define how a single `WindowOperator` should behave (`keyBy` additionally defines how records should be shuffled).
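To illustrate the shuffling part, here is a plain-Java sketch (hypothetical, not Flink’s actual implementation — Flink really hashes keys into key groups): a keyBy-style shuffle routes each record by a hash of its key, so all records with the same key reach the same parallel instance of the downstream `WindowOperator`.

```java
// Hypothetical sketch of keyBy-style routing. Class and method names are
// made up for illustration; this is NOT Flink internals.
public class KeyByRouting {

    // Pick one of `parallelism` downstream channels for a key.
    // Same key -> same channel, which is the property keyBy guarantees.
    static int channelFor(String key, int parallelism) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        // Records with the same key always land on the same channel:
        System.out.println(
            channelFor("user-42", parallelism) == channelFor("user-42", parallelism)); // true
    }
}
```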

Chaining happens _usually_ between “shuffling” boundaries. So for example:

someStream
	.map(…) // first chain …
	.filter(…) // … still first chain
	.keyBy(…) // operator chain boundary
	.window(…).apply(…) // beginning of 2nd chain
	.map(…) // 2nd chain
	.filter(…) // still 2nd chain …
	.keyBy(…) // operator chain boundary
	(…)
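Within one such chain, the idea is that operators are invoked as direct method calls on the same thread, with no shuffle in between. A minimal plain-Java sketch of that (again hypothetical, not Flink’s chaining code):

```java
// Hypothetical illustration of operator chaining: map and filter are fused
// into one task and called back-to-back on the same thread.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class ChainedTask {

    static List<Integer> runChain(List<Integer> input) {
        Function<Integer, Integer> map = x -> x * 2; // .map(…)
        Predicate<Integer> filter = x -> x > 2;      // .filter(…)

        List<Integer> out = new ArrayList<>();
        for (int record : input) {
            int mapped = map.apply(record); // direct call, no serialization
            if (filter.test(mapped)) {
                out.add(mapped); // only at the chain boundary (keyBy) would
            }                    // the record be shuffled to another task
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runChain(List.of(1, 2, 3))); // [4, 6]
    }
}
```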

> the repartition process happens both inside a TaskManager and between TaskManagers. Inside a TaskManager, the data transfer may happen just in memory.

Yes, exactly. Data transfer happens in-memory, however records are still serialised and deserialised even for “local input channels” (that’s what we call the communication between operators inside a single TaskManager).
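A toy sketch of that point (hypothetical names, not Flink code): even when producer and consumer run in the same TaskManager, the record still goes through a serialize/deserialize round trip — only the network hop is skipped.

```java
// Hypothetical sketch of a "local input channel": same JVM, but the record
// is still turned into bytes on the producer side and back on the consumer side.
import java.nio.charset.StandardCharsets;

public class LocalChannel {

    static byte[] serialize(String record) {
        return record.getBytes(StandardCharsets.UTF_8);
    }

    static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] onTheWire = serialize("event-1");  // producer side
        String received = deserialize(onTheWire); // consumer side, same process
        System.out.println(received); // event-1
    }
}
```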

> Between TaskManagers, the data transfer may be inter-process or inter-container, depending on how I deploy the Flink cluster. Is my understanding right?

Yes, between TaskManagers network sockets are always used, regardless of whether the TaskManagers are running on one physical machine (localhost) or not.

> Are these details highly related to the Actor model? I have little knowledge of the Actor model.

I’m not sure I fully understand your question, but no — Flink does not use the Actor model for its data pipelines.

I hope that helps :)

Piotrek

> On 24 Nov 2019, at 07:35, Lu Weizheng <luweizheng36@hotmail.com> wrote:
> 
> Hi all,
> 
> I have been paying attention to Flink for about half a year and have read the official documents several times. I have got a comprehensive understanding of Flink’s distributed runtime architecture, but still have some questions that need to be clarified.
> 
> <PastedGraphic-2.png>
> 
> On the Flink documentation website, this picture shows the dataflow model of Flink. In the picture, the keyBy, window and apply operators share the same circle. Is it because these operators are chained together?
> 
> <PastedGraphic-3.png>
> 
> In the parallelized view, the data stream is partitioned into multiple partitions. Each partition is a subset of the source data. Repartitioning happens when we use the keyBy operator. If these tasks share task slots and run like in the picture above, the repartition process happens both inside a TaskManager and between TaskManagers. Inside a TaskManager, the data transfer may happen just in memory. Between TaskManagers, the data transfer may be inter-process or inter-container, depending on how I deploy the Flink cluster. Is my understanding right? Are these details highly related to the Actor model? I have little knowledge of the Actor model.
> 
> This is my first time using the Flink mailing list. Thank you so much if anyone can explain it.
> 
> Weizheng

