flink-dev mailing list archives

From Camelia Elena Ciolac <came...@chalmers.se>
Subject Questions for a better understanding of the internals of data exchange between tasks
Date Tue, 19 Jan 2016 10:37:41 GMT

I list some questions gathered while reading documentation on Flink's internals and I am grateful
to receive your answers.

1) How is the JobManager involved in the communication between tasks running in task slots
on TaskManagers?

From [1] it appears to me that, as part of the control flow for data exchange, every time
a task has a result that is consumable, it notifies the JobManager.
This is repeated in [2]: "When producing an intermediate result, the producing task is responsible
to notify the JobManager about available data".
However, in the case of data streams the result may consist of a single record, so I ask this
question to understand better whether the JobManager is involved in every data exchange.

2) Assuming my understanding of the aforementioned control flow is correct: which unit of data
triggers the notification of the JobManager: a ResultPartition, a ResultSubpartition, or a Buffer?
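To make the question concrete, here is my current mental model of how these three units nest. This is a toy sketch with simplified, hypothetical class shapes modeled on the names in [1] and [2], not Flink's actual classes:

```java
// Toy sketch of the containment hierarchy I have in mind (NOT Flink's real code):
// a ResultPartition holds one ResultSubpartition per consuming parallel subtask,
// and each ResultSubpartition is a queue of Buffers (fixed-size memory segments).
import java.util.ArrayDeque;
import java.util.Queue;

class Buffer {
    final byte[] segment;
    Buffer(int size) { this.segment = new byte[size]; }
}

class ResultSubpartition {
    final Queue<Buffer> buffers = new ArrayDeque<>();
    void add(Buffer b) { buffers.add(b); }
    Buffer poll() { return buffers.poll(); }
}

class ResultPartition {
    final ResultSubpartition[] subpartitions;
    ResultPartition(int numConsumingSubtasks) {
        subpartitions = new ResultSubpartition[numConsumingSubtasks];
        for (int i = 0; i < numConsumingSubtasks; i++) {
            subpartitions[i] = new ResultSubpartition();
        }
    }
}

public class ResultHierarchySketch {
    public static void main(String[] args) {
        // one producing task, two consuming parallel subtasks
        ResultPartition partition = new ResultPartition(2);
        partition.subpartitions[0].add(new Buffer(32 * 1024));
        System.out.println("subpartition 0 queued buffers: "
                + partition.subpartitions[0].buffers.size());
    }
}
```

So my question is at which of these three levels the JobManager notification is emitted.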

3) Likewise, are Akka actors involved in this notification of data availability for consumption?

4) How is the co-existence of receiver-initiated data transfers (the consumer
requests the data partitions) and push-based data transfers [2] between two tasks orchestrated?
From [3] I retain the following paragraphs:

"The JobManager also acts as the input split assigner for data sources. It is responsible
for distributing the work across all TaskManager such that data locality is preserved where
possible. In order to dynamically balance the load, the Tasks request a new input split after
they have finished processing the old one. This request is realized by sending a RequestNextInputSplit
to the JobManager. The JobManager responds with a NextInputSplit message. If there are no
more input splits, then the input split contained in the message is null.

The Tasks are deployed lazily to the TaskManagers. This means that tasks which consume data
are only deployed after one of its producers has finished producing some data. Once the producer
has done so, it sends a ScheduleOrUpdateConsumers message to the JobManager. This message
says that the consumer can now read the newly produced data. If the consuming task is not
yet running, it will be deployed to a TaskManager."
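The way I currently picture the two message exchanges quoted above is the following toy simulation. The class and method names are hypothetical simplifications, not real Flink or Akka code; I only model (a) a source task pulling input splits one at a time, and (b) a producer announcing readable data so that its consumer gets deployed lazily:

```java
// Toy simulation of the two quoted protocols (NOT real Flink/Akka code):
// (a) RequestNextInputSplit -> NextInputSplit, null payload when splits are exhausted;
// (b) ScheduleOrUpdateConsumers -> lazy deployment of the consuming task.
import java.util.ArrayDeque;
import java.util.Queue;

class JobManagerSketch {
    private final Queue<String> inputSplits = new ArrayDeque<>();
    private boolean consumerDeployed = false;

    JobManagerSketch(String... splits) {
        for (String s : splits) inputSplits.add(s);
    }

    // models RequestNextInputSplit / NextInputSplit
    String requestNextInputSplit() {
        return inputSplits.poll(); // null once no splits remain
    }

    // models ScheduleOrUpdateConsumers: deploy the consumer on first available data
    void scheduleOrUpdateConsumers() {
        consumerDeployed = true;
    }

    boolean isConsumerDeployed() {
        return consumerDeployed;
    }
}

public class MessageFlowSketch {
    public static void main(String[] args) {
        JobManagerSketch jm = new JobManagerSketch("split-0", "split-1");
        String split;
        while ((split = jm.requestNextInputSplit()) != null) {
            System.out.println("source task processes " + split);
        }
        // the producer now has data available, so the consumer gets deployed
        jm.scheduleOrUpdateConsumers();
        System.out.println("consumer deployed: " + jm.isConsumerDeployed());
    }
}
```

What I would like to understand is where the receiver-initiated partition requests fit into this picture once the consumer is running.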

5) Can you please summarize how this data exchange occurs in Flink, based on an example?

Thank you!

Best regards,

[1] https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
[2] https://cwiki.apache.org/confluence/display/FLINK/Network+Stack%2C+Plugable+Data+Exchange
[3] https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors
