flink-dev mailing list archives

From Niklas Semmler <nik...@inet.tu-berlin.de>
Subject How do network transmissions in Flink work?
Date Mon, 06 Jul 2015 18:36:51 GMT
Hello Flink Community,

I am working on a network scheduler and am currently reading Flink's 
code to figure out how the data exchange works. It would be great if you 
could help me with some of my issues and questions.

Basically, I want to extract from Flink the time when a data transmission 
between two machines starts (1), their connection details (2), how much 
data is involved (3) and when it ends (4).
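
To make the four items concrete, here is a minimal sketch of the record I 
would like to fill in per transmission. All names (TransferRecord, its 
fields, durationMillis) are my own placeholders and not Flink classes; 
how to obtain the values from Flink is exactly what I am asking about.

```java
import java.net.InetSocketAddress;

// Hypothetical container for one transmission between two machines.
// None of these names exist in Flink; they only illustrate items (1)-(4).
final class TransferRecord {
    final InetSocketAddress producer;   // (2) sender IP/port
    final InetSocketAddress consumer;   // (2) receiver IP/port
    final long startMillis;             // (1) when the transmission starts
    final long numBytes;                // (3) how much data is involved
    final long endMillis;               // (4) when the transmission ends

    TransferRecord(InetSocketAddress producer, InetSocketAddress consumer,
                   long startMillis, long numBytes, long endMillis) {
        if (endMillis < startMillis) {
            throw new IllegalArgumentException("end before start");
        }
        this.producer = producer;
        this.consumer = consumer;
        this.startMillis = startMillis;
        this.numBytes = numBytes;
        this.endMillis = endMillis;
    }

    // Elapsed time of the transmission in milliseconds.
    long durationMillis() {
        return endMillis - startMillis;
    }
}
```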

So far I have understood that the scheduling of tasks is done via the 
scheduleOrUpdateConsumers JobManagerMessage. In the function of the same 
name in the class Execution I have been able to extract the IP/port pairs 
that the producer and the consumer(s) use.

Furthermore I understand that in the context of a "blocking" data 
transmission Flink will first create a ResultPartition and store all the 
data in the form of Buffers before starting the transmission. So in 
principle I should be able to figure out what amount of data Flink will 
communicate by looking at the respective 
ResultSubpartition.totalNumberOfBytes, right?
However, in the process I would need to map each ResultSubpartition to a 
slot or deployed task, so that I can associate this amount of data with 
connection details of the sender and the receiver. Any hints on how to 
do that?

Now from what I see the same is not possible in a "pipelined" context, 
correct? Can anything be said about the data to be communicated?

Finally, I was unable to locate in the code and in the logs where a 
Task's state changes from RUNNING to FINISHED. Could you give me a hint?

It would be great if you could share your insights on the problems above ;).

Best regards,

PhD Student / Research Assistant
INET, TU Berlin
Room 4.029
Marchstr 23
10587 Berlin
Tel: +49 30 314 78752
