flink-dev mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: How do network transmissions in Flink work?
Date Mon, 13 Jul 2015 13:04:06 GMT
Hey Niklas,

there is also this Wiki entry: https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

On 09 Jul 2015, at 21:32, Niklas Semmler <nsemmler@inet.tu-berlin.de> wrote:

> 1. What does the number of ResultSubpartition instances in the ResultPartition correspond
> to? Is one assigned to each consuming task? If so, how can I find for each ResultSubpartition
> the corresponding Task, Slot or similar? If not, how is it decided which piece of the data is
> routed to which consuming task?

Yes, there is one subpartition for each consuming task. The wiring depends on the DistributionPattern
and the parallelism of the producing and consuming operators. You can look into the ExecutionGraph
to see how the wiring works (see the connect* methods in the ExecutionVertex class). Each subpartition
corresponds to an ExecutionEdge, which connects two ExecutionVertex instances. An ExecutionVertex
is an abstraction for a task at runtime. This is essentially also where the routing is set.
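
To make the wiring a bit more concrete, here is a minimal sketch in plain Java. It is not Flink's
ExecutionGraph code: WiringSketch, Pattern and consumersOf are hypothetical names, and the pointwise
case is simplified to equal parallelism on both sides. It only illustrates that a producer has one
subpartition per consuming task it is wired to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the wiring; not Flink's actual ExecutionGraph code.
class WiringSketch {

    // Simplified stand-in for a distribution pattern (assumption, not Flink's enum).
    enum Pattern { ALL_TO_ALL, POINTWISE }

    // Returns the consumer task indices that the given producer task feeds.
    // Each entry corresponds to one subpartition of that producer.
    static List<Integer> consumersOf(int producer, int producerParallelism,
                                     int consumerParallelism, Pattern pattern) {
        List<Integer> consumers = new ArrayList<>();
        if (pattern == Pattern.ALL_TO_ALL) {
            // Every producer is connected to every consumer.
            for (int c = 0; c < consumerParallelism; c++) {
                consumers.add(c);
            }
        } else {
            // Simplification: only the equal-parallelism case is modelled here.
            if (producerParallelism != consumerParallelism) {
                throw new IllegalArgumentException("sketch only covers equal parallelism");
            }
            consumers.add(producer);
        }
        return consumers;
    }

    public static void main(String[] args) {
        // Producer 1 of 2, three consumers, all-to-all: three subpartitions.
        System.out.println(consumersOf(1, 2, 3, Pattern.ALL_TO_ALL));
        // Producer 1 of 4, four consumers, pointwise: a single subpartition.
        System.out.println(consumersOf(1, 4, 4, Pattern.POINTWISE));
    }
}
```

The size of the returned list is the number of subpartitions of that producer in this simplified model.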

Currently there is no way to get from the subpartition to the corresponding task. You would
have to look into the places where the instances are created and pass the reference along yourself.
The RuntimeEnvironment or Task class creates these instances when a new task is submitted to a
task manager.

> 2. What defines the number of Buffer instances per ResultSubpartition? Does one Buffer
> correspond to exactly one serialized Record? Is a Record the single output of an operator,
> are there multiple records per operator, or does it differ depending on the operator?

The number of produced buffers depends on the data the corresponding operator/user function
produces. Each produced record is serialized into a buffer. A record can span multiple buffers,
depending on its size.
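
As a rough illustration of a record spanning buffers, here is a minimal sketch. It is not Flink's
serializer: SpanningSketch and write are hypothetical names, and the 4-byte length prefix, the
fixed buffer size, and letting several small records share one buffer are assumptions made for
the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not Flink's serializer.
class SpanningSketch {

    // Copies length-prefixed records into fixed-size buffers. When a buffer
    // fills up in the middle of a record, the rest continues in a new buffer.
    static List<byte[]> write(List<byte[]> records, int bufferSize) {
        List<byte[]> buffers = new ArrayList<>();
        ByteBuffer current = ByteBuffer.allocate(bufferSize);
        for (byte[] record : records) {
            // Assumed wire format: 4-byte length followed by the payload.
            ByteBuffer serialized = ByteBuffer.allocate(4 + record.length);
            serialized.putInt(record.length);
            serialized.put(record);
            serialized.flip();
            while (serialized.hasRemaining()) {
                if (!current.hasRemaining()) {
                    // Current buffer is full: hand it off and start a new one.
                    buffers.add(current.array());
                    current = ByteBuffer.allocate(bufferSize);
                }
                int n = Math.min(current.remaining(), serialized.remaining());
                for (int i = 0; i < n; i++) {
                    current.put(serialized.get());
                }
            }
        }
        if (current.position() > 0) {
            // Last buffer, trimmed to the bytes actually written.
            buffers.add(Arrays.copyOf(current.array(), current.position()));
        }
        return buffers;
    }

    public static void main(String[] args) {
        List<byte[]> records = Arrays.asList(new byte[10], new byte[10]);
        // Two 14-byte serialized records (28 bytes) written into 8-byte buffers.
        System.out.println(write(records, 8).size() + " buffers");
    }
}
```

With no records at all, this sketch produces no buffers; it does not model the end-of-partition event.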

There can be zero or more records per produced partition. (There will always be at least a
single buffer containing an end-of-partition event per partition though.)

> 3. Or are the Buffers defined in a completely different manner? In that case, could you
> give me a pointer to understand how Buffer instances are used?

A Buffer is a wrapper around a MemorySegment with a reference to the buffer pool that owns
it. Buffers are recycled after they have been consumed (e.g. after being written to
the TCP channel or after being consumed by the user code).
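
The ownership and recycling idea can be sketched roughly as follows. This is not Flink's Buffer
or buffer pool API: BufferPoolSketch, Pool and Buf are hypothetical names, a byte[] stands in for
the MemorySegment, and the sketch ignores concurrency and any reference counting.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; not Flink's Buffer or buffer pool classes.
class BufferPoolSketch {

    // A pool that owns a fixed number of memory segments (byte[] as a stand-in).
    static class Pool {
        private final Deque<byte[]> free = new ArrayDeque<>();

        Pool(int numSegments, int segmentSize) {
            for (int i = 0; i < numSegments; i++) {
                free.add(new byte[segmentSize]);
            }
        }

        // Returns a buffer wrapping a free segment, or null if none is available.
        Buf request() {
            byte[] segment = free.poll();
            return segment == null ? null : new Buf(segment, this);
        }

        void takeBack(byte[] segment) {
            free.add(segment);
        }

        int available() {
            return free.size();
        }
    }

    // A buffer wraps a segment and remembers the pool that owns it.
    static class Buf {
        final byte[] segment;
        private final Pool owner;
        private boolean recycled;

        Buf(byte[] segment, Pool owner) {
            this.segment = segment;
            this.owner = owner;
        }

        // Called once the buffer has been consumed; returns the segment to its pool.
        void recycle() {
            if (!recycled) {
                recycled = true;
                owner.takeBack(segment);
            }
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(2, 32);
        Buf first = pool.request();
        System.out.println("available after request: " + pool.available());
        first.recycle();
        System.out.println("available after recycle: " + pool.available());
    }
}
```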

Feel free to ask further questions or give feedback if you encounter anything you find weird.

– Ufuk