flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ufuk Celebi <...@apache.org>
Subject Re: Number of consumers per IntermediateResult
Date Mon, 14 Aug 2017 09:57:34 GMT
Hey Luis,

this is correct, yes. Note that these are "only" limitations of the
implementation and there is no fundamental reason to do it like this.
The different characteristics of intermediate results allow us to make
trade-offs here as seen fit.

Furthermore, the type of intermediate result describes when the
consumers can start consuming results. In streaming jobs, all results
are pipelined (consumers consume after the partition has some data
available). In batch jobs, you will find both pipelined and blocking
results (consumers consume only after the partition has all data
available).

Did you also see this Wiki page here?

https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

– Ufuk

On Sun, Aug 13, 2017 at 6:59 PM, Luis Alves <lmtjalves@gmail.com> wrote:
> Hi,
>
> Can someone validate the following regarding the ExecutionGraph:
>
> Each IntermediateResult can only be consumed by a single ExecutionJobVertex, i.e. if
two ExecutionJobVertex consume the same tuples (same “stream") that is produced by the same
ExecutionJobVertex, then the producer will have two IntermediateResult, one per consumer.
>
> In other words: if an ExecutionJobVertex performs a map operation, and has two consumers
(different ExecutionJobVertex), the ExecutionJobVertex will produce two datasets/IntermediateResults
(both with the same “content”, but different consumers).
>
> Each ExecutionVertex will then have the same amount of IntermediateResultPartitions as
the number of ExecutionJobVertex that consume the datasets generated by the respective ExecutionJobVertex.
>
> Thus, at runtime:
>
> ResultPartition maps to an IntermediateResultPartition (as documented in the javadoc).
Thus, 3. is also valid for ResultPartitions.
>
> ResultSubPartition maps to an ExecutionEdge (since it contains the information on how
to send the partition to the actual consumer Task).
>
> Thanks,
>
> Luís Alves

Mime
View raw message