flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Unbalanced job scheduling
Date Tue, 17 Oct 2017 07:09:28 GMT
Hi Andrea,

have you looked into assigning slot sharing groups [1]?

Best, Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#task-chaining-and-resource-groups
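
A rough sketch of what assigning slot sharing groups would change for this job, assuming the usual rule that each slot sharing group needs as many slots as the highest parallelism among its operators. The class SlotSharingSketch, the Op type, and the group names "learn-1" and "learn-2" are made up for illustration; only slotSharingGroup(String) and the "default" group name come from the DataStream API described in [1]. The model only counts slots and says nothing about which TM a slot ends up on.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SlotSharingSketch {

    /** Illustrative operator description: name, parallelism, slot sharing group. */
    public static final class Op {
        final String name;
        final int parallelism;
        final String group;

        public Op(String name, int parallelism, String group) {
            this.name = name;
            this.parallelism = parallelism;
            this.group = group;
        }
    }

    /** Each group needs as many slots as the highest parallelism among its operators. */
    public static Map<String, Integer> slotsPerGroup(List<Op> ops) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        for (Op op : ops) {
            slots.merge(op.group, op.parallelism, Integer::max);
        }
        return slots;
    }

    /** Total slots the job needs: the sum over all slot sharing groups. */
    public static int requiredSlots(List<Op> ops) {
        int total = 0;
        for (int n : slotsPerGroup(ops).values()) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // Current setup: everything stays in the "default" group.
        List<Op> shared = Arrays.asList(
                new Op("flow-1 (source..apply, SELECT..sink)", 2, "default"),
                new Op("LEARN-1", 1, "default"),
                new Op("flow-2 (source..apply, SELECT..sink)", 2, "default"),
                new Op("LEARN-2", 1, "default"));

        // Hypothetical setup: in the job, this would correspond to calling
        // .slotSharingGroup("learn-1") / .slotSharingGroup("learn-2") on the
        // LEARN operators and .slotSharingGroup("default") on the operator
        // after each LEARN, because the group is inherited downstream.
        List<Op> isolated = Arrays.asList(
                new Op("flow-1 (source..apply, SELECT..sink)", 2, "default"),
                new Op("LEARN-1", 1, "learn-1"),
                new Op("flow-2 (source..apply, SELECT..sink)", 2, "default"),
                new Op("LEARN-2", 1, "learn-2"));

        System.out.println("all in default group: " + requiredSlots(shared) + " slots");
        System.out.println("LEARN isolated:       " + requiredSlots(isolated) + " slots");
    }
}
```

Under these assumptions the isolated variant needs 4 slots instead of 2, so a cluster of 2 TMs with 1 slot each would no longer be enough; something like 2 slots per TM would be required. Separate groups only keep each LEARN out of the slots used by the rest of the pipeline and by the other LEARN; they do not by themselves guarantee that the two LEARN slots are placed on different TMs.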

2017-10-16 18:01 GMT+02:00 AndreaKinn <kinn6aer@hotmail.it>:

> Hi all,
> I'd like to describe my program flow.
>
> I have the following operators:
>
> kafka-source -> timestamp-extractor -> map -> keyBy -> window -> apply ->
> LEARN -> SELECT -> process -> cassandra-sink
>
> The LEARN and SELECT operators belong to an external library supported by
> Flink. LEARN is a very heavy operation compared to the other operators.
>
> Unfortunately LEARN has a max parallelism of 1, so if I have a cluster of 2
> TMs with 1 slot each and I set parallelism = 2, one TM runs one parallel
> instance of every operator plus the single instance of LEARN, while the
> other TM runs just the second parallel instance of every operator (clearly
> there are no more instances of LEARN).
> That's OK and I have no problem understanding it.
>
> *** The problem:
> Actually I have 2 identical flows like this, because I have two sensor
> streams, so I really have 2 LEARN operators corresponding to two
> independent streams.
>
> However, I noticed that even in this case one TM takes the load of one
> parallel instance of every operator AND the single instances of LEARN-1
> and LEARN-2, while the other TM runs just the second parallel instance of
> every operator (no LEARN instances here).
>
> Since LEARN is a heavy operator, this leads to a very unbalanced load on the
> cluster, so much so that the first TM is killed during execution (judging
> from the logs, probably because it does not have enough memory; in fact the
> sink is very slow, so LEARN seems to be a bottleneck).
>
> Honestly I can't understand why Flink doesn't assign one LEARN operator to
> one TM and the other LEARN operator to the other TM.
> This prevents me from stressing the cluster properly, because I will always
> have one TM very busy and the other one almost idle.
>
> Bye,
> Andrea
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
