flink-user mailing list archives

From Timo Walther <twal...@apache.org>
Subject Re: About nodes number on Flink
Date Fri, 23 Jun 2017 12:15:18 GMT
Hi Andrea,

the number of nodes you need usually depends on how much work you do
within your functions.

E.g. if you call a computation-intensive machine learning library in a
MapFunction and it takes 10 seconds per element, it might make sense to
parallelize this in order to increase your throughput. The same applies
if you have to keep several GBs of state per key, which would not fit on
one machine.

Flink parallelizes not only per node but also per "slot". If you
start your application with a parallelism of 2 (and have not configured
custom parallelisms per operator), you will have two pipelines that
process elements (so two MapFunctions run in parallel, one in each
pipeline). Two slots are occupied in this case. There are operations
(like keyBy) that break this pipeline and repartition your data.
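
To make the repartitioning step concrete, here is a small plain-Java sketch (no Flink dependency). It only illustrates the idea that a keyed exchange routes every element with the same key to the same parallel instance. The class and method names are made up for this example, and the hash-modulo routing is a simplification, not Flink's actual key assignment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyedRepartitionSketch {

    // Hypothetical helper: picks the downstream parallel instance for a key.
    // Simplified hash-modulo routing, used here for illustration only.
    static int targetSubtask(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Distributes the keys over `parallelism` buckets, one per parallel instance.
    static List<List<String>> repartition(List<String> keys, int parallelism) {
        List<List<String>> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new ArrayList<>());
        }
        for (String key : keys) {
            subtasks.get(targetSubtask(key, parallelism)).add(key);
        }
        return subtasks;
    }

    public static void main(String[] args) {
        // With a parallelism of 2 there are two buckets; equal keys share a bucket.
        List<List<String>> result =
                repartition(Arrays.asList("a", "b", "a", "c", "b"), 2);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("instance " + i + " receives " + result.get(i));
        }
    }
}
```

Before the keyed exchange an element stays in the pipeline it entered; after it, the element may land in the other pipeline, which is why such operations break the pipeline.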

If you want to run operators in separate tasks or slots, you can start a
new chain or assign a slot sharing group (see here:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#task-chaining-and-resource-groups).

If you set the parallelism to 'N' but have fewer than 'N' slots available,
you cannot execute the job.
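
As a rough sizing aid, the slot rule can be written down in a few lines of plain Java. This is only a sketch under stated assumptions: every node offers the same fixed number of slots, and with default settings a job needs as many slots as its parallelism. The class and method names are hypothetical and not part of any Flink API.

```java
public class SlotCheck {

    // Total slots in the cluster, assuming every node offers the same number.
    static int totalSlots(int nodes, int slotsPerNode) {
        return nodes * slotsPerNode;
    }

    // The job can only be executed if there is one slot per parallel pipeline.
    static boolean canSchedule(int parallelism, int nodes, int slotsPerNode) {
        return parallelism <= totalSlots(nodes, slotsPerNode);
    }

    // Smallest number of nodes that provides enough slots (ceiling division).
    static int minNodes(int parallelism, int slotsPerNode) {
        return (parallelism + slotsPerNode - 1) / slotsPerNode;
    }

    public static void main(String[] args) {
        int parallelism = 5;   // assumed job parallelism
        int slotsPerNode = 2;  // assumed slots configured per node
        System.out.println("fits on 2 nodes: " + canSchedule(parallelism, 2, slotsPerNode));
        System.out.println("nodes needed: " + minNodes(parallelism, slotsPerNode));
    }
}
```

So the question is less "how many nodes" than "how many slots": a single machine with enough slots can run a job with a parallelism greater than 1.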

I hope my explanation helps.

Regards,
Timo


On 22.06.17 at 16:54, AndreaKinn wrote:
> Hello,
> I'm developing a Flink toy application on my local machine before deploying
> the real one on a real cluster.
> Now I have to determine how many nodes I need to set up the cluster.
>
> I already read these documents:
> jobs and scheduling
> <https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/job_scheduling.html>
> programming model
> <https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/programming-model.html>
> parallelism
> <https://flink.apache.org/faq.html#what-is-the-parallelism-how-do-i-set-it>
>
> But I'm still a bit confused about how many nodes I have to consider to
> execute my application.
>
> For example if I have the following code (from the doc):
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n13927/Screen_Shot_2017-06-22_at_16.png>
>
> - Does this mean that operations "on the same line" are executed on the same
> node? (It sounds a bit strange to me)
>
> Some things to confirm:
> - If the answer to the previous question is yes and I set the parallelism to
> '1', can I establish how many nodes I need by counting how many operations I
> have to perform?
> - If I set the parallelism to 'N' but have fewer than 'N' nodes available,
> does Flink automatically scale the processing across the available nodes?
>
> I think my throughput and data load are not relevant; they are not heavy.


