flink-user mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: scaling question
Date Fri, 19 Jun 2015 14:31:28 GMT
Hi Bill,

You're right. Simply increasing the task manager slots doesn't do anything
unless the parallelism is raised as well. Setting the parallelism to
taskManagers*slots is correct. To get past the "Insufficient number of
network buffers" error, increase the number of network buffers in
flink-conf.yaml, e.g. to 4096. In the future, we will configure this
setting dynamically.
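
As a rough illustration of the sizing advice above, here is a minimal Python
sketch that computes the full-cluster parallelism (taskManagers*slots) and a
candidate network buffer count. The function names are hypothetical. The
buffer formula (slots per task manager squared, times task managers, times 4)
is an assumption taken from the rule of thumb in the Flink configuration
documentation of that era, where the setting is named
taskmanager.network.numberOfBuffers; treat the result as a starting point,
not a guarantee.

```python
# Sketch only: helper names are hypothetical, and the buffer formula is an
# assumed rule of thumb, not something stated in this thread.

def total_parallelism(task_managers: int, slots_per_tm: int) -> int:
    """Parallelism that occupies every slot: taskManagers * slots."""
    return task_managers * slots_per_tm


def suggested_network_buffers(task_managers: int, slots_per_tm: int) -> int:
    """Assumed rule of thumb: slots_per_tm^2 * task_managers * 4."""
    return slots_per_tm ** 2 * task_managers * 4


if __name__ == "__main__":
    task_managers = 16  # node count from the quoted message
    for slots in (1, 4, 8, 16):  # slot counts tried in the quoted message
        print(
            f"slots={slots} "
            f"parallelism={total_parallelism(task_managers, slots)} "
            f"buffers={suggested_network_buffers(task_managers, slots)}"
        )
```

For 16 task managers with 8 slots each, this gives a parallelism of 128 and
4096 buffers, which matches the example value mentioned above. The default
parallelism stays unchanged when only the slot count is raised, so the
computed parallelism still has to be passed to the job explicitly.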

Let us know if your runtime decreases :)


On Fri, Jun 19, 2015 at 4:24 PM, Bill Sparks <jsparks@cray.com> wrote:

> Sorry to post again. I guess I'm not understanding this…
>
> The question is how to scale up/increase the execution of a problem.
> What I'm trying to do is get the best out of the available processors for
> a given node count and compare this against Spark, using KMeans.
>
> For Spark, one method is to increase the executors and RDD partitions;
> for Flink, I can increase the number of task slots
> (taskmanager.numberOfTaskSlots). My empirical evidence suggests that just
> increasing the slots does not increase processing of the data. Is there
> something I'm missing? Much like Spark with re-partitioning your datasets,
> is there an equivalent option for Flink? What about the parallelism
> argument? The referring document seems to be broken…
>
> This seems to be a dead link:
> https://github.com/apache/flink/blob/master/docs/setup/%7B%7Bsite.baseurl%7D%7D/apis/programming_guide.html#parallel-execution
>
> If I do increase the parallelism to be (taskManagers*slots) I hit the
> "Insufficient number of network buffers…"
>
> I have 16 nodes (64 HT cores), and have run TaskSlots from 1, 4, 8, 16
> and still the execution time is always around 5-6 minutes, using the
> default parallelism.
>
>  Regards,
>     Bill
>  --
>  Jonathan (Bill) Sparks
> Software Architecture
> Cray Inc.
