flink-user mailing list archives

From Bill Sparks <jspa...@cray.com>
Subject scaling question
Date Fri, 19 Jun 2015 14:24:05 GMT

Sorry for the post again. I guess I'm not understanding this…

The question is how to scale up/increase the execution of a problem. What I'm trying to do
is get the best out of the available processors for a given node count, and compare this against
Spark, using KMeans.

For Spark, one method is to increase the number of executors and RDD partitions. For Flink, I can
increase the number of task slots (taskmanager.numberOfTaskSlots). My empirical evidence suggests
that just increasing the slots does not increase processing of the data. Is there something
I'm missing? Much like re-partitioning your datasets in Spark, is there an equivalent option
for Flink? What about the parallelism argument? The referring document seems to be broken…
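(In case it helps frame the question: as far as I can tell, Flink's parallelism is set separately from the slot count, either as a cluster-wide default or per job. A minimal sketch — the exact key name and the Flink version it applies to are assumptions on my part:)

```yaml
# flink-conf.yaml — default parallelism used when a job sets none
# (assumed key name; older releases may call it parallelization.degree.default)
parallelism.default: 64

# Alternatively, per job at submission time:
#   bin/flink run -p 64 <jar> ...
# or programmatically: env.setParallelism(64)
```

In the DataSet API, rebalance() redistributes a dataset round-robin across the parallel instances, which seems to be the closest analogue to Spark's repartition().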

This seems to be a dead link: https://github.com/apache/flink/blob/master/docs/setup/%7B%7Bsite.baseurl%7D%7D/apis/programming_guide.html#parallel-execution

If I do increase the parallelism to be (taskManagers * slots), I hit the "Insufficient number
of network buffers…" error.
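(For that error, the usual remedy appears to be enlarging the network buffer pool in flink-conf.yaml; the docs suggest roughly slots-per-TM^2 * number-of-TMs * 4 buffers. A sketch, assuming 16 task managers with 4 slots each — key names are my assumption for this Flink version:)

```yaml
# flink-conf.yaml — network buffer pool
# rule of thumb: 4^2 slots * 16 TMs * 4 = 1024 buffers; round up for headroom
taskmanager.network.numberOfBuffers: 2048
taskmanager.network.bufferSizeInBytes: 32768
```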

I have 16 nodes (64 HT cores) and have run with task slots of 1, 4, 8, and 16, and still the execution
time is always around 5-6 minutes, using the default parallelism.

Jonathan (Bill) Sparks
Software Architecture
Cray Inc.
