flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: scaling question
Date Fri, 19 Jun 2015 14:44:23 GMT
PS: I've read your last email as 64 HT cores per machine. If it was in total over the 16 nodes,
you have to adjust my response accordingly. ;)

On 19 Jun 2015, at 16:42, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Bill,
> 
> no worries, questions are what this mailing list is for.
> 
> The number of network buffers is a parameter that needs to be scaled with your setup. The
reason is Flink's pipelined data transfer, which requires a certain number of network
buffers to be available at the same time during processing.
> 
> There is an FAQ entry that explains how to set this parameter according to your setup:
> --> http://flink.apache.org/faq.html#i-get-an-error-message-saying-that-not-enough-buffers-are-available-how-do-i-fix-this
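> The FAQ's sizing rule can be sketched as follows (assumption: the rule is slots-per-TaskManager squared, times the number of TaskManagers, times 4, as the linked FAQ entry describes; the class and method names here are just for illustration):

```java
// Sketch of the FAQ's sizing rule (assumption: buffers =
// slotsPerTaskManager^2 * numTaskManagers * 4).
public class BufferSizing {

    static int recommendedBuffers(int slotsPerTaskManager, int numTaskManagers) {
        return slotsPerTaskManager * slotsPerTaskManager * numTaskManagers * 4;
    }

    public static void main(String[] args) {
        // 16 TaskManagers with 8 slots each:
        System.out.println(recommendedBuffers(8, 16)); // prints 4096
    }
}
```

> Note how this grows quadratically with the slots per TaskManager, which is why raising the slot count quickly exhausts the default buffer pool.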
> 
> The documentation for parallel execution can be found here:
> http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution
> 
> If you are working on the latest snapshot, you can also configure Flink to use batched
data transfer instead of pipelined transfer. This is done via ExecutionConfig.setExecutionMode();
you obtain the ExecutionConfig by calling getConfig() on your ExecutionEnvironment.
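> A minimal sketch of both settings, assuming the 0.9-SNAPSHOT Java DataSet API (the value 16 * 8 is just an illustrative taskManagers * slots product, not a recommendation):

```java
import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchModeSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Switch from pipelined to batched data exchanges, which reduces
        // how many network buffers must be held concurrently.
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);

        // Explicitly set the job parallelism to taskManagers * slots.
        env.setParallelism(16 * 8);

        // ... define the program, then call env.execute();
    }
}
```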
> 
> Best, Fabian
> 
> 
> 2015-06-19 16:31 GMT+02:00 Maximilian Michels <mxm@apache.org>:
> Hi Bill,
> 
> You're right, simply increasing the task manager slots doesn't do anything on its own. It is correct
to set the parallelism to taskManagers*slots. Then increase the number of network buffers
in the flink-conf.yaml, e.g. to 4096. In the future, we will configure this setting dynamically.
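> For example (assumption: the config key in this Flink version is taskmanager.network.numberOfBuffers):

```yaml
# Number of network buffers per TaskManager (32 KB each by default).
taskmanager.network.numberOfBuffers: 4096
```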
> 
> Let us know if your runtime decreases :)
> 
> Cheers,
> Max
> 
> On Fri, Jun 19, 2015 at 4:24 PM, Bill Sparks <jsparks@cray.com> wrote:
> 
> Sorry for posting again. I guess I'm not understanding this…
> 
> The question is how to scale up / increase the execution of a problem. What I'm trying
to do is get the best out of the available processors for a given node count and compare
this against Spark, using KMeans.
> 
> For Spark, one method is to increase the executors and RDD partitions. For Flink, I
can increase the number of task slots (taskmanager.numberOfTaskSlots). My empirical evidence
suggests that just increasing the slots does not increase processing of the data. Is there
something I'm missing? Much like re-partitioning your datasets in Spark, is there an equivalent
option for Flink? What about the parallelism argument? The document it refers to seems to be broken…
> 
> This seems to be a dead link: https://github.com/apache/flink/blob/master/docs/setup/%7B%7Bsite.baseurl%7D%7D/apis/programming_guide.html#parallel-execution
> 
> If I do increase the parallelism to (taskManagers*slots), I hit the "Insufficient number
of network buffers…" error.
> 
> I have 16 nodes (64 HT cores), and have run with task slots of 1, 4, 8, and 16, and still the
execution time is always around 5-6 minutes, using the default parallelism.
> 
> Regards,
>     Bill
> -- 
> Jonathan (Bill) Sparks
> Software Architecture
> Cray Inc.
> 
> 

