flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: flink performance
Date Mon, 08 Sep 2014 15:13:25 GMT
Ok.

My work is available here:
https://github.com/aljoscha/incubator-flink/tree/scala-rework

Please look at the WordCount and KMeans examples to see how the API has
changed; basically, only the way you create data sources is different.

I'm looking forward to your feedback. :D
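
(Editorial note: as a rough illustration of what the WordCount example computes, here is the counting logic in plain Scala, with no Flink dependency. This is only a sketch of the computation; the actual examples express it with Flink's DataSet operations, so names and structure here are illustrative assumptions, not the example code.)

```scala
object WordCountSketch {
  // Count occurrences of each whitespace-separated token, case-insensitively.
  def wordCount(text: String): Map[String, Int] =
    text.toLowerCase
      .split("\\s+")          // tokenize on runs of whitespace
      .filter(_.nonEmpty)     // drop empty tokens from leading/trailing spaces
      .groupBy(identity)      // word -> all its occurrences
      .map { case (word, occurrences) => (word, occurrences.length) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount("to be or not to be")
    println(counts("to")) // 2
    println(counts("be")) // 2
  }
}
```

In the Flink examples, the same flatMap/group/count shape is distributed across the cluster, which is why the task-slot and parallelism settings discussed below matter for throughput.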

On Mon, Sep 8, 2014 at 4:22 PM, Norman Spangenberg
<wir12kqe@studserv.uni-leipzig.de> wrote:
> I tried different values for taskmanager.numberOfTaskSlots (1, 2, 4, 8) and
> the DOP to optimize Flink.
> @Aljoscha: it would be great to try out the new Scala API for Flink. I've
> already written some other apps in Scala, so I wouldn't have to rewrite them.
>
> Am 08.09.2014 16:13, schrieb Robert Metzger:
>
>> There is probably a little typo in Aljoscha's answer:
>> taskmanager.numberOfTaskSlots should be 8 (there are 8 cores per machine).
>> The parallelization.degree.default is correct.
>>
>> On Mon, Sep 8, 2014 at 4:09 PM, Aljoscha Krettek <aljoscha@apache.org>
>> wrote:
>>
>>> Hi Norman,
>>> I saw you were running our Scala examples. Unfortunately, those do not
>>> run as well as our Java examples right now. The Scala API was a bit of
>>> a prototype and has some efficiency issues. For now, you could
>>> try running our Java examples instead.
>>>
>>> For your cluster, good configuration values would be numberOfTaskSlots
>>> = 4 (number of CPU cores) and parallelization.degree.default = 32
>>> (number of nodes X number of CPU cores).
>>>
>>> The Scala API is being rewritten for our next release, so if you
>>> really want to check out Scala examples I could point you to my
>>> personal branch on github where development of the new Scala API is
>>> taking place.
>>>
>>> Cheers,
>>> Aljoscha
>>>
>>> On Mon, Sep 8, 2014 at 2:48 PM, Norman Spangenberg
>>> <wir12kqe@studserv.uni-leipzig.de> wrote:
>>>>
>>>> Hello,
>>>> I'm a bit confused about the performance of Flink.
>>>> My cluster consists of 4 nodes, each with 8 cores and 16 GB memory
>>>> (1.5 GB reserved for the OS), using flink-0.6 in standalone-cluster mode.
>>>> I played a little with the config settings, but without much impact
>>>> on execution time.
>>>> flink-conf.yaml:
>>>> jobmanager.rpc.port: 6123
>>>> jobmanager.heap.mb: 1024
>>>> taskmanager.heap.mb: 14336
>>>> taskmanager.memory.size: -1
>>>> taskmanager.numberOfTaskSlots: 4
>>>> parallelization.degree.default: 16
>>>> taskmanager.network.numberOfBuffers: 4096
>>>> fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/
>>>>
>>>> I tried two applications: the WordCount and K-Means Scala example code.
>>>> WordCount needs 5 minutes for 25 GB, and 13 minutes for 50 GB.
>>>> K-Means (10 iterations) needs 86 seconds for 56 MB of input, but with
>>>> 1.1 GB of input it needs 33 minutes, and with 2.2 GB nearly 90 minutes!
>>>>
>>>> The monitoring tool Ganglia says that the CPU has low utilization and
>>>> a lot of waiting time. In WordCount, the CPU utilization is nearly
>>>> 100 percent.
>>>> Is this an ordinary execution time for Flink? Or are optimizations in
>>>> my config necessary? Or maybe a bottleneck in the cluster?
>>>>
>>>> I hope somebody could help me :)
>>>> Greets, Norman
>
>
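
(Editorial note: putting the thread's recommendations together, i.e. Aljoscha's formula with Robert's slot-count correction, the relevant lines of flink-conf.yaml for this 4-node, 8-cores-per-node cluster would be the following. This is a sketch of the advice given above, not a tested configuration.)

```yaml
# One task slot per CPU core on each TaskManager (8 cores per machine):
taskmanager.numberOfTaskSlots: 8
# Default parallelism = number of nodes x cores per node = 4 x 8:
parallelization.degree.default: 32
```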
