From: Robert Metzger
Date: Mon, 8 Sep 2014 16:13:17 +0200
Subject: Re: flink performance
To: dev@flink.incubator.apache.org

There is probably a little typo in Aljoscha's answer: the
taskmanager.numberOfTaskSlots should be 8 (there are 8 cores per
machine). The parallelization.degree.default is correct.
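For reference, a minimal sketch of how the adjusted flink-conf.yaml
could look for this setup (4 nodes with 8 cores and 16 GB of memory
each). The slot and parallelism values follow the recommendations in
this thread; the remaining entries are taken unchanged from Norman's
mail below. This is only an illustration, not a tested configuration:

jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 14336
taskmanager.memory.size: -1
# one task slot per CPU core of each machine
taskmanager.numberOfTaskSlots: 8
# 4 nodes x 8 cores per node
parallelization.degree.default: 32
taskmanager.network.numberOfBuffers: 4096
fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/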

On Mon, Sep 8, 2014 at 4:09 PM, Aljoscha Krettek wrote:

> Hi Norman,
> I saw you were running our Scala examples. Unfortunately, those do not
> run as well as our Java examples right now. The Scala API was a bit of
> a prototype and has some efficiency issues. For now, you could maybe
> try running our Java examples.
>
> For your cluster, good configuration values would be numberOfTaskSlots
> = 4 (number of CPU cores) and parallelization.degree.default = 32
> (number of nodes x number of CPU cores).
>
> The Scala API is being rewritten for our next release, so if you
> really want to check out the Scala examples, I could point you to my
> personal branch on GitHub where development of the new Scala API is
> taking place.
>
> Cheers,
> Aljoscha
>
> On Mon, Sep 8, 2014 at 2:48 PM, Norman Spangenberg wrote:
> > Hello,
> > I'm a bit confused about the performance of Flink.
> > My cluster consists of 4 nodes, each with 8 cores and 16 GB of memory
> > (1.5 GB reserved for the OS), running Flink 0.6 in standalone-cluster
> > mode. I played around with the config settings a bit, but without
> > much impact on execution time.
> > flink-conf.yaml:
> > jobmanager.rpc.port: 6123
> > jobmanager.heap.mb: 1024
> > taskmanager.heap.mb: 14336
> > taskmanager.memory.size: -1
> > taskmanager.numberOfTaskSlots: 4
> > parallelization.degree.default: 16
> > taskmanager.network.numberOfBuffers: 4096
> > fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/
> >
> > I tried two applications: the WordCount and k-Means Scala example code.
> > WordCount needs 5 minutes for 25 GB and 13 minutes for 50 GB.
> > K-Means (10 iterations) needs 86 seconds for 56 MB of input, but with
> > 1.1 GB of input it needs 33 minutes, and with 2.2 GB nearly 90 minutes!
> >
> > The monitoring tool Ganglia shows low CPU utilization and a lot of
> > waiting time; in WordCount the CPU utilization is nearly 100 percent.
> > Is this an ordinary order of magnitude for execution times in Flink?
> > Or are optimizations in my config necessary? Or is there maybe a
> > bottleneck in the cluster?
> >
> > I hope somebody can help me :)
> > Greets, Norman