From: Robert Metzger
Date: Mon, 8 Sep 2014 16:13:17 +0200
Subject: Re: flink performance
To: dev@flink.incubator.apache.org

There is probably a little typo in Aljoscha's answer: the
taskmanager.numberOfTaskSlots should be 8 (there are 8 cores per
machine). The parallelization.degree.default is correct.
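For reference, a minimal sketch of how the adjusted flink-conf.yaml
could look for this setup (4 nodes with 8 cores and 16 GB of memory
each). The slot and parallelism values follow the recommendations in
this thread; the remaining entries are taken unchanged from Norman's
mail below. This is only an illustration, not a tested configuration:

jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 14336
taskmanager.memory.size: -1
# one task slot per CPU core of each machine
taskmanager.numberOfTaskSlots: 8
# 4 nodes x 8 cores per node
parallelization.degree.default: 32
taskmanager.network.numberOfBuffers: 4096
fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/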

On Mon, Sep 8, 2014 at 4:09 PM, Aljoscha Krettek wrote:

> Hi Norman,
> I saw you were running our Scala examples. Unfortunately, those do not
> run as well as our Java examples right now. The Scala API was a bit of
> a prototype and has some efficiency issues. For now, you could maybe
> try running our Java examples.
>
> For your cluster, good configuration values would be numberOfTaskSlots
> = 4 (number of CPU cores) and parallelization.degree.default = 32
> (number of nodes x number of CPU cores).
>
> The Scala API is being rewritten for our next release, so if you
> really want to check out the Scala examples, I could point you to my
> personal branch on GitHub where development of the new Scala API is
> taking place.
>
> Cheers,
> Aljoscha
>
> On Mon, Sep 8, 2014 at 2:48 PM, Norman Spangenberg wrote:
> > Hello,
> > I'm a bit confused about the performance of Flink.
> > My cluster consists of 4 nodes, each with 8 cores and 16 GB of memory
> > (1.5 GB reserved for the OS), running Flink 0.6 in standalone-cluster
> > mode. I played around with the config settings a bit, but without
> > much impact on execution time.
> > flink-conf.yaml:
> > jobmanager.rpc.port: 6123
> > jobmanager.heap.mb: 1024
> > taskmanager.heap.mb: 14336
> > taskmanager.memory.size: -1
> > taskmanager.numberOfTaskSlots: 4
> > parallelization.degree.default: 16
> > taskmanager.network.numberOfBuffers: 4096
> > fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/
> >
> > I tried two applications: the WordCount and k-Means Scala example code.
> > WordCount needs 5 minutes for 25 GB and 13 minutes for 50 GB.
> > K-Means (10 iterations) needs 86 seconds for 56 MB of input, but with
> > 1.1 GB of input it needs 33 minutes, and with 2.2 GB nearly 90 minutes!
> >
> > The monitoring tool Ganglia shows low CPU utilization and a lot of
> > waiting time; in WordCount the CPU utilization is nearly 100 percent.
> > Is this an ordinary order of magnitude for execution times in Flink?
> > Or are optimizations in my config necessary? Or is there maybe a
> > bottleneck in the cluster?
> >
> > I hope somebody can help me :)
> > Greets, Norman