Date: Fri, 31 Oct 2014 14:56:33 +0000 (UTC)
From: "Alexander Alexandrov (JIRA)"
To: issues@flink.incubator.apache.org
Reply-To: dev@flink.incubator.apache.org
Subject: [jira] [Commented] (FLINK-1195) Improvement of benchmarking infrastructure

[
https://issues.apache.org/jira/browse/FLINK-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191888#comment-14191888 ]

Alexander Alexandrov commented on FLINK-1195:
---------------------------------------------

Indeed, from what you have described, Peel seems to be a good fit, and it is also going to be developed further in 2015. Can you point me to the jobs that are run as part of the current benchmark?

> Improvement of benchmarking infrastructure
> ------------------------------------------
>
> Key: FLINK-1195
> URL: https://issues.apache.org/jira/browse/FLINK-1195
> Project: Flink
> Issue Type: Wish
> Reporter: Till Rohrmann
>
> While running my ALS benchmarks, I noticed that we still have some potential to improve our benchmarking infrastructure. The current state is that we execute the benchmark jobs via a script with a single set of parameters. The runtime is then retrieved manually from the web interface of Flink or Spark, respectively.
> I think we need the following extensions:
> * Automatic runtime retrieval and storage in a file
> * Repeated execution of jobs to gather "advanced" statistics such as the mean and standard deviation of the runtimes
> * Support for value sets for the individual parameters
> Automatic runtime retrieval would allow us to execute several benchmarks consecutively without having to look up the runtimes in the logs or in the web interface, which, by the way, only stores the runtimes of the last 5 jobs.
> What I mean by value sets is that it would be nice to specify a set of parameter values for which the benchmark is run, without having to write a separate benchmark script for every single parameter combination. I believe this feature would come in very handy when we want to look at the runtime behaviour of Flink for different input sizes or degrees of parallelism, for example.
> To illustrate what I mean:
> {code}
> INPUTSIZE = 1000, 2000, 4000, 8000
> DOP = 1, 2, 4, 8
> OUTPUT=benchmarkResults
> repetitions=10
> command=benchmark.jar -p $DOP $INPUTSIZE
> {code}
> Something like this would execute the benchmark job with (DOP=1, INPUTSIZE=1000), (DOP=2, INPUTSIZE=2000), ... 10 times each, calculate runtime statistics for each parameter combination, and store the results in the file benchmarkResults.
> I believe that spending some effort now will pay off in the long run because we will benchmark Flink continuously. What do you guys think?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
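To make the proposal in the quoted description concrete, below is a minimal, hypothetical Python sketch of such a driver. Everything in it is an assumption rather than existing Flink tooling: run_job is a stand-in for launching benchmark.jar and parsing its runtime, the value sets are paired positionally as in the example (a full cross product would be the other natural reading), and the output format is an invented tab-separated layout.

```python
# Hypothetical benchmark driver sketch (not part of Flink).
import statistics


def run_job(dop, input_size):
    # Placeholder: a real driver would shell out to the benchmark job
    # and parse the reported runtime. Here we fabricate a deterministic
    # value so the sketch is self-contained and runnable.
    return input_size / dop


def run_benchmark(dops, input_sizes, repetitions):
    # Pair the value sets positionally, as in the example:
    # (DOP=1, INPUTSIZE=1000), (DOP=2, INPUTSIZE=2000), ...
    results = []
    for dop, size in zip(dops, input_sizes):
        runtimes = [run_job(dop, size) for _ in range(repetitions)]
        results.append({
            "dop": dop,
            "input_size": size,
            "mean": statistics.mean(runtimes),
            "stdev": statistics.stdev(runtimes) if repetitions > 1 else 0.0,
        })
    return results


def write_results(results, path):
    # Store one line per parameter combination, tab-separated.
    with open(path, "w") as out:
        out.write("dop\tinput_size\tmean\tstdev\n")
        for r in results:
            out.write(f"{r['dop']}\t{r['input_size']}\t"
                      f"{r['mean']:.2f}\t{r['stdev']:.2f}\n")


if __name__ == "__main__":
    results = run_benchmark([1, 2, 4, 8], [1000, 2000, 4000, 8000],
                            repetitions=10)
    write_results(results, "benchmarkResults")
```

A real implementation would replace run_job with a subprocess call and runtime parsing, which is exactly the "automatic runtime retrieval" part of the wish; the loop and statistics bookkeeping would stay essentially the same.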