Mailing-List: contact dev-help@systemml.incubator.apache.org; run by ezmlm
Date: Fri, 9 Dec 2016 05:59:15 +0000 (UTC)
From: Acs S
To: "dev@systemml.incubator.apache.org"
Message-ID: <1941027687.885859.1481263155793@mail.yahoo.com>
Subject: Re: test suite running slowly after disable cache/sparse commit?
archived-at: Fri, 09 Dec 2016 06:05:05 -0000

+1 on adding Jenkins build machines for PR builds.
A couple of times I hit waiting PR builds due to the queue. If that is
not common, we can wait.

-Arvind

From: Deron Eriksson
To: dev@systemml.incubator.apache.org
Sent: Friday, December 9, 2016 7:34 AM
Subject: Re: test suite running slowly after disable cache/sparse commit?

Hi Fred,

The last two daily tests ran in about 2 hr 56 min, so if this number is
stable, it seems that the new tests potentially add about half an hour
to the test suite time.

I would like it if we could decrease the test suite time rather than add
significantly to it. In fact, personally I'd prefer if we could do
something like move the time-consuming algorithm-type tests out of the
main test suite and just run the algorithm tests daily (if this is
technically possible). That way, we could get the main test suite time
to be sped up significantly but still benefit from daily test coverage
provided by the algorithm tests.

I like the idea of a short test suite time since that makes it easier to
get feedback and continue working on an issue that day. If the tests
take too long to run, it means that issues that could potentially be
solved in one day will get pushed out to another day.

Increasing the number of simultaneous Jenkins jobs allowed could help
with queued-up builds, which would be nice. Currently Jenkins runs a max
of two simultaneous jobs.

Jenkins currently handles:
1) two daily builds (at noon and at midnight)
2) on-demand builds (so a developer can commit some code on a branch and
   then have Jenkins build/test so that a developer's machine isn't tied
   up)
3) pull request builds (the initial push with a PR will trigger this
   along with any subsequent pushes to the branch referenced by the PR).

There is no queue today, but I'm the only person who has triggered a PR
build today. If more than two developers are submitting PRs on a given
day, there will be a queue. This queue has been manageable, but if the
increase in test suite time is a permanent thing, I'd recommend bumping
the simultaneous Jenkins jobs from two to four.

Deron

On Thu, Dec 8, 2016 at 4:49 PM, Frederick R Reiss wrote:

> +dev list
>
> I personally don't mind letting the regression suite run overnight. The
> important thing is that we do not push changes that have not passed the
> full automated test suite. In the interest of efficiency, we shouldn't
> even be reviewing most PRs until after they have passed the automated
> tests.
>
> Deron, are you seeing a backlog of not-yet-started builds queueing up on
> the PR build server? If the queue is getting long, we can add additional
> machines to the Jenkins cluster.
>
> Fred
>
> From: Deron Eriksson/San Francisco/IBM
> To: Niketan Pansare/Almaden/IBM@IBMUS
> Cc: Berthold Reinwald/Almaden/IBM@IBMUS, Frederick R
> Reiss/Almaden/IBM@IBMUS
> Date: 12/08/2016 11:06 AM
> Subject: Re: test suite running slowly after disable cache/sparse commit?
> ------------------------------
>
> Hi Niketan,
>
> Perhaps Berthold or Fred could add a little guidance here in terms of
> what is acceptable? Having the test suite go from 2:21 to 3:41 (one pull
> request yesterday took 4:11 to complete:
> https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/909/)
> is very serious to me. Even if the test suite runs at 3:00, this is a
> serious slowdown. It slows down our ability to validate pull requests
> and other code on Jenkins.
>
> Deron
>
> ----- Original message -----
> From: Niketan Pansare/Almaden/IBM
> To: Deron Eriksson/San Francisco/IBM@ibmus
> Cc: Berthold Reinwald/Almaden/IBM@ibmus, Frederick R
> Reiss/Almaden/IBM@ibmus
> Subject: Re: test suite running slowly after disable cache/sparse commit?
> Date: Thu, Dec 8, 2016 8:55 AM
>
> Hi Deron,
>
> The commit replicated the application tests for disable sparse and
> disable caching. So, the test time should increase. We should increase
> the duration or reduce the number of application tests we want to test
> with caching and sparse disabled.
>
> Thanks
>
> Niketan
>
> On Dec 8, 2016, at 7:47 AM, Deron Eriksson <deron@us.ibm.com> wrote:
>
> > Hi Niketan,
> >
> > I noticed the daily test yesterday timed out, probably because of a
> > long-running test.
> >
> > Looking at the commits from the day before
> > (https://github.com/apache/incubator-systemml/commits/master), I
> > noticed that [SYSTEMML-769] [SYSTEMML-1140] Removed -disable-caching
> > and -disable-…
> > (https://github.com/apache/incubator-systemml/commit/caaaec90b61e529e50021d89f9f108230fa307a8)
> > updated some of the tests.
> >
> > So I ran the tests on the previous commit
> > (https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/227/)
> > and the tests ran in 2hr 21min.
> >
> > I ran the tests on the 'disable caching...' commit
> > (https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/228/)
> > and the tests ran in 3hr 41min.
> >
> > One thing that is confusing to me is that the nightly test just
> > completed successfully
> > (https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/674/)
> > in 2hr 57min and did not time out like yesterday afternoon. So it is
> > always possible it could be a server issue.
> >
> > Could you look into this and see if that commit introduced an issue
> > with the tests?
> >
> > Thanks!
> > Deron

--
Deron Eriksson
Spark Technology Center
http://www.spark.tc/