From: Grant Ingersoll
Subject: Re: Tests running time
Date: Sun, 11 Dec 2011 06:08:29 -0700
To: dev@mahout.apache.org
Reply-To: dev@mahout.apache.org
In-Reply-To: <3265C04D-58DE-4CA3-AC27-A4216B7CF05F@apache.org>
References:

 <789FF4C7-0D4E-4A6E-AF48-B82F72E2AF4C@apache.org>
 <2C29F7AF-3681-40A6-AA25-6C29826943B9@apache.org>
 <83C6046C-B10B-47AD-BE88-25801B1E1B94@apache.org>
 <3265C04D-58DE-4CA3-AC27-A4216B7CF05F@apache.org>
Message-Id: <115AE885-1232-45D7-948F-04C96005110D@apache.org>

In working through what I _think_ will be the primary viable way to make
this stuff faster (parallel execution, fork once), it appears to me that
the primary concurrency issue is due to how we initialize the test seed
and the fact that we loop over all RandomWrapper objects and reset them.
So, it's likely that, midstream in some of the tests, the RNG is getting
reset by other calls to the static useTestSeed() method.

Of course, there might be other concurrency issues beyond that, but this
seems like the most likely one to start with. Thus, the question is how
to fix it. The obvious fix, I suppose, is to not use statics for this
stuff. Another is, perhaps, to use a system property (-DuseTestSeed=true
and/or -DuseSeed=, the latter being useful for debugging other things)
that is set upon invocation in the test plugin; the downside is that it
would also need to be set when running from an IDE.

And, to Sean's point below, it seems that we may have some test
dependencies on the specific set of random numbers and the outcomes they
produce.

Thoughts? Other ideas?

On Dec 8, 2011, at 1:05 PM, Grant Ingersoll wrote:

> Progress! I had configured the surefire plugin in the wrong place
>
>
> On Dec 8, 2011, at 2:55 PM, Sean Owen wrote:
>
>> This could well be it. While every Random everywhere gets initialized to a
>> known initial state, at the start of every @Test method, you could get
>> different sequences if other tests are in progress in parallel in the same
>> JVM.
>>
>> Ideally tests aren't that sensitive to the sequence of random numbers -- if
>> that's the case. And here it may well be the case.
>>
>> Can this be set to fork a JVM per test class? that would probably work.
>>
>> On Thu, Dec 8, 2011 at 7:43 PM, Grant Ingersoll wrote:
>>
>>>
>>> On Dec 8, 2011, at 2:39 PM, Grant Ingersoll wrote:
>>>
>>>>
>>>> On Dec 8, 2011, at 2:23 PM, Grant Ingersoll wrote:
>>>>
>>>>> If I add parallel, fork always to the main surefire config, I get
>>>>> failures all over the place for things like:
>>>>> Failed tests:
>>>>> testHebbianSolver(org.apache.mahout.math.decomposer.hebbian.TestHebbianSolver):
>>>>> Error: {0.06146049974880152 too high! (for eigen 3)
>>>>> consistency(org.apache.mahout.math.jet.random.NormalTest):
>>>>> offset=0.000 scale=1.000 Z = 8.2
>>>>> consistency(org.apache.mahout.math.jet.random.ExponentialTest):
>>>>> offset=0.000 scale=100.000 Z = 8.7
>>>>>
>>>>
>>>> Check that, it seems each run can produce different failures, which
>>>> leads me to believe we have some shared values in our tests
>>>
>>> Random.getRandom() the culprit, perhaps?
>>>
>>>>
>>>>
>>>>> All of these pass individually and when not in parallel for me.
>>>>>
>>>>> Here's my config:
>>>>>
>>>>> org.apache.maven.plugins
>>>>> maven-surefire-plugin
>>>>> 2.11
>>>>>
>>>>> classes
>>>>> always
>>>>> true
>>>>>
>>>>>
>>>>>
>>>>> Anyone else seeing that?
>>>>>
>>>>>
>>>>> On Dec 8, 2011, at 1:53 PM, Dmitriy Lyubimov wrote:
>>>>>
>>>>>> SSVD actually runs a rather small test but it is a MR job in local
>>>>>> mode, there's nothing to cut down there in terms of size (not much
>>>>>> anyway). It's just what it takes to initialize and run all jobs (and
>>>>>> since it is local, it is also single threaded, so it actually runs V
>>>>>> and U jobs sequentially instead of parallel so it's even longer
>>>>>> because of that (4 jobs stringed all in all)).
>>>>>>
>>>>>> But i will take a look, although even if i reduce solution size, it
>>>>>> will still likely not reduce running time by more than 20%.
>>>>>>
>>>>>> On Thu, Dec 8, 2011 at 5:42 AM, David Murgatroyd wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Dec 8, 2011, at 8:36 AM, Grant Ingersoll wrote:
>>>>>>>
>>>>>>>> MAHOUT-916 and 917 are attempts to address the running time of our
>>>>>>>> tests. As Sean rightfully pointed out, there are probably
>>>>>>>> opportunities to simply cut down the sizes of some of these tests
>>>>>>>> w/o affecting their correctness. To that end, if people can take a
>>>>>>>> look at:
>>>>>>>> https://builds.apache.org/job/Mahout-Quality/1237/testReport/junit/
>>>>>>>>
>>>>>>>> You can get a sense as to which tests are taking a long time. The
>>>>>>>> main culprits are:
>>>>>>>> 1. Vectorizer
>>>>>>>> 2. SSVD
>>>>>>>> 3. K-Means
>>>>>>>> 4. taste.hadoop.item
>>>>>>>> 5. taste.hadoop.als
>>>>>>>> 6. PFPGrowth
>>>>>>>>
>>>>>>>>
>>>>>>>> -Grant
>>>>>>>>
>>>>>>>> --------------------------------------------
>>>>>>>> Grant Ingersoll
>>>>>>>> http://www.lucidimagination.com
>>>>>>>>

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
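
For illustration only, here is a minimal Java sketch of the "don't use statics" option combined with the system-property idea from this thread: each test asks for its own generator, so nothing running in parallel can reset it midstream. The class name SeededRandoms and the method newRandom() are hypothetical and are not part of Mahout. The property name useSeed is taken from the -DuseSeed= suggestion above, and treating its value as a long is an assumption.

```java
import java.util.Random;

/** Hypothetical helper, not Mahout's actual random-number utility. */
final class SeededRandoms {

    /** Property name borrowed from the -DuseSeed= idea; its value format (a long) is assumed. */
    static final String SEED_PROPERTY = "useSeed";

    private SeededRandoms() {
    }

    /**
     * Returns a new generator owned by the caller. If -DuseSeed=<long> is set,
     * the generator is seeded with that value; otherwise it uses the JDK's
     * default seeding. No shared static generator is kept, so no other test
     * can reset this one while it is in use.
     */
    static Random newRandom() {
        String seed = System.getProperty(SEED_PROPERTY);
        return seed == null ? new Random() : new Random(Long.parseLong(seed));
    }

    public static void main(String[] args) {
        // Simulates running with -DuseSeed=42.
        System.setProperty(SEED_PROPERTY, "42");
        Random first = newRandom();
        Random second = newRandom();
        // Both generators start from seed 42, so their first values match.
        System.out.println(first.nextLong() == second.nextLong());
    }
}
```

The trade-off noted above still applies to this sketch: the property would need to be passed by the test plugin and also set by hand when running from an IDE.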