Subject: Re: Tests locking up with 100% CPU usage
From: Alex Luberg
To: dev@allura.apache.org
Date: Wed, 24 Sep 2014 21:41:13 -0700

I have discovered that the suite passed with 756 tests, and if I added another test (just a copy of an existing one with a different name) it locked up at some test (which was not the one I copied). I suspect it is not related to the actual test code, but to something with nose/Python/the sandbox.
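When it locks up I haven't been able to get a traceback out of it either, so next time I may try installing a signal handler before starting the run. Here is a rough sketch (a hypothetical debug_hang helper, not anything in our tree) that dumps every thread's stack when the process receives SIGQUIT:

    # debug_hang.py -- hypothetical helper, not part of the Allura tree.
    # Import it early (e.g. from sitecustomize or a nose plugin) so a hung
    # nosetests process can be poked with `kill -QUIT <pid>` to print what
    # every thread is doing.
    import signal
    import sys
    import traceback

    def dump_all_stacks(signum, frame):
        # Walk every thread's current frame and print its Python stack.
        for thread_id, stack in sys._current_frames().items():
            sys.stderr.write("\n--- thread %s ---\n" % thread_id)
            traceback.print_stack(stack, file=sys.stderr)

    signal.signal(signal.SIGQUIT, dump_all_stacks)

Caveat: this only fires if the interpreter is still executing Python bytecode. Given that the hung process ignores ctrl-C, the spin may be inside C code holding the GIL, in which case a C-level dumper such as the faulthandler package would be needed instead. Either way, getting nothing out of it would itself be a data point that the hang is below the Python level.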
On Mon, Sep 22, 2014 at 3:40 AM, Igor Bondarenko wrote:

> On Sat, Sep 20, 2014 at 12:25 AM, Dave Brondsema wrote:
>
> > On 9/19/14 12:18 PM, Dave Brondsema wrote:
> >
> > > Starting with Igor's comments on https://sourceforge.net/p/allura/tickets/7657/#c7d9
> > >
> > >> There's a couple of new tests commented out in the last commit. I can't figure out why, but they cause allura/tests/test_dispatch.py to hang when run together with other tests. Also I have added and then removed tests for enable/disable user for the same reason.
> > >>
> > >> I think it needs another pair of eyes on it, since I've already spent too much time dealing with these tests and have no idea what's happening... Maybe I'm missing something obvious.
> > >
> > > Alex and I have seen this recently too, and it's hard to figure out what exactly the problem is. I first noticed it when running `./run_tests --with-coverage`, which would run nosetests in the Allura dir and would not use --processes=N because of the with-coverage param. So basically just a regular run of the tests in the Allura dir would cause the CPU to go into 100% usage and the tests wouldn't finish. Couldn't ctrl-C or profile them, had to kill -9 it.
> > >
> > > That was on CentOS 5.10, and a workaround was to run with --processes=N and then the tests would finish fine. On the Ubuntu vagrant image, I didn't encounter any problem in the first place. So perhaps it's related to the environment.
> > >
> > > I tried to narrow down to a specific test that might be the culprit. I found tests consistently got up to TestSecurity.test_auth (which is a bit weird and old test anyway). And also that commenting out that test let them all pass.
> > >
> > > But I'm pretty sure Alex said he dug into this as well and found variation in what tests could cause the problem. I think he told me that going back in git history before the problem, and then adding a single test (a copy of an existing one), caused the problem. So perhaps some limit or resource tipping point is hit.
> > >
> > > Alex or Igor, any more data points you know from what you've seen?
> > >
> > > Anyone else seen anything like this? Or have ideas for how to approach nailing it down better?
> >
> > I tried checking out branch je/42cc_7657 and going back to commit 4cc63586e5728d7d0c5c2f09150eb07eb7e4edc1 (before tests were commented out) to see what happened for me:
> >
> > On Vagrant / Ubuntu, it froze at test_dispatch.py same as you. So some consistency there. Tests passed when I ran `nosetests --process-timeout=180 --processes=4 -v` in the Allura dir. Seemed slow at the end though, almost thought it froze.
> >
> > On CentOS, it froze at a different spot with a regular nosetests run. It passed with `nosetests allura/tests/ --processes=4 --process-timeout=180 -v`. For some reason (hopefully unrelated), I needed to specify the path "allura/tests/" to avoid an IOError from multiprocessing.
> >
> > So at least multiprocess tests still seem like a workaround for me. Note: ./run_tests picks a --processes=N value dynamically based on the machine's CPU cores, so with a single core you don't get multiple processes that way. Also note: if you have nose-progressive installed and active, that is incompatible with multiple processes.
> It works exactly as you described for me too.
>
> I've reverted some commits with those tests, since the problem is not with them and they are useful: https://sourceforge.net/p/allura/tickets/7657/#8c06. I also made a fix in 42cc's Makefile (committed directly to master) so that it will always run the tests in parallel (it turns out that here at 42cc we have single-core CPUs on the boxes that run tests, which is why I was getting the lockups on our CI too :( )
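Thanks. For anyone following along, the single-core behaviour described above comes down to how the process count gets derived from the core count. Very roughly, the logic looks something like this (a simplified sketch for illustration, not the actual ./run_tests or 42cc Makefile code):

    # Sketch of how a run_tests-style wrapper might pick nose's
    # multiprocessing options; illustrative only, not the real scripts.
    import multiprocessing
    import subprocess

    cores = multiprocessing.cpu_count()
    cmd = ['nosetests', 'allura/tests/', '-v']
    if cores > 1:
        # Only fans out when there is more than one core, so a single-core
        # CI box silently falls back to the in-process run that hangs.
        cmd.extend(['--processes=%d' % cores, '--process-timeout=180'])
    subprocess.call(cmd)

If I understand Igor's Makefile change correctly, forcing a parallel run even on a one-core box keeps every environment on the multiprocess code path, which is the only mode we know finishes reliably right now.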