Date: Wed, 25 Mar 2015 03:44:53 +0000 (UTC)
From: "Jayush Luniya (JIRA)"
To: dev@ambari.apache.org
Reply-To: dev@ambari.apache.org
Subject: [jira] [Updated] (AMBARI-10197) Apache builds for trunk are getting aborted

     [ https://issues.apache.org/jira/browse/AMBARI-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayush Luniya updated AMBARI-10197:
-----------------------------------
    Assignee: Jonathan Hurley

> Apache builds for trunk are getting aborted
> -------------------------------------------
>
>                 Key: AMBARI-10197
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10197
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent
>    Affects Versions: 2.1.0
>            Reporter: Jayush Luniya
>            Assignee: Jonathan Hurley
>             Fix For: 2.1.0
>
>
> On 3/24/15, 7:50 PM, "Jonathan Hurley" wrote:
> Ah, I see that. Looks like TestController.TestController is a common theme here then.
> I tried running the tests on CentOS 6 instead of OSX and it looks like mine hung on test_certSigningFailed the first time and test_heartbeat_no_host_check_cmd_in_queue the second time.
> Let’s open up a Jira for this so it can be tracked and resolved.
> On Mar 24, 2015, at 7:20 PM, Jayush Luniya wrote:
> Hi Jonathan,
> Yes, as I mentioned the UT tests hang which is not 100% repro. The BOA is
> aborted after 2 hours.
> However the builds always hang during Ambari Agent Test. If you see the
> logs further up, you will see that the actual abort happened during the
> TestController UTs (I.e. Python was terminated), but the build was not yet
> entirely terminated and hence we continue building the ambari client,
> python client until it was completely aborted.
> test_addToStatusQueue (TestController.TestController) ... ok
> test_certSigningFailed (TestController.TestController) ... ok
> test_heartbeatWithServer (TestController.TestController) ... ok
> test_registerAndHeartbeat (TestController.TestController) ... ok
> test_registerAndHeartbeatWithException (TestController.TestController) ... ok
> test_registerAndHeartbeat_check_registration_listener (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
> Build was aborted
> /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 31955 Terminated $PYTHON "$@"
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] Building Ambari Client 2.0.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ ambari-client ---
> [INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client (includes = [**/*.pyc], excludes = [])
> [INFO]
> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-version) @ ambari-client ---
> [INFO]
> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-release) @ ambari-client ---
> [INFO]
> [INFO] --- apache-rat-plugin:0.11:check (default) @ ambari-client ---
> [INFO] 53 implicit excludes (use -debug for more details).
> [INFO] No excludes explicitly specified.
> [INFO] 2 resources included (use -debug for more details)
> [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 approved: 2 licence.
> [INFO]
> [INFO] --- maven-assembly-plugin:2.2-beta-5:single (build-tarball) @ ambari-client ---
> [INFO] Reading assembly descriptor: assemblies/client.xml
> [INFO]
> [INFO] --- maven-assembly-plugin:2.2-beta-5:single (make-assembly) @ ambari-client ---
> [INFO] Reading assembly descriptor: assemblies/client.xml
> [INFO]
> [INFO] --- maven-install-plugin:2.4:install (default-install) @ ambari-client ---
> [INFO] Installing /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/pom.xml to /home/jenkins/.m2/repository/org/apache/ambari/ambari-client/2.0.0-SNAPSHOT/ambari-client-2.0.0-SNAPSHOT.pom
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] Building Ambari Python Client 2.0.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ python-client ---
> [INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/python-client (includes = [**/*.pyc], excludes = [])
> [INFO]
> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-version) @ python-client ---
> [INFO]
> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-release) @ python-client ---
> [INFO]
> [INFO] --- exec-maven-plugin:1.2:exec (python-test) @ python-client ---
> Updating AMBARI-10163
> Recording test results
> Warning: you have no plugins providing access control for builds, so
> falling back to legacy behavior of permitting any downstream builds to be
> triggered
> Finished: ABORTED
> Thanks
> Jayush
> On 3/24/15, 1:25 PM, "Jonathan Hurley" wrote:
> I think that we're looking in the wrong places. Consider:
> https://builds.apache.org/job/Ambari-trunk-Commit/2101
> and
> https://builds.apache.org/job/Ambari-trunk-Commit/2100
> 2101 successfully built in about an hour. 2100 did not; it aborted after
> 2 hours.
> It aborted during the Groovy unit tests. Ambari unit test time
> variances should not swing the total job time by an hour.
> Perhaps something else is going on here. Maybe there's a network issue
> and Git or one of the maven build steps is taking too long.
> The pattern seems to be that the builds are not stuck since they are
> aborted at different stages in between jobs. Groovy, agent tests, etc.
> On Mar 24, 2015, at 4:07 PM, Jonathan Hurley wrote:
> No, that change should have no effect on the tests. There were aborted
> runs before that change, and there were failed runs after it. It seems
> like in some cases, the tests just take too long.
> On Mar 24, 2015, at 3:55 PM, Jayush Luniya wrote:
> This is the change that went in in build#2072.
> Jonathan, any chance the issue below could have been caused by it?
> Sumit, what was the commit version of your change to reenable
> TestController tests and when was it committed?
> 1. AMBARI-10126 - Alert Scheduler Is Double Scheduling Jobs (jonathanhurley)
> Commit 68468feeeeb35ca9edd4899ea8b1abafb7c2742a by jhurley
> AMBARI-10126 - Alert Scheduler Is Double Scheduling Jobs (jonathanhurley)
> ambari-agent/src/main/python/ambari_agent/Controller.py
> Thanks
> Jayush
> On 3/24/15, 12:49 PM, "Sumit Mohanty" wrote:
> The TestController tests are the ones I re-enabled to run on mac recently. So
> we may see these failures locally as well if your dev box is a mac.
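
Since the hangs discussed here move between different TestController tests from run to run, one way to find where a given run is stuck is to arm a watchdog around each test. The following is a minimal sketch, not code from the Ambari tree: HangWatchdogTestCase, format_all_thread_stacks and the 300 second budget are hypothetical names and values chosen for illustration. It only reports the stuck test and the stack of every thread; it does not abort the run.

```python
import sys
import threading
import traceback
import unittest


def format_all_thread_stacks():
    """Return a string with the current stack of every live thread."""
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        stack = "".join(traceback.format_stack(frame))
        chunks.append("Thread %s:\n%s" % (thread_id, stack))
    return "\n".join(chunks)


class HangWatchdogTestCase(unittest.TestCase):
    """Hypothetical base class: report thread stacks if a test runs too long."""

    # Assumed per-test budget in seconds; not a value taken from the build.
    hang_timeout_sec = 300

    def setUp(self):
        # Arm a timer that fires only if the test is still running
        # after hang_timeout_sec seconds.
        self._watchdog = threading.Timer(self.hang_timeout_sec, self._report_hang)
        self._watchdog.daemon = True
        self._watchdog.start()

    def tearDown(self):
        # The test finished, so disarm the timer.
        self._watchdog.cancel()

    def _report_hang(self):
        # Runs on the timer thread: name the slow test and dump all stacks.
        sys.stderr.write(
            "Test %s exceeded %s sec\n%s\n"
            % (self.id(), self.hang_timeout_sec, format_all_thread_stacks())
        )
        sys.stderr.flush()


class ExampleTest(HangWatchdogTestCase):
    hang_timeout_sec = 60

    def test_quick(self):
        self.assertEqual(1 + 1, 2)
```

A test class that overrides setUp or tearDown would need to call the base class versions for the watchdog to stay armed and be disarmed again.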
> ________________________________________
> From: Jayush Luniya
> Sent: Tuesday, March 24, 2015 12:24 PM
> To: Alejandro Fernandez; dev@ambari.apache.org
> Subject: Re: Server unit tests take too long (30+ minutes)
> Agreed we should take a look at reducing our test times.
> Also, I looked at the latest builds on trunk, looks like the agent
> tests are hanging as well, leading to builds being aborted. Culprit seems
> to be TestController tests. This is not a consistent failure but happens
> very frequently since build#2072
> https://builds.apache.org/job/Ambari-trunk-Commit/
> test_repeatRegistration (TestController.TestController) ... ok
> test_restartAgent (TestController.TestController) ... ok
> test_run (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
> Build was aborted
> /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 20024 Terminated $PYTHON "$@"
> Thanks
> Jayush
> From: Alejandro Fernandez
> Date: Tuesday, March 24, 2015 at 12:18 PM
> To: "dev@ambari.apache.org"
> Cc: Jayush Luniya
> Subject: Re: Server unit tests take too long (30+ minutes)
> +1 to that.
> grep -B1 ".*sec$" ~/test_times.txt | sed 's/^.*Time elapsed: \(.*\)$/\1/'
> Here's another run with all tests that took over 30 secs. Total time in
> these 28 test classes was 28 mins.
> The biggest culprit was AmbariManagementControllerTest at 5:17
> Running org.apache.ambari.server.agent.TestHeartbeatHandler
> 89.435 sec
> Running org.apache.ambari.server.upgrade.UpgradeTest
> 76.566 sec
> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
> 55.582 sec
> Running org.apache.ambari.server.security.authorization.TestUsers
> 43.228 sec
> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
> 57.922 sec
> Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
> 56.585 sec
> Running org.apache.ambari.server.controller.internal.RepositoryVersionResourceProviderTest
> 60.788 sec
> Running org.apache.ambari.server.controller.internal.UpgradeResourceProviderTest
> 40.329 sec
> Running org.apache.ambari.server.controller.internal.HostStackVersionResourceProviderTest
> 34.812 sec
> Running org.apache.ambari.server.controller.internal.StageResourceProviderTest
> 37.434 sec
> Running org.apache.ambari.server.controller.AmbariServerTest
> 37.638 sec
> Running org.apache.ambari.server.controller.AmbariManagementControllerTest
> 317.327 sec
> Running org.apache.ambari.server.actionmanager.TestActionDBAccessorImpl
> 53.404 sec
> Running org.apache.ambari.server.scheduler.ExecutionScheduleManagerTest
> 34.245 sec
> Running org.apache.ambari.server.notifications.dispatchers.SNMPDispatcherTest
> 34.732 sec
> Running org.apache.ambari.server.state.UpgradeHelperTest
> 35.616 sec
> Running org.apache.ambari.server.state.alerts.AlertEventPublisherTest
> 62.627 sec
> Running org.apache.ambari.server.state.alerts.AlertDefinitionHashTest
> 42.206 sec
> Running org.apache.ambari.server.state.alerts.AlertStateChangedEventTest
> 41.462 sec
> Running org.apache.ambari.server.state.stack.UpgradePackTest
> 72.379 sec
> Running org.apache.ambari.server.state.ConfigHelperTest
> 72.849 sec
> Running
> org.apache.ambari.server.state.svccomphost.ServiceComponentHostTest
> 50.383 sec
> Running org.apache.ambari.server.state.cluster.ClusterTest
> 69.889 sec
> Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
> 80.271 sec
> Running org.apache.ambari.server.state.ServiceTest
> 45.443 sec
> Running org.apache.ambari.server.orm.dao.AlertsDAOTest
> 57.077 sec
> Running org.apache.ambari.server.orm.dao.AlertDefinitionDAOTest
> 33.872 sec
> Running org.apache.ambari.server.metadata.RoleCommandOrderTest
> 31.794 sec
> Thanks,
> Alejandro
> On 3/24/15, 11:54 AM, "Jonathan Hurley" wrote:
> Many of these, such as the deadlock tests and alert tests, are just going
> to take a long time due to the nature of what they're doing. In general,
> if b.a.o is timing out, we need to either increase the timeout for the
> job or change our pom.xml to allow for forked execution of the tests.
> In my local environment, 3 concurrent forks can run through the test
> suite in about 20 minutes. The problem is that both LDAP tests below
> always fail in a forked environment. I'd say if we want to get the build
> times down, we should look into making the 2 LDAP tests work with forked
> test runners in the pom.xml.
> On Mar 24, 2015, at 2:33 PM, Sumit Mohanty wrote:
> Hi,
> these are some of the unit tests that take too long (more than 30 seconds
> on my machine). There are several in the 10 to 30 second
> range that can also use some optimization.
> Jayush tells me that the Apache builds may be getting aborted as the
> build + UT run takes more than an hour.
> I will look into some of it when I get a chance. If there are any that
> pique your curiosity then take a look.
> Running org.apache.ambari.server.agent.TestHeartbeatHandler
> Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.43 sec
> Running org.apache.ambari.server.state.cluster.ClusterTest
> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.576 sec
> Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.252 sec
> Running org.apache.ambari.server.upgrade.UpgradeTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.433 sec
> Running org.apache.ambari.server.orm.dao.AlertDispatchDAOTest
> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.681 sec
> Running org.apache.ambari.server.orm.dao.AlertsDAOTest
> Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 44.474 sec
> Running org.apache.ambari.server.security.authorization.TestUsers
> Tests run: 26, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 36.421 sec
> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.46 sec
> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.706 sec
> Running org.apache.ambari.server.state.ConfigHelperTest
> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.863 sec
> Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.247 sec
> ...
> thanks
> -Sumit

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
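
The slow-test lists in this thread were pulled out of the test output by hand or with grep and sed. Below is a minimal Python sketch of the same extraction, assuming the log uses the "Running <class>" and "Time elapsed: N sec" lines shown above. The function name slow_test_classes and the 30 second default are illustrative choices, not part of any Ambari tooling.

```python
import re

# "Running <class>" opens a test class; the summary line that follows
# carries "Time elapsed: <seconds> sec".
RUNNING = re.compile(r"Running\s+(\S+)")
ELAPSED = re.compile(r"Time elapsed:\s*(\d[\d,]*(?:\.\d+)?)\s*sec")


def slow_test_classes(log_text, threshold_sec=30.0):
    """Return (class_name, seconds) pairs above the threshold, slowest first."""
    results = []
    current = None
    for line in log_text.splitlines():
        running = RUNNING.search(line)
        if running:
            current = running.group(1)
            continue
        elapsed = ELAPSED.search(line)
        if elapsed and current is not None:
            # Drop thousands separators such as "1,234.5" before converting.
            seconds = float(elapsed.group(1).replace(",", ""))
            if seconds > threshold_sec:
                results.append((current, seconds))
            current = None
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Two entries copied from the list above, used as sample input.
sample = """\
Running org.apache.ambari.server.agent.TestHeartbeatHandler
Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.43 sec
Running org.apache.ambari.server.state.ConfigHelperTest
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.863 sec
"""

for name, seconds in slow_test_classes(sample):
    print(f"{seconds:8.3f} sec  {name}")
```

Sorting by elapsed time puts the classes most worth optimizing at the top, and raising threshold_sec narrows the list to the worst offenders.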