From: Jason Dillon <jason.dillon@gmail.com>
To: dev@geronimo.apache.org
Subject: Re: Continuous TCK Testing
Date: Thu, 9 Oct 2008 23:12:01 +0700

Yup, it was manually installed on each machine ;-)

--jason

On Oct 9, 2008, at 6:43 PM, Jason Warner wrote:

> My apologies.  I didn't phrase my question properly.  Most of the software necessary was pulled down via svn, but I saw no such behaviour for AHP.  After looking at it some more, I imagine the software was just manually installed on the machine.  It was kind of a silly question to begin with, I suppose.
>
> On Thu, Oct 9, 2008 at 4:16 AM, Jason Dillon <jason.dillon@gmail.com> wrote:
> On Oct 8, 2008, at 11:05 PM, Jason Warner wrote:
>> Here's a quick question.  Where does AHP come from?
>
> http://www.anthillpro.com
>
> (ever heard of google :-P)
>
> --jason
>
>> On Mon, Oct 6, 2008 at 1:18 PM, Jason Dillon <jason.dillon@gmail.com> wrote:
>> Sure np, took me a while to get around to writing it too ;-)
>>
>> --jason
>>
>> On Oct 6, 2008, at 10:24 PM, Jason Warner wrote:
>>
>>> Just got around to reading this.  Thanks for the brain dump, Jason.  No questions as of yet, but I'm sure I'll need a few more reads before I understand it all.
>>>
>>> On Thu, Oct 2, 2008 at 2:34 PM, Jason Dillon <jason.dillon@gmail.com> wrote:
>>> On Oct 1, 2008, at 11:20 PM, Jason Warner wrote:
>>>
>>> Is the GBuild stuff in svn the same as the anthill-based code or is that something different?  GBuild seems to have scripts for running tck and that leads me to think they're the same thing, but I see no mention of anthill in the code.
>>>
>>> The Anthill stuff is completely different from the GBuild stuff.  I started out trying to get the TCK automated using GBuild, but decided that the system lacked too many features to perform as I desired, and went ahead with Anthill as it did pretty much everything, though it had some stability problems.
>>>
>>> One of the main reasons why I chose Anthill (AHP, Anthill Pro that is) was its build agent and code repository systems.  This allowed me to ensure that each build used exactly the desired artifacts.  Another was the configurable workflow, which allowed me to create a custom chain of events to handle running builds on remote agents and control what data gets sent to them, what it will collect and what logic to execute once all distributed work has been completed for a particular build.  And the kicker which helped facilitate bringing it all together was its concept of a build life.
>>>
>>> At the time I could find *no other* build tool which could meet all of these needs, and so I went with AHP instead of spending months building/testing features in GBuild.
>>>
>>> While AHP supports configuring a lot of stuff via its web-interface, I found that it was very cumbersome, so I opted to write some glue, which was stored in svn here:
>>>
>>>     https://svn.apache.org/viewvc/geronimo/sandbox/build-support/?pathrev=632245
>>>
>>> It's been a while, so I have to refresh my memory on how this stuff actually worked.  First let me explain about the code repository (what it calls Codestation) and why it was critical to the TCK testing IMO.  When we use Maven normally, it pulls data from a set of external repositories, picks up more repositories from the stuff it downloads, and quickly we lose control of where stuff comes from.  After it pulls down all that stuff, it churns through a build and spits out the stuff we care about, normally stuffing it (via mvn install) into the local repository.
>>>
>>> AHP supports by default tasks to publish artifacts (really just a set of files controlled by an Ant-like include/exclude path) from a build agent into Codestation, as well as tasks to resolve artifacts (i.e. download them from Codestation to the local working directory on the build agent's system).  Each top-level build in AHP gets assigned a new (empty) build life.  Artifacts are always published to/resolved from a build life, either that of the current build, or of a dependency build.
>>>
>>> So what I did was set up builds for Geronimo Server (the normal server/trunk stuff), which did the normal mvn install thingy, but I always gave it a custom -Dmaven.local.repository which resolved to something inside the working directory for the running build.  The build was still online, so it pulled down a bunch of stuff into an empty local repository (so it was a clean build wrt the repository, as well as the source code, which was always fetched for each new build).  Once the build had finished, I used the artifact publisher task to push *all* of the stuff in the local repository into Codestation, labeled as something like "Maven repository artifacts" for the current build life.
>>>
>>> Then I set up another build for Apache Geronimo CTS Server (the porting/branches/* stuff).  This build was dependent upon the "Maven repository artifacts" of the Geronimo Server build, and I configured those artifacts to get installed on the build agent's system in the same directory that I configured the CTS Server build to use for its local maven repository.  So again the repo started out empty, then got populated with all of the outputs from the normal G build, and then the cts-server build was started.  The build of the components and assemblies is normally fairly quick and, aside from some stuff in the private tck repo, won't download much more stuff, because it already had most of its dependencies installed via the Codestation dependency resolution.  Once the build finished, I published the cts-server assembly artifacts back to Codestation under like "CTS Server Assemblies" or something.
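
To make the repository-isolation trick above concrete, here is a minimal Groovy sketch, in the spirit of the harness glue but with illustrative names only (not the actual gbuild code), of launching Maven with a local repository kept inside the build's working directory.  Note that stock Maven spells this property -Dmaven.repo.local; the mail refers to it as -Dmaven.local.repository.

    // Sketch only: run Maven with a per-build local repository so nothing
    // leaks in from (or out to) a shared ~/.m2/repository.
    def workDir = new File('.').canonicalFile          // the agent's working directory (assumption)
    def localRepo = new File(workDir, 'repository')    // per-build Maven repository
    localRepo.mkdirs()

    def cmd = ['mvn', "-Dmaven.repo.local=${localRepo.absolutePath}", 'clean', 'install']
    def proc = new ProcessBuilder(cmd*.toString())
            .directory(workDir)
            .inheritIO()
            .start()
    if (proc.waitFor() != 0) {
        throw new RuntimeException('Maven build failed')
    }

After such a build, the publish step described above would simply hand everything under repository/ to the Codestation artifact-publisher task.
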
>>> Up until this point it's normal builds, but now we have built the G server, then built the CTS server (using the *exact* artifacts from the G server build, even though each might have happened on a different build agent).  And now we need to go and run a bunch of tests, using the *exact* CTS server assemblies, produce some output, collect it, and once all of the tests are done render some nice reports, etc.
>>>
>>> AHP supports setting up builds which contain "parallel" tasks; each of those tasks is then performed by a build agent.  They have fancy build agent selection stuff, but for my needs I had basically 2 groups, one group for running the server builds, and then another for running the tests.  I only set aside like 2 agents for builds and the rest for tests.  Oh, I forgot to mention that I had 2 16x 16g AMD beasts all running CentOS 5, each with about 10-12 Xen virtual machines running internally to run build agents.  Each system also had a RAID-0 array set up over 4 disks to help reduce disk I/O wait, which was, as I found out, the limiting factor when trying to run a ton of builds that all checkout and download artifacts and such.
>>>
>>> I helped the AHP team add a new feature which was a parallel iterator task, so you define *one* task that internally fires off n parallel tasks, which would set the iteration number, and leave it up to the build logic to pick what to do based on that index.  The alternative was an unwieldy set of like 200 tasks in their UI which simply didn't work at all.  You might have noticed an "iterations.xml" file in the tck-testsuite directory; this was used to take an iteration number and turn it into what tests we actually run.  The <iteration> bits are order sensitive in that file.
>>>
>>> Soooo, after we have a CTS Server for a particular G Server build, we can now go and do "runtests" for a specific set of tests (defined by an iteration)... this differed from the other builds above a little, but still pulled down artifacts, the CTS Server assemblies (only the assemblies and the required bits to run the geronimo-maven-plugin, which was used to geronimo:install, as well as used by the tck itself to fire up the server and so on).  The key thing here, with regards to the maven configuration (besides using that custom Codestation-populated repository), was that the builds were run *offline*.
>>>
>>> After runtests completed, the results are then soaked up (the stuff that javatest pukes out with icky details, as well as the full log files and other stuff I can't recall) and then pushed back into Codestation.
>>>
>>> Once all of the iterations were finished, another task fires off which generates a report.  It does this by downloading from Codestation all of the runtests outputs (each was zipped I think), unzips them one by one, runs some custom goo I wrote (based on some of the concepts from the original GBuild-based TCK automation), and generates a nice Javadoc-like report that includes all of the gory details.
>>>
>>> I can't remember how long I spent working on this... too long (not the reports I mean, the whole system).  But in the end I recall something like running an entire TCK testsuite for a single server configuration (like jetty) in about 4-6 hours... I sent mail to the list with the results, so if you are curious what the real number is, instead of my guess, you can look for it there.  But anyway it was damn quick running on just those 2 machines.  And I *knew* exactly that each of the distributed tests was actually testing a known build that I could trace back to its artifacts and then back to its SVN revision, without worrying about mvn downloading something new when midnight rolled over or that a new G server or CTS server build that might be in progress hasn't compromised the testing by polluting the local repository.
>>>
>>>  * * *
>>>
>>> So, about the sandbox/build-support stuff...
>>>
>>> First there is the 'harness' project, which is rather small, but contains the basic stuff, like a version of ant and maven which all of these builds would use, some other internal glue, a fix for an evil Maven problem causing erroneous build failures due to some internal thread state corruption or gremlins, not sure which.  I kinda used this project to help manage the software needed by normal builds, which is why Ant and Maven were in there... i.e. so I didn't have to go install it on each agent each time it changed, just let the AHP system deal with it for me.
>>>
>>> This was set up as a normal AHP project, built using its internal Ant builder (though having that builder configured still to use the local version it pulled from SVN to ensure it always works).
>>>
>>> Each other build was set up to depend on the output artifacts from the build harness build, using the latest in a range, like say using "3.*" for the latest 3.x build (which looks like that was 3.7).  This let me work on new stuff w/o breaking the current builds as I hacked things up.
>>>
>>> So, in addition to all of the stuff I mentioned above wrt the G and CTS builds, each also had this step which resolved the build harness artifacts to that working directory, and the Maven builds were always run via the version of Maven included from the harness.  But AHP didn't actually run that version of Maven directly; it used its internal Ant task to execute the version of Ant from the harness *and* use the harness.xml buildfile.
>>>
>>> The harness.xml stuff is some more goo which I wrote to help manage AHP configurations.  With AHP (at that time, not sure if it has changed) you had to do most everything via the web UI, which sucked, and it was hard to refactor sets of projects and so on.  So I came up with a standard set of tasks to execute for a project, then put all of the custom muck I needed into what I called a _library_ and then had AHP, via harness.xml, invoke it with some configuration about what project it was and other build details.
>>>
>>> The actual harness.xml is not very big; it simply makes sure that */bin/* is executable (Codestation couldn't preserve execute bits), and uses the Codestation command-line client (invoking the Java class directly though) to ask the repository to resolve artifacts from the "Build Library" to the local repository.  I had this artifact resolution separate from the normal dependency (or harness) artifact resolution so that it was easier for me to fix problems with the library while a huge set of TCK iterations were still queued up to run.  Basically, if I noticed a problem due to a code or configuration issue in an early build, I could fix it, and use the existing builds to verify the fix, instead of wasting an hour (sometimes more depending on networking problems accessing remote repos while building the servers) to rebuild and start over.
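
Tying the parallel-iterator description above to something concrete: a small Groovy sketch of how build logic could turn an AHP iteration index into the set of tests to run.  The shape of iterations.xml shown here is purely an assumption (the mail only says it contains order-sensitive <iteration> entries), and the system property name is made up for illustration.

    // Hypothetical iterations.xml shape (assumed, not taken from the real file):
    //   <iterations>
    //     <iteration>com/sun/ts/tests/ejb</iteration>
    //     <iteration>com/sun/ts/tests/servlet</iteration>
    //   </iterations>
    def index = (System.getProperty('iteration.index') ?: '0') as int   // set by the iterator task (assumption)

    def doc = new XmlSlurper().parse(new File('iterations.xml'))
    def iterations = doc.iteration                  // order matters, per the mail
    if (index >= iterations.size()) {
        throw new IllegalArgumentException("No iteration ${index} defined")
    }
    def tests = iterations[index].text().trim()
    println "Iteration ${index} -> runtests for: ${tests}"
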
>>> This brings us to the 'libraries' project.  In general the idea of a _library_ was just a named/versioned collection of files which could be used by a project.  The main (er, only) library defined in this SVN is system/.  This is the Groovy glue which made everything work.  This is where the entry-point class is located (the guy who gets invoked via harness.xml like so):
>>>
>>>     <target name="harness" depends="init">
>>>         <groovy>
>>>             <classpath>
>>>                 <pathelement location="${library.basedir}/groovy"/>
>>>             </classpath>
>>>             gbuild.system.BuildHarness.bootstrap(this)
>>>         </groovy>
>>>     </target>
>>>
>>> I won't go into too much detail on this stuff now, take a look at it and ask questions.  But, basically there is stuff in gbuild.system.* which is harness support muck, and stuff in gbuild.config.* which contains configuration.  I was kinda mid-refactoring of some things, starting to add new features, not sure where I left off actually.  But the key bits are in gbuild.config.project.*  This contains a package for each project, with the package name being the same as the AHP project (with " " -> "_").  And then in each of those packages is at least a Controller.groovy class (or other classes if special muck was needed, like for the report generation in Geronimo_CTS, etc).
>>>
>>> The controller defines a set of actions, implemented as Groovy closures bound to properties of the Controller class.  One of the properties passed in from the AHP configuration (configured via the Web UI, passed to the harness.xml build, and then on to the Groovy harness) was the name of the _action_ to execute.  Most of that stuff should be fairly straightforward.
>>>
>>> So after a build is started (maybe from a Web UI click, or SVN change detection, or a TCK runtests iteration) the following happens (in simplified terms):
>>>
>>>  * Agent starts build
>>>  * Agent cleans its working directory
>>>  * Agent downloads the build harness
>>>  * Agent downloads any dependencies
>>>  * Agent invokes Ant on harness.xml, passing in some details
>>>  * Harness.xml downloads the system/1 library
>>>  * Harness.xml runs gbuild.system.BuildHarness
>>>  * BuildHarness tries to construct a Controller instance for the project
>>>  * BuildHarness tries to find the Controller action to execute
>>>  * BuildHarness executes the Controller action
>>>  * Agent publishes output artifacts
>>>  * Agent completes build
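
For readers who haven't opened the libraries/ code yet, here is a rough Groovy sketch of the Controller pattern described above: actions are closures held in properties, and the harness picks one by name.  The class body, action names, and the 'action' property are illustrative assumptions; see gbuild.config.projects.* in svn for the real thing.

    // Illustrative only; not the actual gbuild.config code.
    class Controller {
        def build = {
            println 'checkout, run Maven against the harness-provided repo, publish artifacts...'
        }
        def runtests = {
            println 'resolve the CTS assemblies, run one TCK iteration offline, zip up results...'
        }
    }

    // Roughly what gbuild.system.BuildHarness would do with it (assumption):
    def actionName = System.getProperty('action', 'build')   // supplied via the AHP/harness.xml config
    def controller = new Controller()
    def action = controller."${actionName}"                  // look the closure up by property name
    if (!(action instanceof Closure)) {
        throw new IllegalStateException("Unknown action: ${actionName}")
    }
    action()                                                 // execute the selected action
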
>>> A few extra notes on libraries: the JavaEE TCK requires a bunch of stuff we get from Sun to execute.  This stuff isn't small, but is for the most part read-only.  So I set up a location on each build agent where these files were installed to.  I created AHP projects to manage them and treated them like a special "library", one which tried really hard not to go fetch its content unless the local content was out of date.  This helped speed up the entire build process... cause that delete/download of all that muck really slows down 20 agents running in parallel on 2 big machines with striped arrays.  For legal reasons this stuff was not kept in svn.apache.org's main repository, and for logistical reasons wasn't kept in the private tck repo on svn.apache.org either.  Because there were so many files, and because the httpd configuration on svn.apache.org kicks out requests that it thinks are *bunk* to help save the resources for the community, I had set up a private SSL-secured svn repository on the old gbuild.org machines to put in the full muck required, then set up some goo in the harness to resolve them.  This goo is all in gbuild.system.library.*  See the gbuild.config.projects.Geronimo_CTS.Controller for more of how it was actually used.
>>>
>>>  * * *
>>>
>>> Okay, that is about all the brain-dump for TCK muck I have in me for tonight.  Reply with questions if you have any.
>>>
>>> Cheers,
>>>
>>> --jason
>>>
>>> --
>>> ~Jason Warner
>>
>> --
>> ~Jason Warner
>
> --
> ~Jason Warner