geronimo-dev mailing list archives

From Jason Dillon
Subject Re: Continuous TCK Testing
Date Mon, 06 Oct 2008 17:18:00 GMT
Sure np, took me a while to get around to writing it too ;-)


On Oct 6, 2008, at 10:24 PM, Jason Warner wrote:

> Just got around to reading this.  Thanks for the brain dump, Jason.   
> No questions as of yet, but I'm sure I'll need a few more reads  
> before I understand it all.
> On Thu, Oct 2, 2008 at 2:34 PM, Jason Dillon wrote:
> On Oct 1, 2008, at 11:20 PM, Jason Warner wrote:
> Is the GBuild stuff in svn the same as the anthill-based code or is  
> that something different?  GBuild seems to have scripts for running  
> tck and that leads me to think they're the same thing, but I see no  
> mention of anthill in the code.
> The Anthill stuff is completely different than the GBuild stuff.  I  
> started out trying to get the TCK automated using GBuild, but  
> decided that the system lacked too many features to perform as I  
> desired, and went ahead with Anthill as it did pretty much  
> everything, though had some stability problems.
> One of the main reasons why I chose Anthill (AHP, Anthill Pro that  
> is) was its build agent and code repository systems.  This allowed  
> me to ensure that each build used exactly the desired artifacts.   
> Another was the configurable workflow, which allowed me to create a  
> custom chain of events to handle running builds on remote agents and  
> control what data gets sent to them, what it will collect and what  
> logic to execute once all distributed work has been completed for a  
> particular build.  And the kicker which helped facilitate bringing it  
> all together was its concept of a build life.
> At the time I could find *no other* build tool which could meet all  
> of these needs, and so I went with AHP instead of spending months  
> building/testing features in GBuild.
> While AHP supports configuring a lot of stuff via its web-interface,  
> I found that it was very cumbersome, so I opted to write some glue,  
> which was stored in svn here:
> It's been a while, so I have to refresh my memory on how this stuff  
> actually worked.  First let me explain about the code repository  
> (what it calls codestation) and why it was critical to the TCK  
> testing IMO.  When we use Maven normally, it pulls data from a set  
> of external repositories, picks up more repositories from the stuff  
> it downloads and quickly we lose control of where stuff comes from.   
> After it pulls down all that stuff, it churns through a build and  
> spits out the stuff we care about, normally stuffing them (via mvn  
> install) into the local repository.
> AHP supports by default tasks to publish artifacts (really just a  
> set of files controlled by an Ant-like include/exclude path) from a  
> build agent into Codestation, as well as tasks to resolve artifacts  
> (ie. download them from Codestation to the local working directory  
> on the build agents system).  Each top-level build in AHP gets  
> assigned a new (empty) build life.  Artifacts are always published  
> to/resolved from a build life, either that of the current build, or  
> of a dependency build.
> So what I did was I setup builds for Geronimo Server (the normal  
> server/trunk stuff), which did the normal mvn install thingy, but I  
> always gave it a custom -Dmaven.repo.local which resolved to  
> something inside the working directory for the running build.  The  
> build was still online, so it pulled down a bunch of stuff into an  
> empty local repository (so it was a clean build wrt the repository,  
> as well as the source code, which was always fetched for each new  
> build).  Once the build had finished, I used the artifact publisher  
> task to push *all* of the stuff in the local repository into  
> Codestation, labeled as something like "Maven repository artifacts"  
> for the current build life.
> Then I setup another build for Apache Geronimo CTS Server (the  
> porting/branches/* stuff).  This build was dependent upon the "Maven  
> repository artifacts" of the Geronimo Server build, and I configured  
> those artifacts to get installed on the build agents system in the  
> same directory that I configured the CTS Server build to use for its  
> local maven repository.  So again the repo started out empty, then  
> got populated with all of the outputs from the normal G build, and  
> then the cts-server build was started.  The build of the components  
> and assemblies is normally fairly quick and aside from some stuff in  
> the private tck repo won't download much more stuff, because it  
> already had most of its dependencies installed via the Codestation  
> dependency resolution.   Once the build finished, I published the  
> cts-server assembly artifacts back to Codestation under like "CTS  
> Server Assemblies" or something.
> Up until this point it's normal builds, but now we have built the G  
> server, then built the CTS server (using the *exact* artifacts from  
> the G server build, even though each might have happened on a  
> different build agent).  And now we need to go and run a bunch of  
> tests, using the *exact* CTS server assemblies, produce some output,  
> collect it, and once all of the tests are done render some nice  
> reports, etc.
> AHP supports setting up builds which contain "parallel" tasks, each  
> of those tasks is then performed by a build agent, they have fancy  
> build agent selection stuff, but for my needs I had basically 2  
> groups, one group for running the server builds, and then another  
> for running the tests.  I only set aside like 2 agents for builds  
> and the rest for tests.  Oh, I forgot to mention that I had 2 16x  
> 16g AMD beasts all running CentOS 5, each with about 10-12 Xen  
> virtual machines running internally to run build agents.  Each  
> system also had a RAID-0 array setup over 4 disks to help reduce  
> disk io wait, which was as I found out the limiting factor when  
> trying to run a ton of builds that all checkout and download  
> artifacts and such.
> I helped the AHP team add a new feature which was a parallel  
> iterator task, so you define *one* task that internally fires off n  
> parallel tasks, which would set the iteration number, and leave it  
> up to the build logic to pick what to do based on that index.  The  
> alternative was an unwieldy set of like 200 tasks in their UI which  
> simply didn't work at all.  You might have noticed an  
> "iterations.xml" file in the tck-testsuite directory, this was what  
> was used to take an iteration number and turn it into what tests we  
> actually run.  The <iteration> bits are order sensitive in that file.
> Soooo, after we have a CTS Server for a particular G Server build,  
> we can now go and do "runtests" for a specific set of tests (defined  
> by an iteration)... this differed from the other builds above a  
> little, but still pulled down artifacts, the CTS Server assemblies  
> (only the assemblies and the required bits to run the  
> geronimo-maven-plugin, which was used to geronimo:install, as well as  
> used by the tck itself to fire up the server and so on).  The key  
> thing here,  
> with regards to the maven configuration (besides using that custom  
> Codestation populated repository) was that the builds were run  
> *offline*.
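A rough sketch of the two Maven invocations described so far, as Python helpers that only build the command line. Maven's maven.repo.local user property and its --offline flag are real; the directory names and goals below are placeholders, not the values used in the actual AHP setup.

```python
def maven_command(goals, local_repo, offline=False):
    """Build an mvn argument list that uses a private local repository.

    local_repo should point inside the build's working directory so that
    nothing leaks in from (or out to) a shared ~/.m2 repository.
    """
    cmd = ["mvn"]
    if offline:
        # Fail instead of downloading anything not already in local_repo.
        cmd.append("--offline")
    cmd.append(f"-Dmaven.repo.local={local_repo}")
    cmd.extend(goals)
    return cmd


# Server build: online, but starting from an empty per-build repository.
server_build = maven_command(["install"], "work/repository")

# Test iteration: offline, against a repository populated from Codestation.
test_run = maven_command(["install"], "work/repository", offline=True)
```

Passing the list to subprocess.run would execute it; keeping it as a list avoids shell quoting problems with paths.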
> After runtests completed, the results are then soaked up (the stuff  
> that javatest pukes out with icky details, as well as the full log  
> files and other stuff I can't recall) and then pushed back into  
> Codestation.
> Once all of the iterations were finished, another task fires off  
> which generates a report.  It does this by downloading from  
> Codestation all of the runtests outputs (each was zipped I think),  
> unzips them one by one, runs some custom goo I wrote (based on some of  
> the concepts from original stuff from the GBuild-based TCK  
> automation), and generates a nice Javadoc-like report that includes  
> all of the gory details.
> I can't remember how long I spent working on this... too long (not  
> the reports I mean, the whole system).  But in the end I recall  
> something like running an entire TCK testsuite for a single server  
> configuration (like jetty) in about 4-6 hours... I sent mail to the  
> list with the results, so if you are curious what the real number  
> is, instead of my guess, you can look for it there.  But anyway it  
> was damn quick running on just those 2 machines.  And I *knew*  
> exactly that each of the distributed tests was actually testing a  
> known build that I could trace back to its artifacts and then back  
> to its SVN revision, without worrying about mvn downloading  
> something new when midnight rolled over or that a new G server or  
> CTS server build that might be in progress hasn't compromised the  
> testing by polluting the local repository.
>  * * *
> So, about the sandbox/build-support stuff...
> First there is the 'harness' project, which is rather small, but  
> contains the basic stuff, like a version of ant and maven which all  
> of these builds would use, some other internal glue, a  fix for an  
> evil Maven problem causing erroneous build failures due to some  
> internal thread state corruption or gremlins, not sure which.  I  
> kinda used this project to help manage the software needed by normal  
> builds, which is why Ant and Maven were in there... ie. so I didn't  
> have to go install it on each agent each time it changed, just let  
> the AHP system deal with it for me.
> This was setup as a normal AHP project, built using its internal Ant  
> builder (though having that builder configured still to use the  
> local version it pulled from SVN to ensure it always works).
> Each other build was setup to depend on the output artifacts from  
> the build harness build, using the latest in a range, like say using  
> "3.*" for the latest 3.x build (which looks like that was 3.7).   
> This let me work on new stuff w/o breaking the current builds as I  
> hacked things up.
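For illustration, resolving a range like "3.*" to the newest matching build could look like the sketch below. This is not AHP's actual resolution logic (which isn't shown here), just the idea, assuming purely numeric dotted version strings.

```python
def latest_in_range(pattern, versions):
    """Pick the highest version matching a pattern such as "3.*".

    Only trailing-wildcard patterns are handled, and versions are assumed
    to be dotted integers like "3.7".
    """
    if not pattern.endswith(".*"):
        raise ValueError("only patterns like '3.*' are handled in this sketch")
    prefix = pattern[:-1]  # "3.*" -> "3."
    matching = [v for v in versions if v.startswith(prefix)]
    if not matching:
        return None
    # Compare numerically so that 3.10 sorts after 3.9.
    return max(matching, key=lambda v: tuple(int(part) for part in v.split(".")))
```

A dependent build pinned to "3.*" keeps getting 3.x harness builds while experimental 4.x builds are published alongside them.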
> So, in addition to all of the stuff I mentioned above wrt the G and  
> CTS builds, each also had this step which resolved the build harness  
> artifacts to that working directory, and the Maven builds were  
> always run via the version of Maven included from the harness.  But,  
> AHP didn't actually run that version of Maven directly, it used its  
> internal Ant task to execute the version of Ant from the harness  
> *and* use the harness.xml buildfile.
> The harness.xml stuff is some more goo which I wrote to help manage  
> AHP configurations.  With AHP (at that time, not sure if it has  
> changed) you had to do most everything via the web UI, which sucked,  
> and it was hard to refactor sets of projects and so on.  So I came  
> up with a standard set of tasks to execute for a project, then put  
> all of the custom muck I needed into what I called a _library_ and  
> then had the AHP via harness.xml invoke it with some configuration  
> about what project it was and other build details.
> The actual harness.xml is not very big, it simply makes sure that  
> */bin/* is executable (codestation couldn't preserve execute bits),  
> and uses the Codestation command-line client (invoking the Java class  
> directly though) to ask the repository to resolve artifacts from the  
> "Build Library" to the local repository.  I had this artifact  
> resolution separate from the normal dependency (or harness) artifact  
> resolution so that it was easier for me to fix problems with the  
> library while a huge set of TCK iterations were still queued up to  
> run.  Basically, if I noticed a problem due to a code or  
> configuration issue in an early build, I could fix it, and use the  
> existing builds to verify the fix, instead of wasting an hour  
> (sometimes more depending on networking problems accessing remote  
> repos while building the servers) to rebuild and start over.
> This brings us to the 'libraries' project.  In general the idea of a  
> _library_ was just a named/versioned collection of files, which  
> could be used by a project.  The main (er only) library defined in  
> this SVN is system/.  This is the groovy glue which made everything  
> work.  This is where the entry-point class is located (the guy who  
> gets invoked by harness.xml), via:
>    <target name="harness" depends="init">
>        <groovy>
>            <classpath>
>                <pathelement location="${library.basedir}/groovy"/>
>            </classpath>
>            gbuild.system.BuildHarness.bootstrap(this)
>        </groovy>
>    </target>
> I won't go into too much detail on this stuff now, take a look at it  
> and ask questions.  But, basically there is stuff in gbuild.system.*  
> which is harness support muck, and stuff in gbuild.config.* which  
> contains configuration.  I was kinda mid-refactoring of some things,  
> starting to add new features, not sure where I left off actually.  
> But the key bits are in gbuild.config.project.*  This contains a  
> package for each project, with the package name being the same as  
> the AHP project (with " " -> "_"). And then in each of those packages  
> is at least a Controller.groovy class (or other classes if special  
> muck was needed, like for the report generation in Geronimo_CTS, etc).
> The controller defines a set of actions, implemented as Groovy  
> closures bound to properties of the Controller class.  One of the  
> properties passed in from the AHP configuration (configured via the  
> Web UI, passed to the harness.xml build, and then on to the Groovy  
> harness) was the name of the _action_ to execute.  Most of that  
> stuff should be fairly straightforward.
> So after a build is started (maybe from a Web UI click, or SVN  
> change detection, or a TCK runtests iteration) the following happens  
> (in simplified terms):
>  * Agent starts build
>  * Agent cleans its working directory
>  * Agent downloads the build harness
>  * Agent downloads any dependencies
>  * Agent invokes Ant on harness.xml passing in some details
>  * Harness.xml downloads the system/1 library
>  * Harness.xml runs gbuild.system.BuildHarness
>  * BuildHarness tries to construct a Controller instance for the  
> project
>  * BuildHarness tries to find Controller action to execute
>  * BuildHarness executes the Controller action
>  * Agent publishes output artifacts
>  * Agent completes build
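The three BuildHarness steps in that list (construct a Controller for the project, find the action, execute it) can be sketched roughly as below. This is a Python stand-in for the Groovy harness, not the gbuild code itself: the registry, the class name and the context argument are hypothetical, and only the " " -> "_" name mapping comes from the description above.

```python
class GeronimoServerController:
    """Hypothetical controller; each public method plays the role of an action."""

    def build(self, context):
        return f"building {context['project']}"


# Stand-in for "one package per project": package-style name -> controller class.
CONTROLLERS = {"Geronimo_Server": GeronimoServerController}


def package_name(project):
    """Map an AHP project name to its package-style name (" " -> "_")."""
    return project.replace(" ", "_")


def run_action(project, action, context=None):
    """Construct the project's controller, look up the named action, run it."""
    controller_cls = CONTROLLERS.get(package_name(project))
    if controller_cls is None:
        raise LookupError(f"no controller for project {project!r}")
    controller = controller_cls()
    handler = getattr(controller, action, None)
    # Refuse private names and non-callables so only real actions can be invoked.
    if action.startswith("_") or not callable(handler):
        raise LookupError(f"project {project!r} has no action {action!r}")
    return handler(dict(context or {}, project=project))
```

Here run_action("Geronimo Server", "build") resolves the controller by name and calls its build action, which mirrors how the action name arrives as a plain property from the AHP configuration.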
> A few extra notes on libraries, the JavaEE TCK requires a bunch of  
> stuff we get from Sun to execute.  This stuff isn't small, but is  
> for the most part read-only.  So I setup a location on each build  
> agent where these files were installed to.  I created AHP projects  
> to manage them and treated them like a special "library" one which  
> tried really hard not to go fetch its content unless the local  
> content was out of date.  This helped speed up the entire build  
> process... cause that delete/download of all that muck really slows  
> down 20 agents running in parallel on 2 big machines with striped  
> arrays.  For legal reasons this stuff was not kept in  
>'s main repository, and for logistical reasons wasn't  
> kept in the private tck repo on either.  Because  
> there were so many files, and because the httpd configuration on  
> kicks out requests that it thinks are *bunk* to help  
> save the resources for the community, I had setup a private  
> ssl-secured svn repository on the old machines to put  
> in the full muck required, then setup some goo in the harness to  
> resolve them.  This goo is all in gbuild.system.library.*  See the  
> gbuild.config.projects.Geronimo_CTS.Controller for more of how it  
> was actually used.
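One simple way to get that "only fetch when out of date" behaviour is a version marker next to the installed files, sketched below. How gbuild.system.library.* actually decided staleness isn't described here, so the marker file name and the fetch callback are assumptions.

```python
from pathlib import Path


def ensure_library(install_dir, wanted_version, fetch):
    """Fetch a read-only library only if the installed copy is stale.

    fetch is a caller-supplied function that populates install_dir (for
    example from a private svn repository). Returns True if it was called.
    """
    install_dir = Path(install_dir)
    marker = install_dir / ".library-version"  # hypothetical marker file
    if marker.is_file() and marker.read_text().strip() == wanted_version:
        return False  # already up to date, skip the expensive download
    install_dir.mkdir(parents=True, exist_ok=True)
    fetch(install_dir)
    # Record the version only after a successful fetch.
    marker.write_text(wanted_version + "\n")
    return True
```

Because the marker is written last, an interrupted fetch leaves no marker and the next build simply tries again.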
>  * * *
> Okay, that is about all the brain-dump for TCK muck I have in me for  
> tonight.  Reply with questions if you have any.
> Cheers,
> --jason
> -- 
> ~Jason Warner
