geronimo-dev mailing list archives

From Jason Dillon <>
Subject Re: Continuous TCK Testing
Date Thu, 09 Oct 2008 16:12:01 GMT
Yup, it was manually installed on each machine ;-)


On Oct 9, 2008, at 6:43 PM, Jason Warner wrote:

> My apologies.  I didn't phrase my question properly.  Most of the  
> software necessary was pulled down via svn, but I saw no such  
> behaviour for AHP.  After looking at it some more, I imagine the  
> software was just manually installed on the machine.  It was kind of  
> a silly question to begin with, I suppose.
> On Thu, Oct 9, 2008 at 4:16 AM, Jason Dillon  
> <> wrote:
> On Oct 8, 2008, at 11:05 PM, Jason Warner wrote:
>> Here's a quick question.  Where does AHP come from?
> (ever heard of google :-P)
> --jason
>> On Mon, Oct 6, 2008 at 1:18 PM, Jason Dillon  
>> <> wrote:
>> Sure np, took me a while to get around to writing it too ;-)
>> --jason
>> On Oct 6, 2008, at 10:24 PM, Jason Warner wrote:
>>> Just got around to reading this.  Thanks for the brain dump,  
>>> Jason.  No questions as of yet, but I'm sure I'll need a few more  
>>> reads before I understand it all.
>>> On Thu, Oct 2, 2008 at 2:34 PM, Jason Dillon  
>>> <> wrote:
>>> On Oct 1, 2008, at 11:20 PM, Jason Warner wrote:
>>> Is the GBuild stuff in svn the same as the anthill-based code or  
>>> is that something different?  GBuild seems to have scripts for  
>>> running tck and that leads me to think they're the same thing, but  
>>> I see no mention of anthill in the code.
>>> The Anthill stuff is completely different than the GBuild stuff.   
>>> I started out trying to get the TCK automated using GBuild, but  
>>> decided that the system lacked too many features to perform as I  
>>> desired, and went ahead with Anthill as it did pretty much
>>> everything, though it had some stability problems.
>>> One of the main reasons why I chose Anthill (AHP, Anthill Pro
>>> that is) was its build agent and code repository systems.  This
>>> allowed me to ensure that each build used exactly the desired
>>> artifacts.  Another was the configurable workflow, which allowed
>>> me to create a custom chain of events to handle running builds on
>>> remote agents and control what data gets sent to them, what it will
>>> collect and what logic to execute once all distributed work has
>>> been completed for a particular build.  And the kicker which helped
>>> bring it all together was its concept of a build life.
>>> At the time I could find *no other* build tool which could meet  
>>> all of these needs, and so I went with AHP instead of spending  
>>> months building/testing features in GBuild.
>>> While AHP supports configuring a lot of stuff via its web- 
>>> interface, I found that it was very cumbersome, so I opted to  
>>> write some glue, which was stored in svn here:
>>> It's been a while, so I have to refresh my memory on how this stuff
>>> actually worked.  First let me explain about the code repository
>>> (what it calls Codestation) and why it was critical to the TCK
>>> testing IMO.  When we use Maven normally, it pulls data from a set
>>> of external repositories, picks up more repositories from the
>>> stuff it downloads and quickly we lose control of where stuff comes
>>> from.  After it pulls down all that stuff, it churns through a
>>> build and spits out the stuff we care about, normally stuffing
>>> them (via mvn install) into the local repository.
>>> AHP supports by default tasks to publish artifacts (really just a  
>>> set of files controlled by an Ant-like include/exclude path) from  
>>> a build agent into Codestation, as well as tasks to resolve  
>>> artifacts (i.e. download them from Codestation to the local working
>>> directory on the build agent's system).  Each top-level build in
>>> AHP gets assigned a new (empty) build life.  Artifacts are always  
>>> published to/resolved from a build life, either that of the  
>>> current build, or of a dependency build.
>>> So what I did was I set up builds for Geronimo Server (the normal
>>> server/trunk stuff), which did the normal mvn install thingy, but
>>> I always gave it a custom -Dmaven.repo.local which resolved
>>> to something inside the working directory for the running build.
>>> The build was still online, so it pulled down a bunch of stuff
>>> into an empty local repository (so it was a clean build wrt the
>>> repository, as well as the source code, which was always fetched
>>> for each new build).  Once the build had finished, I used the
>>> artifact publisher task to push *all* of the stuff in the local
>>> repository into Codestation, labeled as something like "Maven
>>> repository artifacts" for the current build life.
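
For illustration, here is a minimal sketch of the isolated-repository idea. It assumes the standard Maven `-Dmaven.repo.local` property and `--offline` flag; the `.repository` directory name and the helper itself are made up, and the AHP publish step is not modelled.

```python
from pathlib import Path


def maven_install_command(working_dir, repo_dirname=".repository", offline=False):
    """Build a `mvn install` command line whose local repository lives
    inside the build's own working directory (hypothetical layout)."""
    local_repo = Path(working_dir) / repo_dirname
    command = ["mvn", f"-Dmaven.repo.local={local_repo.as_posix()}"]
    if offline:
        # Forbid Maven from reaching out to any remote repository.
        command.append("--offline")
    command.append("install")
    return command


print(" ".join(maven_install_command("/tmp/ahp/work/build-123")))
```

Because the repository path is derived from the working directory, two builds running on the same agent never share (or pollute) a local repository.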
>>> Then I set up another build for Apache Geronimo CTS Server (the
>>> porting/branches/* stuff).  This build was dependent upon the
>>> "Maven repository artifacts" of the Geronimo Server build, and I
>>> configured those artifacts to get installed on the build agent's
>>> system in the same directory that I configured the CTS Server
>>> build to use for its local maven repository.  So again the repo
>>> started out empty, then got populated with all of the outputs from
>>> the normal G build, and then the cts-server build was started.
>>> The build of the components and assemblies is normally fairly
>>> quick and aside from some stuff in the private tck repo won't
>>> download much more stuff, because it already had most of its
>>> dependencies installed via the Codestation dependency
>>> resolution.  Once the build finished, I published the cts-server
>>> assembly artifacts back to Codestation under like "CTS Server
>>> Assemblies" or something.
>>> Up until this point it's normal builds, but now we have built the G
>>> server, then built the CTS server (using the *exact* artifacts  
>>> from the G server build, even though each might have happened on a  
>>> different build agent).  And now we need to go and run a bunch of  
>>> tests, using the *exact* CTS server assemblies, produce some  
>>> output, collect it, and once all of the tests are done render some  
>>> nice reports, etc.
>>> AHP supports setting up builds which contain "parallel" tasks;
>>> each of those tasks is then performed by a build agent.  They have
>>> fancy build agent selection stuff, but for my needs I had
>>> basically 2 groups, one group for running the server builds, and
>>> then another for running the tests.  I only set aside like 2
>>> agents for builds and the rest for tests.  Oh, I forgot to mention
>>> that I had 2 16x 16g AMD beasts all running CentOS 5, each with
>>> about 10-12 Xen virtual machines running internally to run build
>>> agents.  Each system also had a RAID-0 array set up over 4 disks to
>>> help reduce disk io wait, which was, as I found out, the limiting
>>> factor when trying to run a ton of builds that all check out and
>>> download artifacts and such.
>>> I helped the AHP team add a new feature which was a parallel
>>> iterator task, so you define *one* task that internally fires off
>>> n parallel tasks, which would set the iteration number, and leave
>>> it up to the build logic to pick what to do based on that index.
>>> The alternative was an unwieldy set of like 200 tasks in their UI
>>> which simply didn't work at all.  You might have noticed an
>>> "iterations.xml" file in the tck-testsuite directory; this was what
>>> was used to take an iteration number and turn it into what tests
>>> we actually run.  The <iteration> bits are order sensitive in that
>>> file.
>>> Soooo, after we have a CTS Server for a particular G Server build,  
>>> we can now go and do "runtests" for a specific set of tests (defined
>>> by an iteration)... this differed from the other builds above a  
>>> little, but still pulled down artifacts, the CTS Server assemblies  
>>> (only the assemblies and the required bits to run the geronimo- 
>>> maven-plugin, which was used to geronimo:install, as well as used  
>>> by the tck itself to fire up the server and so on).  The key thing  
>>> here, with regards to the maven configuration (besides using that  
>>> custom Codestation populated repository) was that the builds were  
>>> run *offline*.
>>> After runtests completed, the results were then soaked up (the
>>> stuff that javatest pukes out with icky details, as well as the
>>> full log files and other stuff I can't recall) and then pushed back
>>> into Codestation.
>>> Once all of the iterations were finished, another task fires off
>>> which generates a report.  It does this by downloading from
>>> Codestation all of the runtests outputs (each was zipped I think),
>>> unzipping them one by one, and running some custom goo I wrote
>>> (based on some of the concepts from the original GBuild-based TCK
>>> automation), which generates a nice Javadoc-like report that
>>> includes all of the gory details.
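
The "unzip them one by one" step could look roughly like this. It is only a sketch: the archive naming, the directory layout and the function name are assumptions, and the actual report rendering (the custom goo) is not reproduced here.

```python
import zipfile
from pathlib import Path


def unpack_results(archive_dir, output_dir):
    """Unzip every *.zip in archive_dir into its own subdirectory of
    output_dir and return {archive name: sorted member names}."""
    unpacked = {}
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        target = Path(output_dir) / archive.stem
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # creates the target directory if needed
            unpacked[archive.stem] = sorted(zf.namelist())
    return unpacked
```

Keeping one directory per iteration means a report generator can walk the outputs without results from different iterations overwriting each other.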
>>> I can't remember how long I spent working on this... too long (not  
>>> the reports I mean, the whole system).  But in the end I recall  
>>> something like running an entire TCK testsuite for a single server  
>>> configuration (like jetty) in about 4-6 hours... I sent mail to  
>>> the list with the results, so if you are curious what the real  
>>> number is, instead of my guess, you can look for it there.  But  
>>> anyway it was damn quick running on just those 2 machines.  And I  
>>> *knew* exactly that each of the distributed tests was actually  
>>> testing a known build that I could trace back to its artifacts and  
>>> then back to its SVN revision, without worrying about mvn  
>>> downloading something new when midnight rolled over or that a new  
>>> G server or CTS server build that might be in progress hasn't  
>>> compromised the testing by polluting the local repository.
>>>  * * *
>>> So, about the sandbox/build-support stuff...
>>> First there is the 'harness' project, which is rather small, but  
>>> contains the basic stuff, like a version of ant and maven which  
>>> all of these builds would use, some other internal glue, a fix
>>> for an evil Maven problem causing erroneous build failures due to  
>>> some internal thread state corruption or gremlins, not sure  
>>> which.  I kinda used this project to help manage the software  
>>> needed by normal builds, which is why Ant and Maven were in  
>>> there... ie. so I didn't have to go install it on each agent each  
>>> time it changed, just let the AHP system deal with it for me.
>>> This was set up as a normal AHP project, built using its internal
>>> Ant builder (though that builder was still configured to use
>>> the local version it pulled from SVN, to ensure it always works).
>>> Each other build was set up to depend on the output artifacts from
>>> the build harness build, using the latest in a range, like say
>>> using "3.*" for the latest 3.x build (which looks like it was
>>> 3.7).  This let me work on new stuff w/o breaking the current
>>> builds as I hacked things up.
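
To make the "latest in a range" idea concrete, here is a small sketch. It is not AHP's resolver; it assumes purely numeric dotted versions and a trailing-`*` pattern, and the function name is invented.

```python
def latest_matching(versions, pattern):
    """Return the highest version matching a pattern such as "3.*",
    or None if nothing matches. Versions are compared numerically."""
    prefix = pattern.rstrip("*")  # "3.*" -> "3."
    candidates = [v for v in versions if v.startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda v: tuple(int(part) for part in v.split(".")))


print(latest_matching(["2.9", "3.2", "3.7", "4.0"], "3.*"))
```

With a range like this, publishing a 4.0 harness for experiments leaves every build pinned to "3.*" untouched.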
>>> So, in addition to all of the stuff I mentioned above wrt the G  
>>> and CTS builds, each also had this step which resolved the build  
>>> harness artifacts to that working directory, and the Maven builds  
>>> were always run via the version of Maven included from the  
>>> harness.  But, AHP didn't actually run that version of Maven  
>>> directly, it used its internal Ant task to execute the version of  
>>> Ant from the harness *and* use the harness.xml buildfile.
>>> The harness.xml stuff is some more goo which I wrote to help manage
>>> AHP configurations.  With AHP (at that time, not sure if it has
>>> changed) you had to do most everything via the web UI, which
>>> sucked, and it was hard to refactor sets of projects and so on.
>>> So I came up with a standard set of tasks to execute for a
>>> project, then put all of the custom muck I needed into what I
>>> called a _library_ and then had AHP, via harness.xml, invoke it
>>> with some configuration about what project it was and other build
>>> details.
>>> The actual harness.xml is not very big; it simply makes sure that
>>> */bin/* is executable (Codestation couldn't preserve execute
>>> bits), then uses the Codestation command-line client (invoking the
>>> Java class directly though) to ask the repository to resolve
>>> artifacts from the "Build Library" to the local repository.  I had
>>> this artifact resolution separate from the normal dependency (or  
>>> harness) artifact resolution so that it was easier for me to fix  
>>> problems with the library while a huge set of TCK iterations were  
>>> still queued up to run.  Basically, if I noticed a problem due to  
>>> a code or configuration issue in an early build, I could fix it,  
>>> and use the existing builds to verify the fix, instead of wasting  
>>> an hour (sometimes more depending on networking problems accessing  
>>> remote repos while building the servers) to rebuild and start over.
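
The execute-bit fix-up mentioned above is simple enough to sketch. This is an illustration, not the harness.xml logic itself: it takes the `*/bin/*` pattern literally (one directory level), assumes a POSIX filesystem, and the function name is hypothetical.

```python
import stat
from pathlib import Path


def make_bin_executable(root):
    """Add execute permission to every regular file matching */bin/*
    under root, and return the paths that were touched."""
    changed = []
    for path in sorted(Path(root).glob("*/bin/*")):
        if path.is_file():
            mode = path.stat().st_mode
            path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            changed.append(path)
    return changed
```

Running this right after artifacts are resolved restores what the artifact store dropped, so scripts like the bundled ant and mvn launchers can be invoked directly.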
>>> This brings us to the 'libraries' project.  In general the idea of
>>> a _library_ was just a named/versioned collection of files, which
>>> could be used by a project.  The main (er only) library
>>> defined in this SVN is system/.  This is the groovy glue which
>>> made everything work.  This is where the entry-point class is
>>> located, the guy who gets invoked via harness.xml:
>>>    <target name="harness" depends="init">
>>>        <groovy>
>>>            <classpath>
>>>                <pathelement location="${library.basedir}/groovy"/>
>>>            </classpath>
>>>            gbuild.system.BuildHarness.bootstrap(this)
>>>        </groovy>
>>>    </target>
>>> I won't go into too much detail on this stuff now, take a look at  
>>> it and ask questions.  But, basically there is stuff in  
>>> gbuild.system.* which is harness support muck, and stuff in  
>>> gbuild.config.* which contains configuration.  I was kinda mid- 
>>> refactoring of some things, starting to add new features, not sure  
>>> where I left off actually. But the key bits are in
>>> gbuild.config.project.*  This contains a package for each project,
>>> with the package name being the same as the AHP project (with " " ->
>>> "_"). And then in each of those packages is at least a
>>> Controller.groovy class (or other classes if special muck was
>>> needed, like for the report generation in Geronimo_CTS, etc).
>>> The controller defines a set of actions, implemented as Groovy  
>>> closures bound to properties of the Controller class.  One of the  
>>> properties passed in from the AHP configuration (configured via  
>>> the Web UI, passed to the harness.xml build, and then on to the  
>>> Groovy harness) was the name of the _action_ to execute.  Most of  
>>> that stuff should be fairly straightforward.
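
A rough analogue of the controller/action idea, for readers who have not looked at the Groovy yet. This is not the gbuild code: the class, the action names and the config keys are all hypothetical, and Python callables stand in for Groovy closures bound to properties.

```python
class Controller:
    """Stand-in for a per-project controller (hypothetical names)."""

    def __init__(self):
        # Actions are plain callables stored as instance attributes,
        # mirroring closures bound to properties.
        self.build = lambda config: f"building {config['project']}"
        self.runtests = lambda config: f"running iteration {config['iteration']}"


def execute_action(controller, action_name, config):
    """Look up an action by name on the controller and run it."""
    if action_name.startswith("_"):
        raise ValueError(f"unknown action: {action_name!r}")
    action = getattr(controller, action_name, None)
    if not callable(action):
        raise ValueError(f"unknown action: {action_name!r}")
    return action(config)


print(execute_action(Controller(), "build", {"project": "Geronimo_CTS"}))
```

Since the action is chosen by a string passed in from the build configuration, adding a new action only means adding a new property to the controller; nothing in the dispatch code changes.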
>>> So after a build is started (maybe from a Web UI click, or SVN  
>>> change detection, or a TCK runtests iteration) the following  
>>> happens (in simplified terms):
>>>  * Agent starts build
>>>  * Agent cleans its working directory
>>>  * Agent downloads the build harness
>>>  * Agent downloads any dependencies
>>>  * Agent invokes Ant on harness.xml passing in some details
>>>  * Harness.xml downloads the system/1 library
>>>  * Harness.xml runs gbuild.system.BuildHarness
>>>  * BuildHarness tries to construct a Controller instance for the  
>>> project
>>>  * BuildHarness tries to find Controller action to execute
>>>  * BuildHarness executes the Controller action
>>>  * Agent publishes output artifacts
>>>  * Agent completes build
>>> A few extra notes on libraries: the JavaEE TCK requires a bunch of
>>> stuff we get from Sun to execute.  This stuff isn't small, but is
>>> for the most part read-only.  So I set up a location on each build
>>> agent where these files were installed to.  I created AHP projects
>>> to manage them and treated them like a special "library", one which
>>> tried really hard not to go fetch its content unless the local
>>> content was out of date.  This helped speed up the entire build
>>> process... cause that delete/download of all that muck really
>>> slows down 20 agents running in parallel on 2 big machines with
>>> striped arrays.  For legal reasons this stuff was not kept in
>>>'s main repository, and for logistical reasons
>>> wasn't kept in the private tck repo on either.
>>> Because there were so many files, and because the httpd
>>> configuration on kicks out requests that it thinks
>>> are *bunk* to help save the resources for the community, I had
>>> set up a private SSL-secured svn repository on the old
>>> machines to put in the full muck required, then set up
>>> some goo in the harness to resolve them.  This goo is all in
>>> gbuild.system.library.*  See the
>>> gbuild.config.projects.Geronimo_CTS.Controller for more of how it
>>> was actually used.
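
The "don't fetch unless the local content is out of date" behaviour can be sketched with a version marker file. How the real gbuild.system.library code decided staleness is not described in this mail, so the marker file name and both helpers below are assumptions.

```python
from pathlib import Path

MARKER = ".library-version"  # hypothetical marker file name


def needs_fetch(local_dir, wanted_version):
    """True if the library is missing locally or has a different version."""
    marker = Path(local_dir) / MARKER
    if not marker.is_file():
        return True
    return marker.read_text().strip() != wanted_version


def record_fetch(local_dir, version):
    """Remember which version is now installed in local_dir."""
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    (Path(local_dir) / MARKER).write_text(version + "\n")
```

An agent would call `needs_fetch` before each build and only do the expensive delete/download when it returns True, then call `record_fetch` once the content is in place.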
>>>  * * *
>>> Okay, that is about all the brain-dump for TCK muck I have in me  
>>> for tonight.  Reply with questions if you have any.
>>> Cheers,
>>> --jason
>>> -- 
>>> ~Jason Warner
>> -- 
>> ~Jason Warner
> -- 
> ~Jason Warner