geronimo-dev mailing list archives

From "Jason Warner" <>
Subject Re: Continuous TCK Testing
Date Thu, 09 Oct 2008 11:43:33 GMT
My apologies.  I didn't phrase my question properly.  Most of the software
necessary was pulled down via svn, but I saw no such behaviour for AHP.
After looking at it some more, I imagine the software was just manually
installed on the machine.  It was kind of a silly question to begin with, I

On Thu, Oct 9, 2008 at 4:16 AM, Jason Dillon <> wrote:

> On Oct 8, 2008, at 11:05 PM, Jason Warner wrote:
> Here's a quick question.  Where does AHP come from?
> (ever heard of google :-P)
> --jason
> On Mon, Oct 6, 2008 at 1:18 PM, Jason Dillon <> wrote:
>> Sure np, took me a while to get around to writing it too ;-)
>> --jason
>> On Oct 6, 2008, at 10:24 PM, Jason Warner wrote:
>> Just got around to reading this.  Thanks for the brain dump, Jason.  No
>> questions as of yet, but I'm sure I'll need a few more reads before I
>> understand it all.
>> On Thu, Oct 2, 2008 at 2:34 PM, Jason Dillon <> wrote:
>>> On Oct 1, 2008, at 11:20 PM, Jason Warner wrote:
>>>> Is the GBuild stuff in svn the same as the anthill-based code or is that
>>>> something different?  GBuild seems to have scripts for running tck and that
>>>> leads me to think they're the same thing, but I see no mention of anthill
>>>> in the code.
>>> The Anthill stuff is completely different than the GBuild stuff.  I
>>> started out trying to get the TCK automated using GBuild, but decided that
>>> the system lacked too many features to perform as I desired, and went ahead
>>> with Anthill as it did pretty much everything, though had some stability
>>> problems.
>>> One of the main reasons why I chose Anthill (AHP, Anthill Pro that is)
>>> was its build agent and code repository systems.  This allowed me to ensure
>>> that each build used exactly the desired artifacts.  Another was the
>>> configurable workflow, which allowed me to create a custom chain of events
>>> to handle running builds on remote agents and control what data gets sent to
>>> them, what it will collect and what logic to execute once all distributed
>>> work has been completed for a particular build.  And the kicker which helped
>>> facilitate bringing it all together was its concept of a build life.
>>> At the time I could find *no other* build tool which could meet all of
>>> these needs, and so I went with AHP instead of spending months
>>> building/testing features in GBuild.
>>> While AHP supports configuring a lot of stuff via its web-interface, I
>>> found that it was very cumbersome, so I opted to write some glue, which was
>>> stored in svn here:
>>> It's been a while, so I have to refresh my memory on how this stuff
>>> actually worked.  First let me explain about the code repository (what it
>>> calls codestation) and why it was critical to the TCK testing IMO.  When we
>>> use Maven normally, it pulls data from a set of external repositories, picks
>>> up more repositories from the stuff it downloads and quickly we lose
>>> control of where stuff comes from.  After it pulls down all that stuff, it
>>> churns through a build and spits out the stuff we care about, normally
>>> stuffing them (via mvn install) into the local repository.
>>> AHP supports by default tasks to publish artifacts (really just a set of
>>> files controlled by an Ant-like include/exclude path) from a build agent
>>> into Codestation, as well as tasks to resolve artifacts (ie. download them
>>> from Codestation to the local working directory on the build agent's system).
>>>  Each top-level build in AHP gets assigned a new (empty) build life.
>>>  Artifacts are always published to/resolved from a build life, either that
>>> of the current build, or of a dependency build.
>>> So what I did was I set up builds for Geronimo Server (the normal
>>> server/trunk stuff), which did the normal mvn install thingy, but I always
>>> gave it a custom -Dmaven.repo.local which resolved to something inside
>>> the working directory for the running build.  The build was still online, so
>>> it pulled down a bunch of stuff into an empty local repository (so it was a
>>> clean build wrt the repository, as well as the source code, which was always
>>> fetched for each new build).  Once the build had finished, I used the
>>> artifact publisher task to push *all* of the stuff in the local repository
>>> into Codestation, labeled as something like "Maven repository artifacts" for
>>> the current build life.
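
A minimal sketch of the isolated-repository idea above, assuming Maven's `-Dmaven.repo.local` property is what points the build at a per-build directory. The function name and the `repository` subdirectory are hypothetical; the thread does not say what the real directory was called.

```python
from pathlib import PurePosixPath


def maven_install_command(workdir):
    """Build an `mvn install` command line whose local repository lives
    inside the build's own working directory, so it starts out empty.

    `workdir` is the agent's per-build working directory (assumed layout).
    """
    local_repo = PurePosixPath(workdir) / "repository"  # hypothetical name
    return ["mvn", "install", f"-Dmaven.repo.local={local_repo}"]
```

Because the repository path is derived from the working directory, wiping the working directory before each build also wipes the repository, which is what makes every build clean with respect to its dependencies.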
>>> Then I set up another build for Apache Geronimo CTS Server (the
>>> porting/branches/* stuff).  This build was dependent upon the "Maven
>>> repository artifacts" of the Geronimo Server build, and I configured those
>>> artifacts to get installed on the build agent's system in the same directory
>>> that I configured the CTS Server build to use for its local maven
>>> repository.  So again the repo started out empty, then got populated with
>>> all of the outputs from the normal G build, and then the cts-server build
>>> was started.  The build of the components and assemblies is normally fairly
>>> quick and aside from some stuff in the private tck repo won't download much
>>> more stuff, because it already had most of its dependencies installed via
>>> the Codestation dependency resolution.   Once the build finished, I
>>> published the cts-server assembly artifacts back to Codestation under like
>>> "CTS Server Assemblies" or something.
>>> Up until this point it's all normal builds, but now we have built the G
>>> server, then built the CTS server (using the *exact* artifacts from the G
>>> server build, even though each might have happened on a different build
>>> agent).  And now we need to go and run a bunch of tests, using the *exact*
>>> CTS server assemblies, produce some output, collect it, and once all of the
>>> tests are done render some nice reports, etc.
>>> AHP supports setting up builds which contain "parallel" tasks; each of
>>> those tasks is then performed by a build agent.  They have fancy build agent
>>> selection stuff, but for my needs I had basically 2 groups, one group for
>>> running the server builds, and then another for running the tests.  I only
>>> set aside like 2 agents for builds and the rest for tests.  Oh, I forgot to
>>> mention that I had 2 16x 16g AMD beasts all running CentOS 5, each with
>>> about 10-12 Xen virtual machines running internally to run build agents.
>>>  Each system also had a RAID-0 array set up over 4 disks to help reduce disk
>>> io wait, which was as I found out the limiting factor when trying to run a
>>> ton of builds that all check out and download artifacts and such.
>>> I helped the AHP team add a new feature which was a parallel iterator
>>> task, so you define *one* task that internally fires off n parallel tasks,
>>> which would set the iteration number, and leave it up to the build logic to
>>> pick what to do based on that index.  The alternative was an unwieldy set of
>>> like 200 tasks in their UI which simply didn't work at all.  You might have
>>> noticed an "iterations.xml" file in the tck-testsuite directory; this was what
>>> was used to take an iteration number and turn it into what tests we actually
>>> run.  The <iteration> bits are order sensitive in that file.
>>> Soooo, after we have a CTS Server for a particular G Server build, we can
>>> now go and do "runtests" for a specific set of tests (defined by an
>>> iteration)... this differed from the other builds above a little, but still
>>> pulled down artifacts, the CTS Server assemblies (only the assemblies and
>>> the required bits to run the geronimo-maven-plugin, which was used to
>>> geronimo:install, as well as used by the tck itself to fire up the server
>>> and so on).  The key thing here, with regards to the maven configuration
>>> (besides using that custom Codestation populated repository) was that the
>>> builds were run *offline*.
>>> After runtests completed, the results are then soaked up (the stuff that
>>> javatest pukes out with icky details, as well as the full log files and
>>> other stuff I can't recall) and then pushed back into Codestation.
>>> Once all of the iterations were finished, another task fires off which
>>> generates a report.  It does this by downloading from Codestation all of the
>>> runtests outputs (each was zipped I think), unzips them one by one, runs some
>>> custom goo I wrote (based on some of the concepts from original stuff from the
>>> GBuild-based TCK automation), and generates a nice Javadoc-like report that
>>> includes all of the gory details.
>>> I can't remember how long I spent working on this... too long (not the
>>> reports I mean, the whole system).  But in the end I recall something like
>>> running an entire TCK testsuite for a single server configuration (like
>>> jetty) in about 4-6 hours... I sent mail to the list with the results, so if
>>> you are curious what the real number is, instead of my guess, you can look
>>> for it there.  But anyway it was damn quick running on just those 2
>>> machines.  And I *knew* exactly that each of the distributed tests was
>>> actually testing a known build that I could trace back to its artifacts and
>>> then back to its SVN revision, without worrying about mvn downloading
>>> something new when midnight rolled over or that a new G server or CTS server
>>> build that might be in progress hasn't compromised the testing by polluting
>>> the local repository.
>>>  * * *
>>> So, about the sandbox/build-support stuff...
>>> First there is the 'harness' project, which is rather small, but contains
>>> the basic stuff, like a version of ant and maven which all of these builds
>>> would use, some other internal glue, and a fix for an evil Maven problem
>>> causing erroneous build failures due to some internal thread state
>>> corruption or gremlins, not sure which.  I kinda used this project to help
>>> manage the software needed by normal builds, which is why Ant and Maven were
>>> in there... ie. so I didn't have to go install it on each agent each time it
>>> changed, just let the AHP system deal with it for me.
>>> This was set up as a normal AHP project, built using its internal Ant
>>> builder (though having that builder configured still to use the local
>>> version it pulled from SVN to ensure it always works).
>>> Each other build was setup to depend on the output artifacts from the
>>> build harness build, using the latest in a range, like say using "3.*" for
>>> the latest 3.x build (which looks like that was 3.7).  This let me work on
>>> new stuff w/o breaking the current builds as I hacked things up.
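
A small sketch of "latest in a range" resolution, assuming versions are plain dotted numbers and the range is a glob like "3.*". How AHP actually matched and ordered build versions is not described here, so treat both choices as assumptions.

```python
from fnmatch import fnmatchcase


def latest_matching(versions, pattern):
    """Pick the newest version matching a glob-style range such as "3.*".

    Versions are compared numerically part by part, so "3.10" ranks
    above "3.7" (assumes every part is an integer).
    """
    candidates = [v for v in versions if fnmatchcase(v, pattern)]
    if not candidates:
        raise LookupError(f"no version matches {pattern!r}")
    return max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))
```

For example, `latest_matching(["2.9", "3.2", "3.7", "4.0"], "3.*")` picks "3.7" and ignores the 4.0 build, which is what lets work on a new major version proceed without disturbing builds pinned to 3.x.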
>>> So, in addition to all of the stuff I mentioned above wrt the G and CTS
>>> builds, each also had this step which resolved the build harness artifacts
>>> to that working directory, and the Maven builds were always run via the
>>> version of Maven included from the harness.  But, AHP didn't actually run
>>> that version of Maven directly, it used its internal Ant task to execute the
>>> version of Ant from the harness *and* use the harness.xml buildfile.
>>> The harness.xml stuff is some more goo which I wrote to help manage AHP
>>> configurations.  With AHP (at that time, not sure if it has changed) you had
>>> to do most everything via the web UI, which sucked, and it was hard to
>>> refactor sets of projects and so on.  So I came up with a standard set of
>>> tasks to execute for a project, then put all of the custom muck I needed
>>> into what I called a _library_ and then had AHP, via harness.xml, invoke
>>> it with some configuration about what project it was and other build
>>> details.
>>> The actual harness.xml is not very big, it simply makes sure that */bin/*
>>> is executable (Codestation couldn't preserve execute bits), and uses the
>>> Codestation command-line client (invoking the Java class directly though) to
>>> ask the repository to resolve artifacts from the "Build Library" to the
>>> local repository.  I had this artifact resolution separate from the normal
>>> dependency (or harness) artifact resolution so that it was easier for me to
>>> fix problems with the library while a huge set of TCK iterations were still
>>> queued up to run.  Basically, if I noticed a problem due to a code or
>>> configuration issue in an early build, I could fix it, and use the existing
>>> builds to verify the fix, instead of wasting an hour (sometimes more
>>> depending on networking problems accessing remote repos while building the
>>> servers) to rebuild and start over.
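
A minimal sketch of the "make */bin/* executable" step, written in Python rather than Ant. It assumes the goal is simply to add execute bits to every regular file sitting directly inside any directory named `bin`; the function name is hypothetical.

```python
import stat
from pathlib import Path


def make_bin_executable(root):
    """Add execute bits to every regular file directly inside any
    directory named `bin` under `root`; return the files touched."""
    changed = []
    for bin_dir in Path(root).rglob("bin"):
        if not bin_dir.is_dir():
            continue
        for entry in sorted(bin_dir.iterdir()):
            if entry.is_file():
                mode = entry.stat().st_mode
                entry.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
                changed.append(entry)
    return changed
```

Running something like this right after artifacts are resolved restores what the artifact store dropped, before anything tries to launch the bundled Ant or Maven scripts.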
>>> This brings us to the 'libraries' project.  In general the idea of a
>>> _library_ was just a named/versioned collection of files, which could be
>>> used by a project.  The main (er only) library defined in this SVN is
>>> system/.  This is the groovy glue which made everything work.  This is where
>>> the entry-point class is located (the guy who gets invoked by harness.xml)
>>> via:
>>>    <target name="harness" depends="init">
>>>        <groovy>
>>>            <classpath>
>>>                <pathelement location="${library.basedir}/groovy"/>
>>>            </classpath>
>>>            gbuild.system.BuildHarness.bootstrap(this)
>>>        </groovy>
>>>    </target>
>>> I won't go into too much detail on this stuff now, take a look at it and
>>> ask questions.  But, basically there is stuff in gbuild.system.* which is
>>> harness support muck, and stuff in gbuild.config.* which contains
>>> configuration.  I was kinda mid-refactoring of some things, starting to add
>>> new features, not sure where I left off actually. But the key bits are in
>>> gbuild.config.project.*  This contains a package for each project, with the
>>> package name being the same as the AHP project (with " " -> "_"). And then
>>> in each of those packages is at least a Controller.groovy class (or other
>>> classes if special muck was needed, like for the report generation in
>>> Geronimo_CTS, etc).
>>> The controller defines a set of actions, implemented as Groovy closures
>>> bound to properties of the Controller class.  One of the properties passed
>>> in from the AHP configuration (configured via the Web UI, passed to the
>>> harness.xml build, and then on to the Groovy harness) was the name of the
>>> _action_ to execute.  Most of that stuff should be fairly straightforward.
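
A rough Python analogue of the controller idea, assuming actions are callables stored as properties and looked up by name. The real controllers are Groovy classes; the action names and the `ctx` dictionary here are hypothetical stand-ins.

```python
class Controller:
    """Stand-in for a per-project controller (hypothetical actions)."""

    def __init__(self):
        # Actions are plain callables held as instance properties,
        # mirroring closures bound to properties of the controller.
        self.build = lambda ctx: f"building {ctx['project']}"
        self.runtests = lambda ctx: f"running iteration {ctx['iteration']}"


def package_for(project_name):
    """Map an AHP project name to its package name (" " -> "_")."""
    return project_name.replace(" ", "_")


def execute_action(controller, action_name, ctx):
    """Look up the named action on the controller and run it."""
    if action_name.startswith("_"):
        raise LookupError(f"not a public action: {action_name!r}")
    action = getattr(controller, action_name, None)
    if not callable(action):
        raise LookupError(f"no such action: {action_name!r}")
    return action(ctx)
```

The harness only needs two strings from the build configuration, the project name and the action name, to find the right code to run.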
>>> So after a build is started (maybe from a Web UI click, or SVN change
>>> detection, or a TCK runtests iteration) the following happens (in simplified
>>> terms):
>>>  * Agent starts build
>>>  * Agent cleans its working directory
>>>  * Agent downloads the build harness
>>>  * Agent downloads any dependencies
>>>  * Agent invokes Ant on harness.xml passing in some details
>>>  * Harness.xml downloads the system/1 library
>>>  * Harness.xml runs gbuild.system.BuildHarness
>>>  * BuildHarness tries to construct a Controller instance for the project
>>>  * BuildHarness tries to find Controller action to execute
>>>  * BuildHarness executes the Controller action
>>>  * Agent publishes output artifacts
>>>  * Agent completes build
>>> A few extra notes on libraries: the JavaEE TCK requires a bunch of stuff
>>> we get from Sun to execute.  This stuff isn't small, but is for the most
>>> part read-only.  So I set up a location on each build agent where these files
>>> were installed to.  I created AHP projects to manage them and treated them
>>> like a special "library" one which tried really hard not to go fetch its
>>> content unless the local content was out of date.  This helped speed up the
>>> entire build process... cause that delete/download of all that muck really
>>> slows down 20 agents running in parallel on 2 big machines with striped
>>> arrays.  For legal reasons this stuff was not kept in's
>>> main repository, and for logistical reasons wasn't kept in the private tck
>>> repo on either.  Because there were so many files, and
>>> because the httpd configuration on kicks out requests that
>>> it thinks are *bunk* to help save the resources for the community, I had
>>> set up a private, SSL-secured svn repository on the old gbuild.org machines
>>> to put in the full muck required, then set up some goo in the
>>> harness to resolve them.  This goo is all in gbuild.system.library.*  See
>>> the gbuild.config.projects.Geronimo_CTS.Controller for more of how it was
>>> actually used.
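
A minimal sketch of a "don't fetch unless out of date" library, assuming staleness is detected with a version marker file next to the installed content. The marker file name and the `fetch` callback are hypothetical; the thread does not say how the real gbuild.system.library.* code decided the local copy was stale.

```python
from pathlib import Path


def ensure_library(install_dir, wanted_version, fetch):
    """Fetch the library only when the installed copy is missing or stale.

    `fetch` is a caller-supplied function that populates `install_dir`.
    Returns True if a fetch happened, False if the local copy was reused.
    """
    install_dir = Path(install_dir)
    marker = install_dir / ".version"  # hypothetical marker file
    if marker.is_file() and marker.read_text().strip() == wanted_version:
        return False
    install_dir.mkdir(parents=True, exist_ok=True)
    fetch(install_dir)
    marker.write_text(wanted_version)
    return True
```

Since the content is read-only for the most part, the common case is a single small file read per build instead of a delete and re-download on every agent.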
>>>  * * *
>>> Okay, that is about all the brain-dump for TCK muck I have in me for
>>> tonight.  Reply with questions if you have any.
>>> Cheers,
>>> --jason
>> --
>> ~Jason Warner
> --
> ~Jason Warner

~Jason Warner
