On Oct 8, 2008, at 11:05 PM, Jason Warner wrote:
> Here's a quick question. Where does AHP come from?
http://www.anthillpro.com
(ever heard of google :-P)
--jason
>
> On Mon, Oct 6, 2008 at 1:18 PM, Jason Dillon
> <jason.dillon@gmail.com> wrote:
> Sure np, took me a while to get around to writing it too ;-)
>
> --jason
>
>
> On Oct 6, 2008, at 10:24 PM, Jason Warner wrote:
>
>> Just got around to reading this. Thanks for the brain dump,
>> Jason. No questions as of yet, but I'm sure I'll need a few more
>> reads before I understand it all.
>>
>> On Thu, Oct 2, 2008 at 2:34 PM, Jason Dillon
>> <jason.dillon@gmail.com> wrote:
>> On Oct 1, 2008, at 11:20 PM, Jason Warner wrote:
>>
>> Is the GBuild stuff in svn the same as the anthill-based code or is
>> that something different? GBuild seems to have scripts for running
>> tck and that leads me to think they're the same thing, but I see no
>> mention of anthill in the code.
>>
>> The Anthill stuff is completely different than the GBuild stuff. I
>> started out trying to get the TCK automated using GBuild, but
>> decided that the system lacked too many features to perform as I
>> desired, and went ahead with Anthill as it did pretty much
>> everything, though had some stability problems.
>>
>> One of the main reasons why I chose Anthill (AHP, Anthill Pro that
>> is) was its build agent and code repository systems. This allowed
>> me to ensure that each build used exactly the desired artifacts.
>> Another was the configurable workflow, which allowed me to create a
>> custom chain of events to handle running builds on remote agents
>> and control what data gets sent to them, what it will collect and
>> what logic to execute once all distributed work has been completed
>> for a particular build. And the kicker which helped facilitate
>> bringing it all together was its concept of a build life.
>>
>> At the time I could find *no other* build tool which could meet all
>> of these needs, and so I went with AHP instead of spending months
>> building/testing features in GBuild.
>>
>> While AHP supports configuring a lot of stuff via its web-
>> interface, I found that it was very cumbersome, so I opted to write
>> some glue, which was stored in svn here:
>>
>> https://svn.apache.org/viewvc/geronimo/sandbox/build-support/?pathrev=632245
>>
>> It's been a while, so I have to refresh my memory on how this stuff
>> actually worked. First let me explain about the code repository
>> (what it calls codestation) and why it was critical to the TCK
>> testing IMO. When we use Maven normally, it pulls data from a set
>> of external repositories, picks up more repositories from the stuff
>> it downloads and quickly we lose control of where stuff comes from.
>> After it pulls down all that stuff, it churns through a build and
>> spits out the stuff we care about, normally stuffing them (via mvn
>> install) into the local repository.
>>
>> AHP supports by default tasks to publish artifacts (really just a
>> set of files controlled by an Ant-like include/exclude path) from a
>> build agent into Codestation, as well as tasks to resolve artifacts
>> (ie. download them from Codestation to the local working directory
>> on the build agent's system). Each top-level build in AHP gets
>> assigned a new (empty) build life. Artifacts are always published
>> to/resolved from a build life, either that of the current build, or
>> of a dependency build.
>>
>> So what I did was I set up builds for Geronimo Server (the normal
>> server/trunk stuff), which did the normal mvn install thingy, but I
>> always gave it a custom -Dmaven.repo.local which resolved to
>> something inside the working directory for the running build. The
>> build was still online, so it pulled down a bunch of stuff into an
>> empty local repository (so it was a clean build wrt the repository,
>> as well as the source code, which was always fetched for each new
>> build). Once the build had finished, I used the artifact publisher
>> task to push *all* of the stuff in the local repository into
>> Codestation, labeled as something like "Maven repository artifacts"
>> for the current build life.
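
A minimal sketch of the per-build repository idea described above, assuming Maven's standard `maven.repo.local` property and `--offline` flag. The `repository` directory name and the `maven_command` helper are hypothetical, chosen only for illustration; this is not the original glue code.

```python
from pathlib import Path


def maven_command(workdir, goals, offline=False):
    """Build a Maven command line whose local repository lives inside workdir."""
    # "repository" is a hypothetical directory name, not taken from the original setup.
    local_repo = Path(workdir) / "repository"
    cmd = ["mvn", f"-Dmaven.repo.local={local_repo}"]
    if offline:
        # Offline mode stops Maven from contacting remote repositories.
        cmd.append("--offline")
    cmd.extend(goals)
    return cmd


if __name__ == "__main__":
    # Online server build into an empty, build-private repository.
    print(maven_command("/tmp/build-42", ["install"]))
    # Later test runs reuse a pre-populated repository and stay offline.
    print(maven_command("/tmp/build-42", ["install"], offline=True))
```

Because the repository sits inside the build's working directory, cleaning that directory also resets the repository, so nothing leaks from one build into the next.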
>>
>> Then I set up another build for Apache Geronimo CTS Server (the
>> porting/branches/* stuff). This build was dependent upon the
>> "Maven repository artifacts" of the Geronimo Server build, and I
>> configured those artifacts to get installed on the build agent's
>> system in the same directory that I configured the CTS Server build
>> to use for its local maven repository. So again the repo started
>> out empty, then got populated with all of the outputs from the
>> normal G build, and then the cts-server build was started. The
>> build of the components and assemblies is normally fairly quick and
>> aside from some stuff in the private tck repo won't download much
>> more stuff, because it already had most of its dependencies
>> installed via the Codestation dependency resolution. Once the
>> build finished, I published the cts-server assembly artifacts back
>> to Codestation under like "CTS Server Assemblies" or something.
>>
>> Up until this point it's normal builds, but now we have built the G
>> server, then built the CTS server (using the *exact* artifacts from
>> the G server build, even though each might have happened on a
>> different build agent). And now we need to go and run a bunch of
>> tests, using the *exact* CTS server assemblies, produce some
>> output, collect it, and once all of the tests are done render some
>> nice reports, etc.
>>
>> AHP supports setting up builds which contain "parallel" tasks, each
>> of those tasks is then performed by a build agent, they have fancy
>> build agent selection stuff, but for my needs I had basically 2
>> groups, one group for running the server builds, and then another
>> for running the tests. I only set aside like 2 agents for builds
>> and the rest for tests. Oh, I forgot to mention that I had 2 16x
>> 16g AMD beasts all running CentOS 5, each with about 10-12 Xen
>> virtual machines running internally to run build agents. Each
>> system also had a RAID-0 array setup over 4 disks to help reduce
>> disk io wait, which was as I found out the limiting factor when
>> trying to run a ton of builds that all checkout and download
>> artifacts and such.
>>
>> I helped the AHP team add a new feature which was a parallel
>> iterator task, so you define *one* task that internally fires off n
>> parallel tasks, which would set the iteration number, and leave it
>> up to the build logic to pick what to do based on that index. The
>> alternative was an unwieldy set of like 200 tasks in their UI which
>> simply didn't work at all. You might have noticed an
>> "iterations.xml" file in the tck-testsuite directory, this was what
>> was used to take an iteration number and turn it into what tests we
>> actually run. The <iteration> bits are order sensitive in that file.
>>
>> Soooo, after we have a CTS Server for a particular G Server build,
>> we can now go and do "runtests" for a specific set of tests (defined
>> by an iteration)... this differed from the other builds above a
>> little, but still pulled down artifacts, the CTS Server assemblies
>> (only the assemblies and the required bits to run the geronimo-
>> maven-plugin, which was used to geronimo:install, as well as used
>> by the tck itself to fire up the server and so on). The key thing
>> here, with regards to the maven configuration (besides using that
>> custom Codestation populated repository) was that the builds were
>> run *offline*.
>>
>> After runtests completed, the results are then soaked up (the stuff
>> that javatest pukes out with icky details, as well as the full log
>> files and other stuff I can't recall) and then pushed back into
>> Codestation.
>>
>> Once all of the iterations were finished, another task fires off
>> which generates a report. It does this by downloading from
>> Codestation all of the runtests outputs (each was zipped I think),
>> unzips them one by one, runs some custom goo I wrote (based on some of
>> the concepts from original stuff from the GBuild-based TCK
>> automation), and generates a nice Javadoc-like report that includes
>> all of the gory details.
>>
>> I can't remember how long I spent working on this... too long (not
>> the reports I mean, the whole system). But in the end I recall
>> something like running an entire TCK testsuite for a single server
>> configuration (like jetty) in about 4-6 hours... I sent mail to the
>> list with the results, so if you are curious what the real number
>> is, instead of my guess, you can look for it there. But anyway it
>> was damn quick running on just those 2 machines. And I *knew*
>> exactly that each of the distributed tests was actually testing a
>> known build that I could trace back to its artifacts and then back
>> to its SVN revision, without worrying about mvn downloading
>> something new when midnight rolled over or that a new G server or
>> CTS server build that might be in progress hasn't compromised the
>> testing by polluting the local repository.
>>
>> * * *
>>
>> So, about the sandbox/build-support stuff...
>>
>> First there is the 'harness' project, which is rather small, but
>> contains the basic stuff, like a version of ant and maven which all
>> of these builds would use, some other internal glue, a fix for an
>> evil Maven problem causing erroneous build failures due to some
>> internal thread state corruption or gremlins, not sure which. I
>> kinda used this project to help manage the software needed by
>> normal builds, which is why Ant and Maven were in there... ie. so I
>> didn't have to go install it on each agent each time it changed,
>> just let the AHP system deal with it for me.
>>
>> This was set up as a normal AHP project, built using its internal
>> Ant builder (though having that builder configured still to use the
>> local version it pulled from SVN to ensure it always works).
>>
>> Each other build was setup to depend on the output artifacts from
>> the build harness build, using the latest in a range, like say
>> using "3.*" for the latest 3.x build (which looks like that was
>> 3.7). This let me work on new stuff w/o breaking the current
>> builds as I hacked things up.
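
The "latest in a range" selection can be sketched roughly as below. How AHP actually matches and orders versions is not described in this thread, so the glob matching, the purely numeric dotted versions, and the `latest_matching` helper are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def latest_matching(versions, pattern):
    """Pick the highest version matching a glob-style pattern such as "3.*"."""
    candidates = [v for v in versions if fnmatchcase(v, pattern)]
    if not candidates:
        return None
    # Compare numerically so that "3.10" ranks above "3.7".
    return max(candidates, key=lambda v: tuple(int(part) for part in v.split(".")))


if __name__ == "__main__":
    published = ["2.9", "3.1", "3.7", "4.0"]
    print(latest_matching(published, "3.*"))
```

Pinning dependents to a range like this lets a new major line of the harness be published without the existing builds picking it up.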
>>
>> So, in addition to all of the stuff I mentioned above wrt the G and
>> CTS builds, each also had this step which resolved the build
>> harness artifacts to that working directory, and the Maven builds
>> were always run via the version of Maven included from the
>> harness. But, AHP didn't actually run that version of Maven
>> directly, it used its internal Ant task to execute the version of
>> Ant from the harness *and* use the harness.xml buildfile.
>>
>> The harness.xml stuff is some more goo which I wrote to help manage
>> AHP configurations. With AHP (at that time, not sure if it has
>> changed) you had to do most everything via the web UI, which
>> sucked, and it was hard to refactor sets of projects and so on. So
>> I came up with a standard set of tasks to execute for a project,
>> then put all of the custom muck I needed into what I called a
>> _library_ and then had the AHP via harness.xml invoke it with some
>> configuration about what project it was and other build details.
>>
>> The actual harness.xml is not very big, it simply makes sure that
>> */bin/* is executable (codestation couldn't preserve execute bits),
>> uses the Codestation command-line client (invoking the Java class
>> directly though) to ask the repository to resolve artifacts from
>> the "Build Library" to the local repository. I had this artifact
>> resolution separate from the normal dependency (or harness)
>> artifact resolution so that it was easier for me to fix problems
>> with the library while a huge set of TCK iterations were still
>> queued up to run. Basically, if I noticed a problem due to a code
>> or configuration issue in an early build, I could fix it, and use
>> the existing builds to verify the fix, instead of wasting an hour
>> (sometimes more depending on networking problems accessing remote
>> repos while building the servers) to rebuild and start over.
>>
>> This brings us to the 'libraries' project. In general the idea of
>> a _library_ was just a named/versioned collection of files, which
>> could be used by a project. The main (er only) library defined
>> in this SVN is system/. This is the groovy glue which made
>> everything work. This is where the entry-point class is located
>> (the guy who gets invoked from harness.xml) via:
>>
>> <target name="harness" depends="init">
>>     <groovy>
>>         <classpath>
>>             <pathelement location="${library.basedir}/groovy"/>
>>         </classpath>
>>
>>         gbuild.system.BuildHarness.bootstrap(this)
>>     </groovy>
>> </target>
>>
>> I won't go into too much detail on this stuff now, take a look at
>> it and ask questions. But, basically there is stuff in
>> gbuild.system.* which is harness support muck, and stuff in
>> gbuild.config.* which contains configuration. I was kinda mid-
>> refactoring of some things, starting to add new features, not sure
>> where I left off actually. But the key bits are in
>> gbuild.config.project.* This contains a package for each project,
>> with the package name being the same as the AHP project (with " " ->
>> "_"). And then in each of those packages is at least a
>> Controller.groovy class (or other classes if special muck was
>> needed, like for the report generation in Geronimo_CTS, etc).
>>
>> The controller defines a set of actions, implemented as Groovy
>> closures bound to properties of the Controller class. One of the
>> properties passed in from the AHP configuration (configured via the
>> Web UI, passed to the harness.xml build, and then on to the Groovy
>> harness) was the name of the _action_ to execute. Most of that
>> stuff should be fairly straightforward.
>>
>> So after a build is started (maybe from a Web UI click, or SVN
>> change detection, or a TCK runtests iteration) the following
>> happens (in simplified terms):
>>
>> * Agent starts build
>> * Agent cleans its working directory
>> * Agent downloads the build harness
>> * Agent downloads any dependencies
>> * Agent invokes Ant on harness.xml passing in some details
>> * Harness.xml downloads the system/1 library
>> * Harness.xml runs gbuild.system.BuildHarness
>> * BuildHarness tries to construct a Controller instance for the
>> project
>> * BuildHarness tries to find Controller action to execute
>> * BuildHarness executes the Controller action
>> * Agent publishes output artifacts
>> * Agent completes build
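
The controller lookup and action dispatch in the steps above can be sketched as follows. This is a Python analogy rather than the Groovy harness itself: the `GeronimoServerController` class, the `CONTROLLERS` registry, the property names, and this `bootstrap` function are hypothetical. Only the space-to-underscore naming rule and the idea of actions bound to controller properties come from the description.

```python
class GeronimoServerController:
    """Hypothetical controller; each action is a callable bound to a property."""

    def __init__(self):
        self.build = lambda props: f"building {props['project']}"
        self.runtests = lambda props: f"running iteration {props['iteration']}"


# Registry keyed by package-style name (project name with " " replaced by "_").
CONTROLLERS = {"Geronimo_Server": GeronimoServerController}


def bootstrap(props):
    """Find the controller for the project, then execute the named action."""
    key = props["project"].replace(" ", "_")
    controller_cls = CONTROLLERS.get(key)
    if controller_cls is None:
        raise LookupError(f"no controller for project {props['project']!r}")
    action = getattr(controller_cls(), props["action"], None)
    if not callable(action):
        raise LookupError(f"no action {props['action']!r} on {key}")
    return action(props)


if __name__ == "__main__":
    print(bootstrap({"project": "Geronimo Server", "action": "build"}))
    print(bootstrap({"project": "Geronimo Server", "action": "runtests", "iteration": 7}))
```

Keeping the action name as plain data passed in from the build configuration means a new action only needs a new property on the controller, with no change to the dispatch code.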
>>
>> A few extra notes on libraries, the JavaEE TCK requires a bunch of
>> stuff we get from Sun to execute. This stuff isn't small, but is
>> for the most part read-only. So I set up a location on each build
>> agent where these files were installed to. I created AHP projects
>> to manage them and treated them like a special "library" one which
>> tried really hard not to go fetch its content unless the local
>> content was out of date. This helped speed up the entire build
>> process... cause that delete/download of all that muck really slows
>> down 20 agents running in parallel on 2 big machines with striped
>> arrays. For legal reasons this stuff was not kept in
>> svn.apache.org's main repository, and for logistical reasons wasn't
>> kept in the private tck repo on svn.apache.org either. Because
>> there were so many files, and because the httpd configuration on
>> svn.apache.org kicks out requests that it thinks are *bunk* to help
>> save the resources for the community, I had set up a private
>> SSL-secured svn repository on the old gbuild.org machines to
>> put in the full muck required, then set up some goo in the harness
>> to resolve them. This goo is all in gbuild.system.library.* See
>> the gbuild.config.projects.Geronimo_CTS.Controller for more of how
>> it was actually used.
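
The "don't fetch unless the local content is out of date" behaviour might look something like the sketch below. How the real gbuild.system.library code detects staleness is not described in this thread, so the `.version` marker file, the `ensure_library` helper, and the `fetch` callback are assumptions.

```python
import tempfile
from pathlib import Path


def ensure_library(install_dir, wanted_version, fetch):
    """Fetch a library into install_dir only when the local copy is out of date.

    Returns True if a fetch happened, False if the local copy was reused.
    """
    install_dir = Path(install_dir)
    stamp = install_dir / ".version"  # hypothetical marker file
    if stamp.is_file() and stamp.read_text().strip() == wanted_version:
        return False
    install_dir.mkdir(parents=True, exist_ok=True)
    fetch(install_dir, wanted_version)
    # Record the version only after a successful fetch.
    stamp.write_text(wanted_version)
    return True


if __name__ == "__main__":
    calls = []
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "javaee-tck"
        fetch = lambda directory, version: calls.append(version)
        print(ensure_library(target, "1", fetch))  # first run fetches
        print(ensure_library(target, "1", fetch))  # second run reuses local copy
        print(ensure_library(target, "2", fetch))  # version change fetches again
```

Skipping the delete/download cycle for large, mostly read-only content is where the time saving comes from when many agents share the same disks.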
>>
>> * * *
>>
>> Okay, that is about all the brain-dump for TCK muck I have in me
>> for tonight. Reply with questions if you have any.
>>
>> Cheers,
>>
>> --jason
>>
>>
>>
>>
>>
>> --
>> ~Jason Warner
>
>
>
>
> --
> ~Jason Warner