cloudstack-dev mailing list archives

From Leo Simons <LSim...@schubergphilis.com>
Subject Re: [DISCUSSION] CI
Date Fri, 22 Aug 2014 10:51:12 GMT
Hey folks,

Glad to see all the attention on this topic. Let’s do more testing! :-)

CI that we have internally
--------------------------
At schuberg philis we have a “cloudstack integration test” (CIT) environment. It’s grown
and evolved incrementally over time along with our use of cloudstack, so its strongest test
coverage is for the things the folks over here have worked on the most.

The current version consists of an assortment of shell and python scripts that use knife-cloudstack,
plus assorted hackery, to bring up a fresh environment, and then some custom test cases written
against the 4.2/4.3 version of marvin to provision a zone and run basic tests. It’s pretty
closely tied to our internal systems (hardcoded template UUIDs are just the start of it).

On the upside, this stuff does run fully automatically from our internal jenkins. The basic
trick is to spin up a pristine environment inside our existing internal cloudstack-powered
cloud: we have a dedicated zone for it, backed by some dedicated vmware hypervisors
(since xen doesn’t nest), into which we nest xen server. (kvm experiments have never been
completed; it hasn’t been a big priority since we don’t use KVM.)

For an impression of what this kind of code looks like, take a look at the workshop scripts
hugo put up at
    https://github.com/spark404/cs-workshop
That uses cloudmonkey instead of knife, but the basic idea is similar. It’s not that
special, just lots of little bits and gotchas. Oh, we also have an open source version
of “our” cloudstack cookbook
    https://github.com/schubergphilis/cloudstack-cookbook
though I think that’s not quite the same as what we use.

CI changes I’m working on
-------------------------
I’ve been working on and off to reduce the schuberg philis-specificity of all the scripts so
that we can open source the end result (and contribute it here if that makes sense… though
if citrix can open source their internal CI tooling, that might be a much better place to start).

Starting off from Ian’s GSOC-2014 work, we’re now doing the environment provisioning using
vagrant and vagrant-cloudstack, with various scripts slowly being replaced by chef cookbooks.
This new setup is at the point where basic environment provisioning works somewhat robustly,
and it’s decoupled from most of our internal specifics. By using vagrant and some parameterization,
we can now _almost_ bring up “a personal cloud” for anyone doing cloudstack development
internally that is more or less the same as the one used in our CIT environment.

The big lesson learned is that _almost works_ is miles away from _works reliably enough to
trust the test results_. The nature of the thing being built (and rebuilt) is simply that
most of the time a test failure is caused by some glitch somewhere other than the actual test.
For example, there might be a failure to apply a port forwarding rule, or several failures
followed by eventual success, but success _after_ the test has timed out; then the whole thing
topples over and you have to restart a two-hour build process.
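
To make that concrete: essentially every provisioning step ends up wrapped in a retry-until-deadline
helper along these lines. This is just a minimal sketch, not our actual code, and the function
and error names are made up; the painful case is the one described in the final comment.

    import time

    def retry_until(operation, timeout_seconds=300, interval_seconds=10):
        """Keep retrying a flaky provisioning step until it succeeds or the deadline passes."""
        deadline = time.time() + timeout_seconds
        last_error = None
        while time.time() < deadline:
            try:
                return operation()
            except Exception as error:
                last_error = error        # e.g. "failed to apply port forwarding rule"
                time.sleep(interval_seconds)
        # The step may well succeed a few seconds after this point, but by then the test
        # has already been marked failed and the two-hour environment build starts over.
        raise RuntimeError("gave up after %ss: %s" % (timeout_seconds, last_error))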

I would love to learn more about how citrix QA handles that kind of thing. I imagine, basically,
with a lot of patience :-)

Anyway, once we have this vagrant-ified stuff up and running for integration needs, we’ll look
in the coming months at aligning it more closely with our production cloud infrastructure (which
is also chef-managed). This is where an open source “CIT” would pay off, since I imagine
such a codified how-to-build-a-production-cloud is of interest!

Somewhere along the line I am assuming this work can also be fully aligned with broader community
setup/tooling, but I can’t see that far ahead yet, especially since I don’t know the shape
of what’s “out there” :-)

Marvin...
---------
Last week I started trying to get some of the existing marvin-based component tests running
on our CIT. Honestly, this has been pretty frustrating and slow-going…I’m not exactly
sure why just yet. Testing with python is supposed to be real fun :-).

I hate pointing at things and saying “I know a better way to do that” without fully understanding
why they are the way they are. I’d prefer to get a larger chunk of the component tests up and
running first, and blame my own limitations for my frustration until then. So far, I guess
marvin (and/or its usage) just doesn’t feel very pythonic (especially when contrasted with,
say, cloudmonkey… was marvin written before there was cloudmonkey?). I’m interested in
contributing to improving this, though, when I figure out how!
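
For anyone who hasn’t looked at marvin yet, here is a rough sketch of what driving the API from
a test case looks like, reconstructed from memory of the 4.3/4.4-era layout, so treat the module
paths, attribute names and zone name as approximate rather than gospel. The generated
command-object style is part of what feels less pythonic to me than, say, a cloudmonkey one-liner:

    from marvin.cloudstackTestCase import cloudstackTestCase
    from marvin.cloudstackAPI import listZones

    class TestZoneIsEnabled(cloudstackTestCase):
        """Rough sketch of a marvin-style test case; names are approximate."""

        def setUp(self):
            # the marvin test runner injects a test client; ask it for an API client
            self.apiclient = self.testClient.getApiClient()

        def test_zone_is_enabled(self):
            # every API call goes through a generated command object...
            cmd = listZones.listZonesCmd()
            cmd.name = "CIT-zone-1"                  # hypothetical zone name
            zones = self.apiclient.listZones(cmd)
            # ...which returns response objects with the JSON fields as attributes
            self.assertEqual(zones[0].allocationstate, "Enabled")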

Regardless of reservations about how convenient or pretty the code is, there are lots and
lots of component tests that work this way, and a lot of people working on those tests every
day. I think the only realistic approach is to build on top of that existing work, so we
can have _one_ suite of tests, and so that anyone who can work on one test suite can work
on all of them. Even if I’d personally prefer serverspec to envassert :)

Strategy
--------
> Do we have any CI strategy in place or a starting point at least?

Here’s my take.

* I think Ian’s GSOC-2014 vagrant-based devcloud3 (perhaps a better name? ;-)) should become
the starting point for everyone doing local development and testing. The smoke tests should
run and pass against that.

* Automated CI smoke tests should have an environment setup close to / compatible with / reusing
that devcloud3 stuff, to avoid “test works locally, fails on CI”. Having the free TravisCI
approach available to any developer is great. Some kind of configuration management (e.g.
chef) is vital to help spin up resources reliably enough. The jobs being run should have all
the detail necessary to reproduce them in source control, so that you can figure out how to
replicate a failure without needing access to the CI system(s) that failed.
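
As a purely hypothetical illustration of “all the detail in source control”: even a checked-in
job description plus a tiny runner would let you replay a failed CI job locally from the same
inputs. Nothing like this exists today; the file name, fields and script below are made up:

    #!/usr/bin/env python
    # run_ci_job.py -- hypothetical sketch: replay a CI job from a checked-in description
    import json
    import os
    import subprocess
    import sys

    def main(job_file):
        # the same file the CI server would read, e.g. ci/jobs/smoke-master.json (made-up path)
        with open(job_file) as handle:
            job = json.load(handle)
        env = dict(os.environ)
        env.update(job.get("environment", {}))       # e.g. MARVIN_CONFIG, ZONE_NAME
        print("replaying CI job %r: %s" % (job["name"], " ".join(job["command"])))
        sys.exit(subprocess.call(job["command"], env=env))

    if __name__ == "__main__":
        main(sys.argv[1])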

* I believe the current split between smoke tests w/ simulator and component tests w/ target
hardware/software is about right. I don’t believe more extensive non-simulator tests with
somehow faked/awkward hardware (e.g. some kind of lightweight docker wrapper to act as a hypervisor)
are actually very useful, since that way you don’t test the component-specific java code,
of which there is a lot.

* The existing marvin-based component test suite runs need to migrate, piece by piece, a step
at a time, from running inside citrix only to also running on open/public/accessible
CI resources. The first place to migrate to is probably jenkins.buildacloud.org. I don’t
think TravisCI should be used for this. Apache has buildbot and jenkins, so it should be one
of those, and since we have jenkins already… :)

* It’s already possible to contribute slave nodes to the buildacloud.org install that have
access to the resources needed to run the component tests. For example, I hope we can get
(a copy/version of) our own CIT environment hooked up to it at some point, so we can contribute
test results there. How to do this probably needs better documentation and a call-to-action
(at the moment, I know that “ask Hugo” works… but I realize not everyone has the luxury
of an on-site Hugo!).

* buildacloud.org needs to become an ASF-hosted resource. I believe some people are working
on that. I don’t know much about builds.apache.org, but looking at how much that’s already
doing, I imagine merging the two is out of the question. Ideally, I guess, at some point in
the future (2016? :)) the “apache cloudstack jenkins” would spin up a cloud into which
builds.apache.org (and apache buildbot, for that matter) can spin up slaves, and that way the
slave-running hardware can get pooled.

* All tests _should_ be run for every commit, but there’s no chance of getting those kinds
of resources in place unless the build is refactored into lots of independently testable modules
that you can pinpoint commits to. So, tests will be run as frequently as available resources
allow.
** I imagine this means doing only 3-10 full CI runs a day. That means picking the branches
to run, i.e. the active trunk (whether you call it master or develop) and active release branches.
** I do imagine there’s room for running a subset of the full suite (e.g. the smoke tests)
regularly (if not on _every_ commit) on, say, active feature branches.
** For this branch-selection it would be nice to have a branch naming convention that’s
easily pattern matched and followed by all...
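
To show how trivial the branch selection becomes once a convention exists, here is a small
sketch; the naming convention itself is made up purely for illustration:

    import re

    # made-up convention: feature/<ticket>-<slug>, release/<major>.<minor>
    SMOKE_ONLY = re.compile(r"^feature/")
    FULL_RUN = re.compile(r"^(master|develop|release/\d+\.\d+)$")

    def suites_for(branch):
        """Decide which test suites a branch gets, based purely on its name."""
        if FULL_RUN.match(branch):
            return ["smoke", "component"]
        if SMOKE_ONLY.match(branch):
            return ["smoke"]
        return []

    assert suites_for("release/4.5") == ["smoke", "component"]
    assert suites_for("feature/CLOUDSTACK-1234-fancy-network") == ["smoke"]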

* I’m interested to see whether the marvin tests can be clearly split into distinct suites
that stack on top of each other, i.e. some kind of dependency chain where it is quite clear
that one pack of tests reuses outputs/config results from another pack. The deployDatacenter
module, for example, could be structured as a set of “stage1” test cases. Down the
line I want to be able to take a partially provisioned integration environment, drop a new
RPM into it, and rerun just some of the component tests related to the component I’m working
on — and I want jenkins to do that for me :-)
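
One way I could imagine expressing those stages is to stick with the nose attribute tags the
existing tests already use; the “stage” tag names below are mine and don’t exist anywhere today:

    from nose.plugins.attrib import attr
    from marvin.cloudstackTestCase import cloudstackTestCase

    @attr(tags=["stage1"])
    class TestZoneDeployment(cloudstackTestCase):
        """stage1: provision the zone; later stages assume its outputs exist."""
        def test_deploy_zone(self):
            pass   # the deployDatacenter-style provisioning would live here

    @attr(tags=["stage2"])
    class TestPortForwarding(cloudstackTestCase):
        """stage2: reuses the zone/network that stage1 left behind."""
        def test_create_rule(self):
            pass

    # jenkins could then rerun only the stage matching the component I touched, e.g.:
    #     nosetests -a tags=stage2 test/integration/component/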

* Things that can be broken out into components and partially integration-tested outside
of a “full” environment should be. For example, we’re putting a framework around the
systemvm so it can be tested without having a running management server. I would like to see
something similar for plugins, so you can have just the plugin + the thing that it plugs in,
and run a suite of tests for just that.


cheers,


Leo

