cloudstack-dev mailing list archives

From John Burwell
Subject Re: Let’s discuss database upgrades
Date Mon, 04 Jan 2016 06:04:30 GMT

I completely agree with Wido that the notion of the ACS version (e.g. 4.6.0, 4.6.1, 4.7.0,
etc.) should be a purely logical concept.  It points to a particular git hash, a version of the
database schema, etc.  I also agree that supporting downgrade is a fool's errand, as many database
schema changes are destructive.

My largest issue with our upgrade process is that it requires management server downtime (leaving
out the system VMs for the moment).  On a 4-6 month release cycle, a few maintenance windows
a year is livable.  However, monthly maintenance windows become more and more onerous.  Consider
the number of maintenance windows AWS opens during which all or part of its services are completely
unavailable.  Therefore, our tooling needs to favor additive, non-destructive schema changes
that allow database upgrades to be applied while an older version of the management server is still
running.  Combined with a clustered management server configuration, a CloudStack user
could upgrade their database schema and then perform a rolling upgrade of their management servers
without taking any downtime.  By policy, we should only perform changes that require downtime
(e.g. changes to the clustering protocol, destructive/offline schema changes, etc.) in major
versions (and potentially critical security/stability bug fixes).  While it would be extra
effort, developing a DSL for describing migrations would allow the tool to relieve developers
from worrying about destructive vs. additive changes, as well as from working out if/when database
changes must be made offline.  Finally, in order to execute upgrades separately from the management
server upgrade, the upgrade tooling and execution must be pulled out of the management server
and into a standalone utility.
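
As a rough illustration of how a migration DSL could take the additive/destructive decision
away from developers, here is a minimal Java sketch.  Every name in it (MigrationModeSketch, Op,
Mode, requiredMode, allowedInRelease) is hypothetical and exists neither in CloudStack nor in any
migration tool, and the split of operations into additive and destructive is an assumption made
for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types exist in CloudStack or in any migration tool.
final class MigrationModeSketch {

    // Operation kinds a migration DSL might expose. The additive flag is an assumption
    // about which changes an older management server can tolerate while still running.
    enum Op {
        ADD_TABLE(true),
        ADD_NULLABLE_COLUMN(true),
        DROP_COLUMN(false),
        DROP_TABLE(false),
        RENAME_COLUMN(false);

        final boolean additive;

        Op(boolean additive) {
            this.additive = additive;
        }
    }

    enum Mode { ONLINE, OFFLINE }

    // A migration can run online only if every operation in it is additive.
    static Mode requiredMode(List<Op> ops) {
        for (Op op : ops) {
            if (!op.additive) {
                return Mode.OFFLINE;
            }
        }
        return Mode.ONLINE;
    }

    // Policy from the text: offline migrations are accepted only in major versions.
    static boolean allowedInRelease(boolean majorVersion, List<Op> ops) {
        return majorVersion || requiredMode(ops) == Mode.ONLINE;
    }

    public static void main(String[] args) {
        List<Op> additiveOnly = Arrays.asList(Op.ADD_TABLE, Op.ADD_NULLABLE_COLUMN);
        List<Op> withDrop = Arrays.asList(Op.ADD_TABLE, Op.DROP_COLUMN);

        System.out.println(requiredMode(additiveOnly));        // ONLINE
        System.out.println(requiredMode(withDrop));            // OFFLINE
        System.out.println(allowedInRelease(false, withDrop)); // false
    }
}
```

Because the developer only declares operations, the tool rather than the author decides whether
a migration needs a maintenance window, and a build could reject an offline migration submitted
to a non-major release.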

Another issue is that we don’t use our database tooling in development in the same way as
users do in production.  Anecdotally, developers vary in the way they upgrade their development
databases based on their personal workflow.  We are committing one of the biggest operational
sins: having a special process that is rarely executed.  Therefore, I agree with Daan that
whichever tool we select, it should support a workflow that is efficient for both development
and operations.  We should be eating our own dog food every day.

I have two issues with FlywayDB.  First, it assumes that wall-clock time will be synchronized and
monotonically increasing.  When things get out of order, they tend to fail quietly, causing subtle
corruption.  Second, and most importantly, it assumes a linearly increasing set of releases.  This
model works wonderfully for the deployment of internally developed web applications, but it does
not work for software such as CloudStack.  As has been pointed out, minor revisions are released
after later minor releases (e.g. 4.5.3 after 4.6.0).  As Daan has also pointed out, in a perfect
world, database changes would only occur in major or minor releases.  However, in reality,
some critical bug fixes require database schema changes.  It is not acceptable to deny users
critical defect fixes because we either cannot or will not make database changes in minor
revisions.  In its current form, FlywayDB is unable to handle this situation because it does
not know how to interleave the 4.5.3 schema changes into the 4.6.0 stream (or 4.7.0 or 4.8.0)
since the 4.5 line branched off.
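
The interleaving problem can be shown with a deliberately simplified model of a linear migration
runner.  The Java sketch below is not FlywayDB's implementation or API; the class and method
names are hypothetical, and the only assumption is that the runner applies every known migration
whose version is greater than the one recorded in the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a linear migration runner (not FlywayDB code).
final class LinearOrderingSketch {

    // Compares dotted versions numerically, so that "4.5.3" sorts before "4.6.0".
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Linear rule: every known migration newer than the current version is pending.
    static List<String> pending(String current, List<String> known) {
        List<String> result = new ArrayList<>();
        for (String version : known) {
            if (compare(version, current) > 0) {
                result.add(version);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 4.5.3 was released after 4.6.0 and carries a schema fix.
        List<String> known = Arrays.asList("4.5.2", "4.5.3", "4.6.0");

        // A database still on 4.5.2 picks up both later migrations.
        System.out.println(pending("4.5.2", known)); // [4.5.3, 4.6.0]

        // A database that moved to 4.6.0 before 4.5.3 existed never sees the 4.5.3 change.
        System.out.println(pending("4.6.0", known)); // []
    }
}
```

In the second case the 4.5.3 schema change is silently skipped, because a single version number
cannot express that the database followed a different branch of the release history.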

While I think the Chimp proposal could use some refinement, the basic idea of using a directed
acyclic graph (DAG) to establish a chain of database mutations addresses the non-linearizable
nature of our release process.  Essentially, it is borrowing the model established by git
to create a log of schema transformations.  By coupling a DAG with content hashing to identify each
change (e.g. the SHA1 of the author, change date, and migration content), the set of changes
required to transform a schema to another version can be determined at runtime.  Most importantly,
in the same way that git can determine that two revisions cannot be automatically merged, such
a tool can deterministically fail if/when an upgrade from one version to another is not
possible.  To me, a database schema management tool that leveraged the git tree to manage
history and calculate the set of changes required to upgrade from one version to another would
represent the gold standard.  I find this approach so powerful because it would leverage the
standard git revision tracking semantics to identify database changes, the rebase/merge workflow
to identify and resolve schema upgrade conflicts, and release tagging.

In summary, I do not believe that an off-the-shelf tool supports the combination of non-linear
upgrade paths and online database migration we require.  Therefore, we will need to develop
tooling of our own.  To me, the question is whether building a new tool from scratch or contributing
to an existing project represents the shortest path to meeting these requirements.


> On Jan 3, 2016, at 6:07 PM, Rafael Weingärtner <> wrote:
> That is it Ron ;)
> Initially, my intentions were only to change the technology, from a
> homemade approach to an improved one to manage/run upgrade routines to the
> DB. However, after giving some thought to the point you brought up, I think
> that we can use this thread to discuss it too.
> To use Flywaydb as we have been discussing so far, we have to use some
> naming standard as “YYYYMMDDHHmm” and the rules we have stated before. We
> would have to link an ACS version to a marker (timestamp) of the release;
> that could be used to control the upgrade with Flywaydb, since to go from a
> version to another we have to run all of the script in between them; that
> is controlled by the timestamp that would work as an incremental version
> for upgrade routines.
> Additionally, we can have a maven profile to use Flywaydb for devs and a
> Spring bean to manage upgrades in production environments.
> If we have consensus, I am good on adding restrictions regarding the use of
> upgrade routines only on X and X.Y; and not in X.Y.Z to a document that can
> be used to guide devs and committers.
> On Sun, Jan 3, 2016 at 8:16 PM, Ron Wheeler <>
> wrote:
>> On 03/01/2016 7:19 AM, Rafael Weingärtner wrote:
>>> Sorry for the delay in answering your inquiries; during this period of New
>>> Year’s Eve I was AFK.
>>> Thanks for the contributions of all.
>>> I will comment your questions and suggestions as follows:
>>> Ron, I understand your point that there are some projects that do not
>>> allow
>>> database change in minor version releases (schema changes). We could
>>> define
>>> that as a standard, I do not see a problem on that, as long as we have
>>> consensus. What we have to keep in mind is that we could still have
>>> scripts
>>> that do not change DB’s schema, but add some data into a table in a minor
>>> version.
>> The main point for me is to make sure that there is a discussion before
>> this happens and that a clear understanding of the technology debt that
>> this creates is taken into account before it happens.
>>> Having said that, we are looking for a way to make the upgrade process
>>> smoother,  looking for a way to avoid creating upgrade path manually with
>>> scripts such as <currentVersion>to<newerVersion>, because that way we have
>>> to cover every single upgrade path manually. We can work that out using a
>>> tool to “build and execute” the upgrade path, using a standard to create
>>> and name upgrade routines we have been discussing earlier in this thread.
>>> Erik, there is a tool to do that. As I mentioned in my previous emails
>>> there is a tool called Flywaydb that does exactly what you mentioned.
>>> However, that tool will require an improvement in the way we create and
>>> name upgrade routines; those changes have been cited and discussed
>>> earlier.
>>> Paul, about your inquiries:
>>> When you say rollback, do you mean downgrade after an upgrade? If so, we
>>> have discussed that earlier in this thread and we agreed that we would not
>>> cover downgrades, at least for now. The Admin during the upgrade should
>>> properly make a copy of his/her database to be restored if a problem
>>> happens.
>>> About the downtime you mentioned, do you mean the need to stop all of the
>>> MS while executing the upgrade?
>>> As an administrator of a cloud that is built on top of ACS, I find quite the
>>> opposite of you. If I do not look at the source code, I find the upgrade
>>> procedure pretty easy to follow and execute, given that we just need to
>>> stop all MS and update it with apt-get.
>>> Even if we build a tool as Rohit suggested, the downtime would still exist:
>>> while upgrading the database, the old release of the MS would have to be
>>> stopped; otherwise we could receive errors when the DB’s schema changes. As I said in
>>> some email earlier, I do not find the need to create a new tool that is
>>> just a wrapper. I prefer to define a standard to create and name upgrade
>>> routines and then use a tool such as Flywaydb directly, which would allow
>>> us to manage solely configurations, instead of wrapper code. IMO the less
>>> code the better.
>>> Paul and Remi, now with Remi’s explanation I understand what you meant
>>> with
>>> “downtime”. As Remi said, the other stack is far worse to upgrade.
>>> OpenStack has a tool such as the suggested “Chimp” that seems to cover
>>> rollbacks. However, I found their upgrade process worse than ours.
>>> We are discussing DB upgrade routines here, I understand the problem of
>>> upgrade as a whole that needs to cover aspect such as SystemVMs upgrade.
>>> However, I think that point should and can be discussed in a separated
>>> thread; as a consequence of that it is a different part of ACS source
>>> code.
>>> About reverting an upgrade, I do not find it hard at all; it is basically
>>> restoring the DBs “cloud” and “cloud_usage” to their state prior to the
>>> upgrade (given that the ACS upgrade page states that you should
>>> backup your databases). Maybe because I am a developer, I do not see much
>>> problem with that.
>>> Bottom line:
>>> There is a tool that can help us with upgrade routines for DB, what we
>>> need
>>> is a consensus on how to create and name upgrade routines and the tool
>>> that
>>> we can use to build and execute the upgrade path. I think we all agreed
>>> with the standards we had discussed earlier.
>>> Can I create a page in the ACS wiki formalizing the points we discussed
>>> here in regards to ACS DB’s upgrade routines?
>>> I tried to create a child page in
>>>, but it
>>> seems that I do not have permission. After that, I can start working in a
>>> PR to add flywaydb to ACS.
>>> On Wed, Dec 30, 2015 at 2:41 PM, Ron Wheeler <
>>>> wrote:
>>>> On 30/12/2015 4:58 AM, Remi Bergsma wrote:
>>>> Hoi Paul,
>>>>> Agree that the user perspective is important, thanks for bringing that
>>>>> up.
>>>>> It is also worth pointing out that once you get into the SMB space, the
>>>> system admin may wear a few hats and is not dedicated full time to
>>>> maintaining Cloudstack.
>>>> If it works most of the time the way it is supposed to, the admin is not
>>>> spending any time working with the guts of Cloudstack.
>>>> Once it is up and running, the skills and knowledge will decay pretty
>>>> quickly.
>>>> There is a need for an upgrade that works reliably and has good tests
>>>> that
>>>> can be quickly tried to see that the upgrade has worked or needs to be
>>>> reverted.
>>>> Remember that the other “Stack” is far worse in upgrades, so it’s all
>>>>> about perspective.
>>>>> I guess being the second worst stack is comforting in some way. :-)
>>>>   Having said that, I also want it to be smooth and we absolutely need
>>>>> it
>>>>> to be outside of the main repo and able to rollback if stuff goes wrong
>>>>> (so
>>>>> users can retry).
>>>>> The biggest other issue I see in upgrading is the systemvm replacement
>>>>> and having to reboot (100s or 1000s of routers). That’s where your
>>>>> downtime is most of the time.
>>>>> If you have done all that and have to revert, it is not very comforting
>>>> to
>>>> know that most of the time you wasted was spent in a fairly stable
>>>> process
>>>> and that the downtime can be chalked up to the size of the server
>>>> population. The users will be happy with that, I suppose.
>>>> Although upgrading from 4.6 to 4.7 takes under 5 minutes (stop ACS,
>>>>> replace RPM and start it again) and no systemvm template needed to be
>>>>> replaced. That’s more like it already ;-)
>>>>> That sounds more like what I need!
>>>> Regards,
>>>>> Remi
>>>>> From: Paul Angus <<mailto:
>>>>> Reply-To: "<>"
>>>>> <
>>>>> Date: Wednesday 30 December 2015 10:10
>>>>> To: "<>"
>>>>> Subject: RE: Let’s discuss database upgrades
>>>>> Hi Guys, from the user's perspective, there are two points which come
>>>>> again and again -
>>>>> 1. Lack of a prescribed rollback if an upgrade goes badly
>>>>> 2. The downtime involved in doing upgrades.
>>>>> - Upgrades are seen as CloudStack's biggest 'issue'.
>>>>> I've had to rescue enough upgrades to understand how complicated it is;
>>>>> however with the increased release velocity, the admin's experience of
>>>>> doing these upgrades needs to be taken into account or we will lose
>>>>> users
>>>>> because of the increased admin overhead and downtime.
>>>>> The purpose of Rohit's CloudChimp was to find a suitable tool/method
>>>>> to carry out schema changes *without downtime*. You guys are far better
>>>>> placed
>>>>> to argue the merits of any one solution than me.
>>>>> I would just ask that you keep in mind what the users are looking for: a
>>>>> relatively clean and recoverable upgrade process.
>>>>> Paul Angus
>>>>> -----Original Message-----
>>>>> From: Erik Weber []
>>>>> Sent: 29 December 2015 21:45
>>>>> To: dev <<>>
>>>>> Subject: Re: Let’s discuss database upgrades
>>>>> On Mon, Dec 28, 2015 at 2:16 PM, Rafael Weingärtner <
>>>>> Hi all devs,
>>>>>> First of all, sorry the long text, but I hope we can start a
>>>>>> discussion here and improve that part of ACS.
>>>>>> A while ago I have faced the code that Apache CloudStack (ACS) uses
>>>>>> upgrade from a version to newer one and that did not seem to be a
>>>>>> way to execute our upgrades. Therefore, I decided to use some time
>>>>>> search for alternatives.
>>>>>> I have read some material about versioning of scripts used to upgrade
>>>>>> a database (DB) of a system and went through some frameworks that
>>>>>> could help us.
>>>>>> In the literature of software engineering, it is firmly stated that
>>>>>> have to version DB scripts as we do with the source code of the
>>>>>> application, using the baseline approach. Gladly, we were not that
>>>>>> at this point, we already versioned our routines for DB upgrade (.sql
>>>>>> and .java). Therefore, it seemed that we just did not have used a
>>>>>> practical approach to help us during DB upgrades.
>>>>>>  From my readings and looking at the ACS source code I raised the
>>>>>> following
>>>>>> requirement:
>>>>>> • We should be able to write more than one routine to upgrade to
>>>>>> version; those routines can be written in Java and SQL. We might
>>>>>> more than a routine to be executed for each version and we should
>>>>>> able to define an order of execution. Additionally, to go to an upper
>>>>>> version, we have to run all of the routines from smaller versions
>>>>>> first, until we achieve the desired version.
>>>>>> We could also add another requirement that is the downgrade from
>>>>>> version, which we currently do not support. With that comes my first
>>>>>> question for
>>>>>> discussion:
>>>>>> • Do we want/need a method to downgrade from a version to a previous
>>>>>> one?
>>>>>> I found an explanation for not supporting downgrades, and I liked
>>>>>> So, what I devised for us:
>>>>>> First the bureaucracy part - our migrations occur basically in three
>>>>>> (3) steps, first we have a "prepare script", then a cleanup script
>>>>>> finally the migration per se that is written in Java, at least, that
>>>>>> is what we can expect when reading the interface
>>>>>> “”.
>>>>>> Additionally, our scripts have the following naming convention:
>>>>>> schema-<currentVersion>to<desiredVersion>, which IMHO may cause
>>>>>> some confusion because at first sight we may think that from the
>>>>>> version we could have different paths to an upper version, which
>>>>>> practice is not happening. Instead of a <currentVersion>to<version>
>>>>>> could simply use V_<numberOfVersion>_<sequencial>.<fileExtension>,
>>>>>> giving that, we have to execute all of the V_<version> scripts
>>>>>> are smaller than the version we want to upgrade.
>>>>>> To clarify what I am saying, I will use an example. Let’s say we
>>>>>> just installed ACS and ran the cloudstack-setup-database. That command
>>>>>> will create a database schema in version 4.0.0. To upgrade that schema
>>>>>> to version 4.3.0 (it is just an example, it could be any other
>>>>>> version), ACS will use the following mapping:
>>>>>> _upgradeMap.put("4.0.0", new DbUpgrade[] {new Upgrade40to41(), new
>>>>>> Upgrade410to420(), new Upgrade420to421(), new Upgrade421to430()});
>>>>>> After loading the mapping, ACS will execute the scripts defined in
>>>>>> each one of the Upgrade path classes and the migration code per se.
>>>>>> Now, let’s say we change the “.sql” scripts name to the pattern
>>>>>> mentioned, we would have the following scripts; those are the scripts
>>>>>> found that aim to upgrade to versions between the interval 4.0.0
>>>>>> 4.3.0 (considering 4.3.0, since that is the goal version):
>>>>>> - schema-40to410, can be named to: V_410_A.sql
>>>>>> - schema-40to410-cleanup, can be named to: V_410_B.sql
>>>>>> - schema-410to420, can be named to: V_420_A.sql
>>>>>> - schema-410to420-cleanup, can be named to: V_420_B.sql
>>>>>> - schema-420to421, can be named to: V_421_A.sql
>>>>>> - schema-421to430, can be named to: V_430_A.sql
>>>>>> - schema-421to430-cleanup, can be named to: V_430_B.sql
>>>>>> Additionally, all of the java code would have to follow the same
>>>>>> convention. For instance, we have
>>>>>> “”,
>>>>>> which has some java code to migrate from 4.0.0 to 4.1.0. The idea
>>>>>> to extract that migration code to a Java class named:,
>>>>>> giving that it has to execute the SQL scripts before the java code.
>>>>>> In order to go from a smaller version (4.0.0) to an upper one (4.3.0),
>>>>>> we have to run all of the migration routines from intermediate
>>>>>> versions. That is what we are already doing, but we do all of that
>>>>>> manually.
>>>>>> Bottom line, I think we could simply use the convention
>>>>>> V_<numberOfVersion>_<sequencial>.<fileExtension> to name upgrade
>>>>>> routines.
>>>>>> That would facilitate us to use a framework to help us with that
>>>>>> process.
>>>>>> Additionally, I believe that we should always assume that to go from
>>>>>> smaller version to a higher one, we should run all of the scripts
>>>>>> exist between them. What do you guys think of that?
>>>>>> After the bureaucracy, we can discuss tools. If we use that convention
>>>>>> to name migration (upgrade) routines, we can start thinking on tools
>>>>>> to support our migration process. I found two (2) promising ones:
>>>>>> Liquibase and Flywaydb (both seem to be under Apache license, but
>>>>>> first one has an enterprise version?!). After reading the
>>>>>> documentation and some usage examples I found the flywaydb easier
>>>>>> simpler to use.
>>>>>> What are the options of tools that we can use to help us manage the
>>>>>> database upgrade, without needing to code the upgrade path that you
>>>>>> know?
>>>>>> After that, I think we should decide if we should create another
>>>>>> project/component to take care of migrations, or we can just add
>>>>>> dependency of the tool to a project such as “cloud-framework-db”
>>>>>> start using it.
>>>>>> The “cloud-framework-db” project seems to have a focus on other
>>>>>> such as managing transactions and generating SQLs from annotations
>>>>>> (?!? That should be a topic for another discussion). Therefore, I
>>>>>> would rather create a new project that has the specific goal of
>>>>>> managing ACS DB upgrades. I would also move all of the routines (SQL
>>>>>> and
>>>>>> Java) to this new project.
>>>>>> This project would be a module of the CloudStack project and it would
>>>>>> execute the upgrade routines at the startup of ACS.
>>>>>> I believe that going from a homemade solution to one that is more
>>>>>> consolidated and used by other communities would be the way to go.
>>>>>> I can volunteer myself to create a PR with the aforementioned changes
>>>>>> and using flywaydb to manage our upgrades. However, I prefer to have
>>>>>> good discussion with other devs first, before starting coding.
>>>>>> Do you have suggestions or points that should be raised before we
>>>>>> start working on that?
>>>>>> This isn't my field of work, so forgive me if this is self explanatory
>>>>> or
>>>>> something, but is there no tool like terraform/puppet or similar for
>>>>> database work?
>>>>> I mean, where you state you desired state and the tool handles it.
>>>>> To me it sounds like a good way would be if you could specify what you
>>>>> want to exist (or not), and how it should look like.
>>>>> "I want table XYZ to exist with THESE columns having THIS type(s) and
>>>>> THIS default value bla bla bla"
>>>>> Rather than handling a bunch of sql scripts that has to handle different
>>>>> mysql versions (come to think about an issue with a mariadb version
>>>>> crashing recently), a variety of cloudstack versions and a whole lot
>>>>> more.
>>>>> Disclaimer: i have no idea if this is what flywaydb does, if it is, then
>>>>> just ignore this.
>>>>> --
>>>>> Erik
>>>> --
>>>> Ron Wheeler
>>>> President
>>>> Artifact Software Inc
>>>> email:
>>>> skype: ronaldmwheeler
>>>> phone: 866-970-2435, ext 102
> --
> Rafael Weingärtner
