cloudstack-dev mailing list archives

From "Tutkowski, Mike" <Mike.Tutkow...@netapp.com>
Subject Re: [Proposal] - StorageHA
Date Sat, 11 Mar 2017 06:04:21 GMT
Hi,

Thanks for sending out this email and welcome to the CloudStack Community. :)

I have a couple of quick questions:

First of all, let me start with something I found in our docs:
Primary Storage Outage and Data Loss <http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/4.8/reliability.html#primary-storage-outage-and-data-loss>

When a primary storage outage occurs the hypervisor immediately stops all VMs stored on that
storage device. Guests that are marked for HA will be restarted as soon as practical when
the primary storage comes back on line. With NFS, the hypervisor may allow the virtual machines
to continue running depending on the nature of the issue. For example, an NFS hang will cause
the guest VMs to be suspended until storage connectivity is restored. Primary storage is not
designed to be backed up. Individual volumes in primary storage can be backed up using snapshots.

What I was curious about is whether you plan to build your feature exclusively as a set of
scripts or whether you also plan to update the CloudStack code base.

Also, if primary storage actually goes offline, I'm not clear on how starting an impacted
VM on a different compute host would help. Could you clarify this for me?

Thanks!
Mike

On Mar 10, 2017, at 8:29 AM, Jeromy Grimmett <jeromy@cloudbrix.com> wrote:

Hello,

I am new to the mailing list, and we are glad to be a part of the CloudStack community.  We
are looking to develop plugins and modules that will help grow and expand the adoption and
use of CloudStack.  So as part of my introductory email, I’d like to introduce a little
project we have been working on: a StorageHA Monitor.  The Monitor would allow CloudStack
and the hosts to test, communicate and resolve VM availability issues when storage (primary
and/or secondary) becomes unavailable.  This is a small write-up about how it would
work:

Consists of two scripts/programs:

The host script runs on the host servers and checks whether the primary and secondary storage
are available by doing a read/write test, then reports to the master script that runs on the
CloudStack server. The host script will test a read and a write to the storage every 5 seconds
(configurable), and if the test fails 3 times (configurable), the failure will be recorded by
the master script.
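
To make the host-side check concrete, here is a minimal Python sketch of the read/write probe
and the failure counter. It is an illustration only, not the code we have written: the names
probe_storage, FailureTracker and monitor are hypothetical, the report callback stands in for
whatever channel reaches the master script, and it assumes the 3 failures must be consecutive.

```python
import os
import time
import uuid


def probe_storage(mount_path):
    """Write a small file under mount_path, read it back, then delete it."""
    token = uuid.uuid4().hex
    probe_file = os.path.join(mount_path, ".storageha-probe-" + token)
    try:
        with open(probe_file, "w") as handle:
            handle.write(token)
            handle.flush()
            os.fsync(handle.fileno())  # push the write to the storage device
        with open(probe_file) as handle:
            return handle.read() == token
    except OSError:
        return False
    finally:
        try:
            os.remove(probe_file)
        except OSError:
            pass  # nothing to clean up if the write never succeeded


class FailureTracker:
    """Counts consecutive failed probes (assumption: a success resets the count)."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def record(self, ok):
        """Return True exactly once per outage, when the threshold is reached."""
        if ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures == self.max_failures


def monitor(mount_path, report, interval_seconds=5, max_failures=3,
            max_checks=None, sleep=time.sleep):
    """Probe mount_path every interval_seconds; call report(mount_path) on failure."""
    tracker = FailureTracker(max_failures)
    checks = 0
    while max_checks is None or checks < max_checks:
        if tracker.record(probe_storage(mount_path)):
            report(mount_path)  # hypothetical hook toward the master script
        checks += 1
        sleep(interval_seconds)
```

Note that this sketch does not guard against a probe that blocks indefinitely (for example on
a hung NFS mount); a real host script would likely need a timeout around the probe.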

The master script will monitor the results of the host script. If the test is good, nothing
happens and the results are logged so that we can track the history of the test results.
If the test reports back as failed, then it will perform the following actions:


* Secondary Storage - It will simply generate and send an alert that the failure
  has occurred.

* Primary Storage - The script will perform the following tasks:
  - Generate and send an alert that the failure has occurred.
  - Force the VMs on that host to shut down.
  - Determine which host to move the VMs to.
  - Start the VMs on the healthy host.
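
The master-side actions above can be sketched as a function that only builds a plan and does
not call CloudStack at all. Everything here is hypothetical: the Host record, the action
tuples, and in particular the placement rule (pick the healthy host with the fewest VMs), which
is just one assumed way to "determine which host to move the VMs to". A host counts as healthy
here only if it can still reach the storage.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical view of a compute host as seen by the master script."""
    name: str
    healthy: bool
    vms: list = field(default_factory=list)


def plan_actions(storage_type, failed_host, hosts):
    """Return the ordered list of actions for a reported storage failure."""
    plan = [("alert", storage_type, failed_host.name)]
    if storage_type != "primary":
        return plan  # secondary storage: alert only

    # Assumed placement rule: track VM counts on healthy hosts, fill the emptiest.
    load = {h.name: len(h.vms) for h in hosts
            if h.healthy and h.name != failed_host.name}
    for vm in failed_host.vms:
        plan.append(("force_stop", vm, failed_host.name))
        if load:  # with no healthy host available, the VM stays stopped
            target = min(load, key=load.get)
            load[target] += 1
            plan.append(("start", vm, target))
    return plan
```

Keeping the planning step separate from execution would let the plan be logged and reviewed
before anything is stopped or started.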

We have already started working on some code, and the solution seems to be testing well.
Any thoughts, ideas, or input are welcome.  If a solution already exists, please forgive
our ignorance and point us in the right direction. We look forward to
further collaboration with you all.

Regards,
j

Jeromy Grimmett
155 Fleet Street
Portsmouth, NH 03801
Direct: 603.766.3625
Office: 603.766.4908
Fax: 603.766.4729
jeromy@cloudbrix.com
www.cloudbrix.com
