cloudstack-issues mailing list archives

From "Alex Huang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-3367) When one primary storage fails, all XenServer hosts get rebooted, killing all VMs, even those not on this primary storage.
Date Sat, 27 Jul 2013 01:49:49 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721491#comment-13721491
] 

Alex Huang commented on CLOUDSTACK-3367:
----------------------------------------

Our experience testing this with XenServer 5.6 is that if we attempt to
stop the VMs through XenServer while the storage is out, XenServer may not shut them down
cleanly because of the storage problems, leading to further problems down the road.  That is the reason
we chose to reboot the host instead of stopping the VMs.

You also have to consider how often this happens.  If a storage server needs to be taken out,
it should be put in maintenance mode, which shuts down the VMs.  In that case,
it won't cause the host to reboot.  Therefore, this can only happen with an unscheduled outage
of the storage server.

We can add a few things to make this happen less often.

- Don't put a heartbeat on the storage until a VM using that storage is on a host.
- Remove the heartbeat from the storage when all VMs using that storage are done.
- Try to stop the VMs within a short interval, and if we can't stop them within that interval,
then reboot.
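
To make the three points above concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not the CloudStack or XenServer implementation: HeartbeatRegistry, stop_or_reboot, and the try_stop / reboot callbacks are hypothetical names, and the 60-second default interval is an arbitrary placeholder.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


class HeartbeatRegistry:
    """Hypothetical per-host bookkeeping: which VMs use which primary storage."""

    def __init__(self) -> None:
        self._vms_by_storage: Dict[str, Set[str]] = defaultdict(set)

    def vm_started(self, storage: str, vm: str) -> bool:
        """Record a VM. Returns True if this is the first VM on the storage,
        i.e. the caller should create the heartbeat now."""
        first = not self._vms_by_storage[storage]
        self._vms_by_storage[storage].add(vm)
        return first

    def vm_stopped(self, storage: str, vm: str) -> bool:
        """Forget a VM. Returns True if it was the last VM on the storage,
        i.e. the caller should remove the heartbeat now."""
        vms = self._vms_by_storage.get(storage)
        if not vms or vm not in vms:
            return False
        vms.discard(vm)
        if vms:
            return False
        del self._vms_by_storage[storage]
        return True

    def monitored(self) -> List[str]:
        """Storages that currently need a heartbeat on this host."""
        return sorted(self._vms_by_storage)


def stop_or_reboot(
    vms: Iterable[str],
    try_stop: Callable[[str], bool],   # hypothetical: True once the VM is down
    reboot: Callable[[], None],        # hypothetical: reboots the host
    timeout_s: float = 60.0,           # assumed interval, not from the source
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try to stop the given VMs; reboot only if they are not all down in time."""
    deadline = clock() + timeout_s
    remaining = list(vms)
    while remaining:
        # Keep only the VMs that still refuse to stop.
        remaining = [vm for vm in remaining if not try_stop(vm)]
        if not remaining:
            break
        if clock() >= deadline:
            reboot()
            return "rebooted"
        sleep(poll_s)
    return "stopped"
```

With this shape, a host that runs no VMs on the failed storage never has a heartbeat on it, so the outage cannot trigger a reboot there, and hosts that do run such VMs reboot only after the stop attempt times out.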
                
> When one primary storage fails, all XenServer hosts get rebooted, killing all VMs, even
those not on this primary storage.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-3367
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3367
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server, XenServer
>    Affects Versions: 4.1.0, 4.2.0
>         Environment: CentOS 6.3, XenServer 6.0.2 + all hotfixes, CloudStack 4.1.0
>            Reporter: France
>            Priority: Critical
>             Fix For: Future
>
>
> As the title says: if only one of the primary storages fails, all XenServer hosts get
rebooted one by one. Because I have many primary storages, which are/were running fine with
other VMs, rebooting the XenServer hypervisor is overkill. Please disable this or implement
just stopping/killing the VMs running on that storage and try to re-attach that storage only.
> The problem was reported on the mailing list, as well as a workaround for XenServer. So I'm
not the only one hit by this "bug/feature". Workaround for now is as follows:
> 1. Modify /opt/xensource/bin/xenheartbeat.sh on all your Hosts, commenting out the two
entries which have "reboot -f"
> 2. Identify the PID of the script  - pidof -x xenheartbeat.sh
> 3. Restart the Script  - kill <pid>
> 4. Force reconnect Host from the UI,  the script will then re-launch on reconnect

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
