cloudstack-users mailing list archives

From Ahmad Emneina <Ahmad.Emne...@citrix.com>
Subject Re: NFS Crash crashes all hosts
Date Sun, 25 Nov 2012 18:04:09 GMT
What tests were you running, and what kind of throughput were you seeing? The VM speed throttling
most likely applies to VM-to-VM or VM-to-Internet network traffic; it is not a QoS limit on the
VM's storage throughput. A storage limit would probably have to be enforced manually on the
hypervisor. I don't think CloudStack has that feature yet.

Ahmad

On Nov 25, 2012, at 9:50 AM, "Trevor Francis" <trevor.francis@tgrahamcapital.com> wrote:

Why would a high-I/O VM cause this?

The hosts run bonded GigE for storage/management, and the storage server runs quad-bonded GigE.
There shouldn't be a scenario where a VM can take out the storage server, or even a host for
that matter. Also, VM speed is limited to 1000 Mb/sec.
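
As a rough worked example of the arithmetic behind that question: the sketch below assumes that the bond places a single NFS flow on one member link, which is how many bonding hash policies behave, although the thread does not state the bond mode in use. The function names are hypothetical and only illustrate the calculation.

```python
# Assumption: one NFS flow from a host rides a single GigE member link
# of the bond. The bond mode is not stated in the thread.
GIGE_MBPS = 1000  # capacity of one Gigabit Ethernet link, in megabits/s


def link_utilisation(vm_limit_mbps, flow_link_mbps=GIGE_MBPS):
    """Fraction of a single link that a VM at its rate limit could fill."""
    return vm_limit_mbps / flow_link_mbps


def mbps_to_mbytes_per_s(mbps):
    """Convert megabits per second to megabytes per second."""
    return mbps / 8


if __name__ == "__main__":
    print(link_utilisation(1000))      # share of one GigE link
    print(mbps_to_mbytes_per_s(1000))  # the same limit in MB/s
```

Under that assumption, a 1000 Mb/sec per-VM limit equals the full capacity of one GigE link, so the limit alone would not leave headroom for heartbeat traffic on the same path.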

Thoughts?


Trevor Francis
Partner
46 Labs | PeerEdge Cloud Switch (PeCS)
http://www.46labs.com | http://www.peeredge.net
720-214-3643- Voice
trevor@46labs.com

Solutions Provider for the Telecom Industry


On Nov 25, 2012, at 11:42 AM, Ahmad Emneina <Ahmad.Emneina@citrix.com> wrote:

This is expected behavior: the hosts fence themselves to prevent disk corruption during a host communication outage.

Excerpt from [1]:
'The worst-case scenario for HA is the situation where a host is thought to be off-line but
is actually still writing to the shared storage, because this can result in corruption of
persistent data. To prevent this situation without requiring active power strip controls,
XenServer employs hypervisor-level fencing. This is a Xen modification which hard-powers off
the host at a very low-level if it does not hear regularly from a watchdog process running
in the control domain. Because it is implemented at a very low-level, this also protects the
storage in the case where the control domain becomes unresponsive for some reason.'

[1] http://support.citrix.com/servlet/KbServlet/download/21018-102-664364/High%20Availability%20for%20Citrix%20XenServer.pdf
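
The fencing idea in the excerpt can be illustrated with a toy model. This is only a sketch of the general watchdog pattern, not XenServer's implementation: the class name, the `simulate` helper, and the timeout and interval values are all hypothetical.

```python
import time


class SelfFencingWatchdog:
    """Toy model: fence the host if no heartbeat write succeeds within timeout_s."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s  # hypothetical value, not a XenServer default
        self.clock = clock
        self.last_ok = clock()
        self.fenced = False

    def heartbeat(self, write_ok):
        """Record one attempt to write the heartbeat file on shared storage."""
        if write_ok and not self.fenced:
            self.last_ok = self.clock()

    def should_fence(self):
        """Return True once the host should hard power itself off."""
        if self.clock() - self.last_ok > self.timeout_s:
            self.fenced = True
        return self.fenced


def simulate(write_results, timeout_s=30.0, interval_s=10.0):
    """Replay heartbeat results at fixed intervals; return the fencing time or None."""
    now = [0.0]  # simulated clock, in seconds
    watchdog = SelfFencingWatchdog(timeout_s, clock=lambda: now[0])
    for ok in write_results:
        now[0] += interval_s
        watchdog.heartbeat(ok)
        if watchdog.should_fence():
            return now[0]
    return None


if __name__ == "__main__":
    # Storage answers twice, then stops responding.
    print(simulate([True, True, False, False, False, False]))
```

In this model every host that loses the shared storage path for longer than the timeout fences itself, which matches the symptom of all hosts rebooting when one NFS server stops responding.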

Ahmad

On Nov 25, 2012, at 7:51 AM, "Trevor Francis" <trevor.francis@tgrahamcapital.com> wrote:

We ran an IOzone test from one of our VMs to benchmark our NFS store. It saturated
the link, causing the NFS server to stop responding (according to the logs on the hosts).

This caused every one of our hosts (running XS 6.02) to reboot itself.

Nov 25 09:13:24 compute0 heartbeat: Problem with /var/run/sr-mount/6b407ac5-aca7-1ade-de4e-765a728d6f52/hb-365a44b3-8083-4b3e-a748-498f3f9b0017
Nov 25 09:13:24 compute0 kernel: nfs: server 172.16.0.5 not responding, timed out
Nov 25 09:15:56 compute0 syslogd 1.4.1: restart.
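
To put a number on the timeline in these log lines, a small sketch like the following computes the gap between the NFS timeout and the syslogd restart. The helper names are hypothetical, and the year is taken from the message date because syslog stamps omit it. The gap includes the time the host took to boot, so it is an upper bound on the outage-to-reboot delay, not the fencing timeout itself.

```python
from datetime import datetime

# The two relevant lines from the host log quoted above.
LOG_LINES = [
    "Nov 25 09:13:24 compute0 kernel: nfs: server 172.16.0.5 not responding, timed out",
    "Nov 25 09:15:56 compute0 syslogd 1.4.1: restart.",
]


def syslog_time(line, year=2012):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is supplied by the caller."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gap_seconds(first, second):
    """Seconds elapsed between the timestamps of two syslog lines."""
    return (syslog_time(second) - syslog_time(first)).total_seconds()


if __name__ == "__main__":
    print(gap_seconds(LOG_LINES[0], LOG_LINES[1]))
```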


We are running standard NFS on a Linux server. The server reported no errors.

We are running CS4.

Why would this happen?

