cloudstack-users mailing list archives

From Trevor Francis <trevor.fran...@tgrahamcapital.com>
Subject Re: NFS Crash crashes all hosts
Date Sun, 25 Nov 2012 19:45:32 GMT
Iozone tests. I was testing with multiple threads.

iozone -l 32 -O -i 0 -i 1 -i 2 -e -+n -r 4K -s 4G > test.txt

This crashed the NFS store.
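
For a sense of scale, the command above can be sized with a little arithmetic. This is
only a rough sketch. It assumes that each of the 32 processes requested with -l works on
its own 4G file, that iozone's K and G suffixes are powers of 1024, and that a single GigE
link carries all the traffic with no protocol overhead. The variable names are illustrative.

```python
# Rough sizing of the iozone run quoted above.
# Assumptions (not confirmed by the thread): one file of size -s per process,
# K/G suffixes are powers of 1024, one GigE link, no protocol overhead.
KIB = 1024
GIB = 1024 ** 3

processes = 32          # from "-l 32"
file_size = 4 * GIB     # from "-s 4G"
record_size = 4 * KIB   # from "-r 4K"

# Data touched by one full pass over every file.
total_bytes = processes * file_size

# Number of 4K records each process issues per pass over its file.
records_per_file = file_size // record_size

# Optimistic lower bound on the time to move one pass over a saturated
# 1 Gb/s link (1e9 bits per second, 8 bits per byte).
gige_bytes_per_sec = 1_000_000_000 / 8
seconds_per_pass = total_bytes / gige_bytes_per_sec

print(f"aggregate data set: {total_bytes / GIB:.0f} GiB")
print(f"4K records per file: {records_per_file}")
print(f"one pass at 1 Gb/s: about {seconds_per_pass / 60:.0f} minutes")
```

Under those assumptions each pass covers 128 GiB, against 16 GiB for a 4-process run with
the other options left unchanged.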

I backed it off to 4 processes and it ran fine. I also ran a standard iozone -a for the
automatic tests.

IP traffic throughput peaked at around 800 Mb/sec from the VM and didn't crash the NFS store.
I am trying to figure out how the previous test caused issues with the NFS store. In actuality,
the store never crashed, but it lost IP connectivity, causing the hosts to think it was dead.
This is strange because I am running link bonding across multiple trunked switches. So, I should
be able to pull any network cable out of my setup and not cause an issue.

I am running an NFS store to the hosts. Thoughts?



Trevor Francis
Partner
46 Labs | PeerEdge Cloud Switch (PeCS)
http://www.46labs.com | http://www.peeredge.net
720-214-3643- Voice
trevor@46labs.com
 
Solutions Provider for the Telecom Industry

 

On Nov 25, 2012, at 12:04 PM, Ahmad Emneina <Ahmad.Emneina@citrix.com> wrote:

> What tests were you running and what kind of throughput were you seeing? VM speed throttling
> is probably happening for VM-to-VM or VM-to-Internet traffic, not as a QoS limit on its storage
> throughput. That would probably have to be enforced on the hypervisor manually; I don't think
> CloudStack has that feature yet.
> 
> Ahmad
> 
> On Nov 25, 2012, at 9:50 AM, "Trevor Francis" <trevor.francis@tgrahamcapital.com> wrote:
> 
> Why would a high-I/O VM cause this?
> 
> The hosts run bonded GigE for storage/management and the storage server runs quad-bonded
> GigE. There shouldn't be a scenario where a VM can take out the storage server, or even a host
> for that matter. Also, VM speed is limited to 1000 Mb/sec.
> 
> Thoughts?
> 
> 
> On Nov 25, 2012, at 11:42 AM, Ahmad Emneina <Ahmad.Emneina@citrix.com> wrote:
> 
> This is expected behavior; it prevents disk corruption during a host communication outage.
> 
> Excerpt from [1]:
> 'The worst-case scenario for HA is the situation where a host is thought to be off-line
> but is actually still writing to the shared storage, because this can result in corruption
> of persistent data. To prevent this situation without requiring active power strip controls,
> XenServer employs hypervisor-level fencing. This is a Xen modification which hard-powers off
> the host at a very low-level if it does not hear regularly from a watchdog process running
> in the control domain. Because it is implemented at a very low-level, this also protects the
> storage in the case where the control domain becomes unresponsive for some reason.'
> 
> [1] http://support.citrix.com/servlet/KbServlet/download/21018-102-664364/High%20Availability%20for%20Citrix%20XenServer.pdf
> 
> Ahmad
> 
> On Nov 25, 2012, at 7:51 AM, "Trevor Francis" <trevor.francis@tgrahamcapital.com> wrote:
> 
> We ran an iozone test from one of our VMs to benchmark our NFS store. According to the logs
> on the hosts, it saturated the link, causing the NFS server to stop responding.
> 
> This caused every one of our hosts (running XS 6.02) to reboot itself.
> 
> Nov 25 09:13:24 compute0 heartbeat: Problem with /var/run/sr-mount/6b407ac5-aca7-1ade-de4e-765a728d6f52/hb-365a44b3-8083-4b3e-a748-498f3f9b0017
> Nov 25 09:13:24 compute0 kernel: nfs: server 172.16.0.5 not responding, timed out
> Nov 25 09:15:56 compute0 syslogd 1.4.1: restart.
> 
> 
> We are running standard NFS on a Linux server. The server reported no errors.
> 
> We are running CS4.
> 
> Why would this happen?
> 
> 

