cloudstack-users mailing list archives

From sriharsha work <>
Subject Re: Help! After network outage, can't start System VMs; focused debug info attached
Date Tue, 17 Sep 2013 02:36:21 GMT
Replying on behalf of Matt. We are able to write data to the NFS drives.
That's not an issue.


Sent from my iPhone

> On Sep 16, 2013, at 19:30, Ahmad Emneina <> wrote:
> Try to mount your primary storage to a compute host and try to write to it.
> Your NFS server might not have come back up properly (settings-wise or all
> the relevant services).
>> On Sep 16, 2013 6:08 PM, "Matt Foley" <> wrote:
>> Thank you Chiradeep.  Log snippet now available as
>> --Matt
>> On Mon, Sep 16, 2013 at 5:19 PM, Chiradeep Vittal <> wrote:
>>> Attachments are stripped. Can you paste (say at
>>> From: Matt Foley <>
>>> Date: Monday, September 16, 2013 4:58 PM
>>> We had a planned network outage this weekend, which inadvertently resulted
>>> in making the NFS Shared Primary Storage (used by System VMs) unavailable
>>> for a day and a half.  (Guest VMs use local storage only, but System VMs
>>> use shared storage only.)  Cloudstack was not brought down prior to the
>>> outage.
>>> After network came back, we gracefully brought down all services including
>>> cloudstack-management, mysql, and NFS, then actually rebooted all servers
>>> in the cluster and the NFS server (to make sure no stale file handles),
>>> then brought up services in the appropriate order.  Also checked mysql for
>>> table corruption, and found none.  Confirmed that the NFS volumes are
>>> mountable from all hosts, and in fact Shared Primary Storage is being
>>> mounted by cloudstack on hosts as usual, under /mnt/<uuid>.
>>> Nevertheless, when we try to bring up the cluster, we fail to start the
>>> system VMs, with errors "InsufficientServerCapacityException: Unable to
>>> create a deployment for VM".  The cause is not really insufficient
>>> capacity, as actual usage of resources is tiny; these error messages are
>>> false explanations of the failure to create primary storage volume for the
>>> System VMs.
>>> Digging into management-server.log, the core issue seems to be the ~160
>>> line snippet from the log attached to this message as
>>> cloudstack_debug_2013.09.16.log. The only Shared Primary Storage pool is
>>> pool 201, named "cs-primary".  It is mounted on all hosts as
>>> /mnt/9c6fd9a3-43e5-389a-9594-faecf178b4b9, which is its uuid.  The log
>>> shows the management server correctly identifying a particular host as
>>> being able to access pool 201, then trying to allocate a primary storage
>>> volume using the template with uuid f23a16e7-b628-429e-83e1-698935588465.
>>> It fails, but I cannot tell why.  I suspect its claim that "Template 3 has
>>> already been downloaded to pool 201" is false, but I don't know how to
>>> check this (or fix if wrong).
>>> Any guidance for further debugging or fixing this would be GREATLY
>>> appreciated.
>>> Thanks,
>>> --Matt
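
Ahmad's suggestion in this thread (mount the primary storage on a compute host and try to write to it) can be scripted so the same check runs identically on every host. The following is only a minimal sketch, not anything taken from CloudStack itself: the function name check_writable is hypothetical, and it assumes the pool is already mounted at a directory such as the /mnt/<uuid> path mentioned above. It only confirms that a small file can be created, flushed, read back, and deleted; it says nothing about NFS export options or file ownership as seen by the hypervisor.

```python
import os
import tempfile


def check_writable(mount_dir: str) -> bool:
    """Return True if a small file can be written, read back, and removed under mount_dir.

    Hypothetical helper for illustration; mount_dir is assumed to be an
    already-mounted storage directory (for example the pool's /mnt/<uuid> path).
    """
    payload = b"primary-storage-write-test\n"
    try:
        # Create a uniquely named scratch file inside the mount.
        fd, path = tempfile.mkstemp(prefix=".write-test-", dir=mount_dir)
    except OSError as exc:
        print(f"cannot create a file in {mount_dir}: {exc}")
        return False
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            # Ask the OS to push the data to the server, not just the page cache.
            os.fsync(handle.fileno())
        with open(path, "rb") as handle:
            return handle.read() == payload
    except OSError as exc:
        print(f"write or read-back failed in {mount_dir}: {exc}")
        return False
    finally:
        # Always clean up the scratch file, even after a failure.
        try:
            os.remove(path)
        except OSError:
            pass
```

Calling check_writable("/mnt/9c6fd9a3-43e5-389a-9594-faecf178b4b9") on each host would exercise the mount point named in Matt's message. A True result only rules out a read-only or stale mount, which matches the reply at the top of this message that writes to the NFS drives succeed.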
