cloudstack-users mailing list archives

From Matt Foley <mfo...@hortonworks.com>
Subject Re: Help! After network outage, can't start System VMs; focused debug info attached
Date Tue, 17 Sep 2013 01:07:47 GMT
Thank you, Chiradeep.  The log snippet is now available at http://apaste.info/qBIB
--Matt

On Mon, Sep 16, 2013 at 5:19 PM, Chiradeep Vittal <Chiradeep.Vittal@citrix.com> wrote:

> Attachments are stripped. Can you paste it somewhere (say, at http://apaste.info/)?
>
> From: Matt Foley <mfoley@hortonworks.com>
> Date: Monday, September 16, 2013 4:58 PM
>
> We had a planned network outage this weekend, which inadvertently resulted
> in making the NFS Shared Primary Storage (used by System VMs) unavailable
> for a day and a half.  (Guest VMs use local storage only, but System VMs
> use shared storage only.)  CloudStack was not brought down prior to the
> outage.
>
> After the network came back, we gracefully brought down all services,
> including cloudstack-management, mysql, and NFS, then rebooted all servers
> in the cluster and the NFS server (to make sure there were no stale file handles),
> then brought up services in the appropriate order.  Also checked mysql for
> table corruption, and found none.  Confirmed that the NFS volumes are
> mountable from all hosts, and in fact Shared Primary Storage is being
> mounted by CloudStack on hosts as usual, under /mnt/<uuid>.
>
> Nevertheless, when we try to bring up the cluster, we fail to start the
> system VMs, with errors "InsufficientServerCapacityException: Unable to
> create a deployment for VM".  The cause is not really insufficient
> capacity, as actual usage of resources is tiny; these error messages
> mask the real failure, which is in creating the primary storage volume
> for the System VMs.
>
> Digging into management-server.log, the core issue seems to be the ~160
> line snippet from the log attached to this message as
> cloudstack_debug_2013.09.16.log.  The only Shared Primary Storage pool is
> pool 201, named "cs-primary".  It is mounted on all hosts as
> /mnt/9c6fd9a3-43e5-389a-9594-faecf178b4b9, which is its uuid.  The log
> shows the management server correctly identifying a particular host as
> being able to access pool 201, then trying to allocate a primary storage
> volume using the template with uuid f23a16e7-b628-429e-83e1-698935588465.
> It fails, but I cannot tell why.  I suspect its claim that "Template 3 has
> already been downloaded to pool 201" is false, but I don't know how to
> check this (or how to fix it if it is wrong).
>
> Any guidance for further debugging or fixing this would be GREATLY
> appreciated.
> Thanks,
> --Matt
>
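
One way to test the "already downloaded" claim from the host side is to look for the template file on the mounted pool. The following is only a minimal sketch: the function name `check_template_on_pool` and the placeholder file name are hypothetical, and the actual file name has to come from the management database's record for template 3 on pool 201 (I believe that is the `template_spool_ref` table, but treat the table name as an assumption to verify against your schema).

```python
import os


def check_template_on_pool(pool_mount, template_file, require_mount=True):
    """Return a list of problems found; an empty list means the file looks present.

    pool_mount    -- directory where the primary storage pool is mounted
    template_file -- file name the database claims was installed (hypothetical input)
    require_mount -- also insist that pool_mount is a live mount point
    """
    if not os.path.isdir(pool_mount):
        return [f"{pool_mount} is not a directory"]

    problems = []
    # A plain directory left behind by a failed mount would pass isdir(),
    # so check separately that something is actually mounted there.
    if require_mount and not os.path.ismount(pool_mount):
        problems.append(f"{pool_mount} is a plain directory, not a live mount")

    path = os.path.join(pool_mount, template_file)
    if not os.path.isfile(path):
        problems.append(f"{path} does not exist")
    elif os.path.getsize(path) == 0:
        problems.append(f"{path} exists but is empty")
    return problems


if __name__ == "__main__":
    # Pool path taken from the message above; the file name is a placeholder.
    pool = "/mnt/9c6fd9a3-43e5-389a-9594-faecf178b4b9"
    for problem in check_template_on_pool(pool, "<template-file-from-db>"):
        print(problem)
```

If the database says the template is installed but this reports the file missing or empty, the database record and the pool contents disagree, which would be consistent with the suspicion above. It does not by itself say how to repair the record.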

