cloudstack-users mailing list archives

From Andrija Panic <andrija.pa...@gmail.com>
Subject Re: Controlled shutdown post-partial-switch failure...
Date Mon, 17 Jun 2019 13:19:24 GMT
Been there, done that, a year ago (whole DC down == everything shutting
down (storage things, S3 things, CloudStack things (all VMs and
everything), etc, etc, etc) - very successfully but not funny...)

On Mon, 17 Jun 2019 at 14:33, David Merrill <david.merrill@otelco.com>
wrote:

> Hi All,
>
> I’m fielding a worrisome emergency situation where a set of (2) stacked
> switches (serving the MGMT & SAN networks) has partially failed (one of the
> stack members was ejected).
>
> Here’s what I’ve got:
>
>
>   *   The Xen hosts’ 2 MGMT NICs are bonded (active-passive) and connected
> to both switches
>   *   The Xen hosts’ 2 SAN NICs are NOT bonded and are connected to both
> switches
>   *   PUB/GUEST NICs are connected to a different set of stacked switches
> (and are fine)
>
> Amazingly, CloudStack has survived (guest VMs are still running and there
> have been no disk issues).
>
> However I’ve got one compute-cluster (of 6 Xen hosts) in an alert state
> (as the pool master is affected) in the CloudStack UI (and cannot manage
> guests there) and I cannot get to their MGMT interfaces (going to hop on
> the hosts today and get more intel).
>
> A replacement switch is arriving in 24 hours & I’m preparing the
> switch-swap process.
>
> I’d REALLY like to shut everything down (guest VMs) before mucking about
> with the switch-stack serving the SAN network, but I think I have to get
> the host MGMT NICs sorted first (my suspicion is that the bonded MGMT NICs
> haven’t failed over – due to the nature of the switch failure maybe? – I’m
> considering pulling the MGMT NIC connections on the failed switch to see if
> I can get a path back).
>
> Anyway, not really much of an ask here, talking myself through it as I
> ride the tiger.
>
> Thanks.
> David
>
> David Merrill
> Senior Systems Engineer,
> Managed and Private/Hybrid Cloud Services
> OTELCO
> 92 Oak Street, Portland ME 04101
> office 207.772.5678
> www.otelco.com/business/managed-services
>
> Confidentiality Message
> The information contained in this e-mail transmission may be confidential
> and legally privileged. If you are not the intended recipient, you are
> notified that any dissemination, distribution, copying or other use of this
> information, including attachments, is prohibited. If you received this
> message in error, please call me at 207.772.5678 so
> this error can be corrected.
>
>

-- 

Andrija Panić
