cloudstack-issues mailing list archives

From "Nicolas Vazquez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CLOUDSTACK-10326) Prevent hosts fall into Maintenance when there are running VMs on it
Date Fri, 16 Mar 2018 00:01:00 GMT

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Vazquez updated CLOUDSTACK-10326:
-----------------------------------------
    Description: 
This issue was discovered, fixed and tested on KVM, but applies to every hypervisor.
h2. Background

When maintenance mode is enabled on a host, the host state is set to 'PrepareForMaintenance'
and its running VMs are migrated to another host. Once every VM has been migrated, the host moves
to the 'Maintenance' state.

Checks are performed in the ResourceManagerImpl.checkAndMaintain() method:
 * List VMs with host_id = HOST_ID
 * List VMs with last_host_id = HOST_ID and state=Migrating

When both queries return no rows, the host can be put into Maintenance.
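
As a rough illustration, the two checks can be sketched against an in-memory SQLite table. The table is reduced to the columns named in this description; the helper name can_enter_maintenance and the sample rows are hypothetical, and this is not the actual ResourceManagerImpl code.

```python
import sqlite3


def can_enter_maintenance(conn, host_id):
    """Hypothetical sketch of the two checks; not the CloudStack implementation."""
    # Check 1: VMs whose host_id points at the host.
    on_host = conn.execute(
        "SELECT id FROM vm_instance WHERE host_id = ?", (host_id,)
    ).fetchall()
    # Check 2: VMs that are migrating away from the host.
    migrating = conn.execute(
        "SELECT id FROM vm_instance "
        "WHERE last_host_id = ? AND state = 'Migrating'",
        (host_id,),
    ).fetchall()
    # The host may enter Maintenance only when both result sets are empty.
    return not on_host and not migrating


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vm_instance (id INTEGER PRIMARY KEY, name TEXT, state TEXT, "
    "host_id INTEGER, last_host_id INTEGER)"
)
# Sample data: one VM running on host 1, one VM migrating from host 1 to host 2.
conn.execute("INSERT INTO vm_instance VALUES (1, 'vm-a', 'Running', 1, NULL)")
conn.execute("INSERT INTO vm_instance VALUES (2, 'vm-b', 'Migrating', 2, 1)")
```

With this sample data, can_enter_maintenance(conn, 1) is False because both queries return a row for host 1.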

When a VM is being migrated to DEST_HOST, its host_id column is set to DEST_HOST, last_host_id
= ORIGIN_HOST and state = Migrating. If the migration then fails, host_id = last_host_id = ORIGIN_HOST.
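
The column changes described above can be sketched as a small state model. The class and function names are hypothetical, and the 'Running' state after a failed migration is an assumption; the description only states the resulting host_id and last_host_id values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmRow:
    """Hypothetical stand-in for the relevant vm_instance columns."""
    state: str
    host_id: int
    last_host_id: Optional[int] = None


def start_migration(vm: VmRow, dest_host: int) -> None:
    # While migrating, host_id already points at the destination host.
    vm.last_host_id = vm.host_id
    vm.host_id = dest_host
    vm.state = "Migrating"


def fail_migration(vm: VmRow) -> None:
    # After a failure, host_id = last_host_id = origin host.
    vm.host_id = vm.last_host_id
    vm.state = "Running"  # assumption: the VM keeps running on the origin host


ORIGIN_HOST, DEST_HOST = 1, 2
vm = VmRow(state="Running", host_id=ORIGIN_HOST)
start_migration(vm, DEST_HOST)
fail_migration(vm)
```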
h2. Issue

The following sequence leads to the problem:
 * Enable maintenance mode on ORIGIN_HOST
 * VMs start being migrated to a host, say DEST_HOST
 * checkAndMaintain() starts:
 ** First check passes (no VM with host_id = ORIGIN_HOST, as those are being migrated)
 ** Before the second check, one or more migrations fail
 ** Second check passes because the failed VMs are no longer in the Migrating state, even though they are running on the host again.
 * Host goes into Maintenance state.
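
The sequence can be reproduced deterministically with a small simulation. This is a sketch that follows the order of checks given in this description, not the actual CloudStack code: the between_checks hook, the function names and the 'Running' state after a failed migration are all assumptions made for illustration.

```python
import sqlite3

ORIGIN_HOST, DEST_HOST = 1, 2


def check_and_maintain(conn, host_id, between_checks=lambda: None):
    """Hypothetical two-step check with a hook that runs between the queries."""
    first = conn.execute(
        "SELECT COUNT(*) FROM vm_instance WHERE host_id = ?", (host_id,)
    ).fetchone()[0]
    between_checks()  # window in which a migration may fail
    second = conn.execute(
        "SELECT COUNT(*) FROM vm_instance "
        "WHERE last_host_id = ? AND state = 'Migrating'",
        (host_id,),
    ).fetchone()[0]
    return first == 0 and second == 0


def fail_migration(conn, vm_id):
    # Failed migration: the VM is back on its origin host ('Running' is assumed).
    conn.execute(
        "UPDATE vm_instance SET host_id = last_host_id, state = 'Running' "
        "WHERE id = ?",
        (vm_id,),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vm_instance (id INTEGER PRIMARY KEY, state TEXT, "
    "host_id INTEGER, last_host_id INTEGER)"
)
# One VM in the middle of a migration from ORIGIN_HOST to DEST_HOST.
conn.execute(
    "INSERT INTO vm_instance VALUES (1, 'Migrating', ?, ?)",
    (DEST_HOST, ORIGIN_HOST),
)

# The migration fails after the first query and before the second one.
wrongly_allowed = check_and_maintain(
    conn, ORIGIN_HOST, between_checks=lambda: fail_migration(conn, 1)
)
vms_on_origin = conn.execute(
    "SELECT COUNT(*) FROM vm_instance WHERE host_id = ?", (ORIGIN_HOST,)
).fetchone()[0]
```

Here wrongly_allowed is True while vms_on_origin is 1: both checks pass although a VM is placed on the host again.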

Screenshots are attached; the following query was executed in each case:

select id, name, instance_name, state, host_id, last_host_id from vm_instance;


> Prevent hosts fall into Maintenance when there are running VMs on it
> --------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-10326
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10326
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>    Affects Versions: 4.11.0.0
>            Reporter: Nicolas Vazquez
>            Assignee: Nicolas Vazquez
>            Priority: Major
>             Fix For: 4.11.1.0
>
>         Attachments: CLOUDSTACK-10326-Debug.png, CLOUDSTACK-10326-InitialState.png, CLOUDSTACK-10326-Migrating.png,
CLOUDSTACK-10326-MigrationFailed.png
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)