cloudstack-users mailing list archives

From ilya <ilya.mailing.li...@gmail.com>
Subject Re: [Urgent]: corrupt DB after VM live migration with storage migration
Date Thu, 05 May 2016 04:54:23 GMT
Yiping,

We've dealt with many corruptions in the past. It was mostly around VMware, as
it would eat up disks from time to time, or someone would move a VM out of
band by doing a storage or cluster vMotion.

The solution you described should work.

However, to be extra paranoid:

Step 1: take a full DB backup.
Step 2: back up the root and data disks under some other file name - just in case.

Then proceed with your proposed solution.

As long as you have proper backups, you should be OK. If the VM fails to
start, the logs will tell you where CloudStack expects the volume to be; you
can either move the volume there or update the CloudStack volumes table and
point it to the correct pool_id.
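
For example, the kind of statement I mean is just (a sketch only - the volume
id, pool id and path here are placeholders to be taken from your own volumes /
storage_pool tables and the xe vdi-list output):

UPDATE volumes SET pool_id=<new_pool_id>, path='<vdi-uuid-on-new-SR>' WHERE id=<volume_id>;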

Regards
ilya


On 5/4/16 8:49 PM, Yiping Zhang wrote:
> Before I try the direct DB modifications, I would first:
> 
> * shut down the VM instances
> * stop cloudstack-management service
> * do a DB backup with mysqldump
> 
> What worries me the most is that the volumes on the new cluster’s primary storage device are
> marked as “removed”, so if I shut down the instances, CloudStack may kick off a storage
> cleanup job to remove them from the new cluster’s primary storage before I can get the fixes in.
> 
> Is there a way to temporarily disable storage cleanups?
> 
> Yiping
> 
> 
> 
> 
> On 5/4/16, 3:22 PM, "Yiping Zhang" <yzhang@marketo.com> wrote:
> 
>> Hi, all:
>>
>> I am in a situation that I need some help:
>>
>> I did a live migration with storage migration required for a production VM instance from one
>> cluster to another.  The first migration attempt failed after some time, but the second attempt
>> succeeded. During all this time the VM instance has been accessible (and it is still up and
>> running).  However, when I use my API script to query volumes, it still reports that the volume
>> is on the old cluster’s primary storage.  If I shut down this VM, I am afraid that it won’t
>> start again as it would try to use non-existent volumes.
>>
>> Checking database, sure enough, the DB still has old info about these volumes:
>>
>>
>> mysql> select id,name from storage_pool where id=1 or id=8;
>>
>> +----+------------------+
>> | id | name             |
>> +----+------------------+
>> |  1 | abprod-primary1  |
>> |  8 | abprod-p1c2-pri1 |
>> +----+------------------+
>>
>> 2 rows in set (0.01 sec)
>>
>>
>> Here the old cluster’s primary storage has id=1, and the new cluster’s primary storage has id=8.
>>
>>
>> Here are the entries with wrong info in volumes table:
>>
>>
>> mysql> select id,name, uuid, path,pool_id, removed from volumes where name='ROOT-97' or name='DATA-97';
>>
>> +-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
>> | id  | name    | uuid                                 | path                                 | pool_id | removed             |
>> +-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
>> | 124 | ROOT-97 | 224bf673-fda8-4ccc-9c30-fd1068aee005 | 5d1ab4ef-2629-4384-a56a-e2dc1055d032 |       1 | NULL                |
>> | 125 | DATA-97 | d385d635-9230-4130-8d1f-702dbcf0f22c | 6b75496d-5907-46c3-8836-5618f11dac8e |       1 | NULL                |
>> | 316 | ROOT-97 | 691b5c12-7ec4-408d-b66f-1ff041f149c1 | NULL                                 |       8 | 2016-05-03 06:10:40 |
>> | 317 | ROOT-97 | 8ba29fcf-a81a-4ca0-9540-0287230f10c7 | NULL                                 |       8 | 2016-05-03 06:10:45 |
>> +-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
>>
>> 4 rows in set (0.01 sec)
>>
>> On the old cluster’s XenServer, the volumes do not exist:
>>
>>
>> [root@abmpc-hv01 ~]# xe vdi-list name-label='ROOT-97'
>>
>> [root@abmpc-hv01 ~]# xe vdi-list name-label='DATA-97'
>>
>> [root@abmpc-hv01 ~]#
>>
>> But the volumes are on the new cluster’s primary storage:
>>
>>
>> [root@abmpc-hv04 ~]# xe vdi-list name-label=ROOT-97
>>
>> uuid ( RO)                : a253b217-8cdc-4d4a-a111-e5b6ad48a1d5
>>
>>          name-label ( RW): ROOT-97
>>
>>    name-description ( RW):
>>
>>             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
>>
>>        virtual-size ( RO): 34359738368
>>
>>            sharable ( RO): false
>>
>>           read-only ( RO): true
>>
>>
>> uuid ( RO)                : c46b7a61-9e82-4ea1-88ca-692cd4a9204b
>>
>>          name-label ( RW): ROOT-97
>>
>>    name-description ( RW):
>>
>>             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
>>
>>        virtual-size ( RO): 34359738368
>>
>>            sharable ( RO): false
>>
>>           read-only ( RO): false
>>
>>
>> [root@abmpc-hv04 ~]# xe vdi-list name-label=DATA-97
>>
>> uuid ( RO)                : bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722
>>
>>          name-label ( RW): DATA-97
>>
>>    name-description ( RW):
>>
>>             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
>>
>>        virtual-size ( RO): 107374182400
>>
>>            sharable ( RO): false
>>
>>           read-only ( RO): false
>>
>>
>> uuid ( RO)                : a8c187cc-2ba0-4928-8acf-2afc012c036c
>>
>>          name-label ( RW): DATA-97
>>
>>    name-description ( RW):
>>
>>             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
>>
>>        virtual-size ( RO): 107374182400
>>
>>            sharable ( RO): false
>>
>>           read-only ( RO): true
>>
>>
>> Following is how I plan to fix the corrupted DB entries. Note: I am using the uuid of the VDI with read/write access as the path value:
>>
>>
>> 1) For the ROOT-97 volume:
>>
>> UPDATE volumes SET removed=NOW() WHERE id=124;
>> UPDATE volumes SET removed=NULL WHERE id=317;
>> UPDATE volumes SET path='c46b7a61-9e82-4ea1-88ca-692cd4a9204b' WHERE id=317;
>>
>>
>> 2) For the DATA-97 volume:
>>
>> UPDATE volumes SET pool_id=8 WHERE id=125;
>>
>> UPDATE volumes SET path='bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722' WHERE id=125;
>>
>>
>> Would this work?
>>
>>
>> Thanks for any help anyone can provide.  I have a total of 4 VM instances with 8 volumes in this situation that need to be fixed.
>>
>>
>> Yiping
