cloudstack-dev mailing list archives

From Sean Lair <sl...@ippathways.com>
Subject RE: Snapshots on KVM corrupting disk images
Date Wed, 23 Jan 2019 03:01:33 GMT
Thanks Wei!  We really appreciate the response and the link.

Shouldn't we be doing something in CloudStack to prevent the use of snapshots (scheduled and other
snapshot operations)?

-----Original Message-----
From: Wei ZHOU [mailto:ustcweizhou@gmail.com] 
Sent: Tuesday, January 22, 2019 4:06 PM
To: dev@cloudstack.apache.org
Subject: Re: Snapshots on KVM corrupting disk images

Hi Sean,

Volume snapshots (including recurring ones) of running VMs should be disabled in CloudStack.

According to some discussions (for example https://bugzilla.redhat.com/show_bug.cgi?id=920020),
the image might be corrupted by concurrent read/write access during a volume snapshot
(taken via qemu-img snapshot).

```

qcow2 images must not be used in read-write mode from two processes at the same time. You
can either have them opened either by one read-write process or by many read-only processes.
Having one (paused) read-write process (the running VM) and additional read-only processes
(copying out a snapshot with qemu-img) may happen to work in practice, but you're on your own
and we won't give support for such attempts.

```
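The rule quoted above allows one read-write opener or any number of read-only openers, and never both at once. That is the classic readers-writer discipline. Below is a minimal sketch of enforcing it with advisory `fcntl.flock` locks. `open_image` is a hypothetical helper for illustration only. It is not how QEMU or CloudStack lock images; newer QEMU releases have their own image locking. Advisory locks only constrain processes that follow the same convention.

```python
import fcntl
import os


def open_image(path, writable):
    """Open an image file under the "one writer XOR many readers" rule.

    Hypothetical helper: takes an advisory flock() lock (exclusive for a
    writer, shared for a reader) and fails immediately instead of waiting.
    """
    fd = os.open(path, os.O_RDWR if writable else os.O_RDONLY)
    lock = fcntl.LOCK_EX if writable else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, lock | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise  # BlockingIOError when a conflicting lock is held on another open of the file
    return fd
```

In the scenario Wei describes, the running VM is the writer. A `qemu-img` process copying out a snapshot would be a reader. Under this scheme the reader is refused rather than allowed to read while the image is being written.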
The safe way to take a volume snapshot of a running VM is:
(1) take a VM snapshot (the VM will be paused);
(2) then create a volume snapshot from the VM snapshot.
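Wei's two steps presumably correspond to the CloudStack API commands `createVMSnapshot` and `createSnapshotFromVMSnapshot`. Check that your release's API reference lists the second one; it may not exist in every version, including the 4.9.3 mentioned in this thread. The sketch below only builds signed request URLs and sends nothing. It uses CloudStack's HMAC-SHA1 request-signing scheme: sort the parameters, URL-encode the values, lower-case the string, sign it, then Base64- and URL-encode the signature. The endpoint, keys and UUIDs are hypothetical placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_url(endpoint, api_key, secret_key, command, **params):
    """Build a signed CloudStack API GET URL (HMAC-SHA1 request signing)."""
    params = {**params, "command": command, "apikey": api_key, "response": "json"}
    # Sort by parameter name and URL-encode each value (spaces become %20, not +).
    query = "&".join(
        f"{k}={quote(str(v), safe='')}"
        for k, v in sorted(params.items(), key=lambda kv: kv[0].lower())
    )
    # The signature is computed over the lower-cased query string.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"


# Hypothetical values for illustration only.
ENDPOINT = "http://management-server:8080/client/api"
API_KEY, SECRET_KEY = "your-api-key", "your-secret-key"

# (1) VM snapshot; asynchronous, and the VM is paused while it is taken.
step1 = signed_url(ENDPOINT, API_KEY, SECRET_KEY, "createVMSnapshot",
                   virtualmachineid="VM-UUID")
# (2) After step 1's job has finished, create the volume snapshot from it.
step2 = signed_url(ENDPOINT, API_KEY, SECRET_KEY, "createSnapshotFromVMSnapshot",
                   vmsnapshotid="VM-SNAPSHOT-UUID", volumeid="VOLUME-UUID")
```

Step 2 needs the VM snapshot ID, which is only known after step 1's asynchronous job completes. You can get it, for example, by polling `queryAsyncJobResult`. That is why the two requests are built separately rather than fired back to back.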

-Wei



Sean Lair <slair@ippathways.com> wrote on Tue, Jan 22, 2019 at 5:30 PM:

> Hi all,
>
> We have had some instances of VM disks becoming corrupted when using
> KVM snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.
>
> The first time was when someone mass-enabled scheduled snapshots on a
> large number of VMs and secondary storage filled up. We had to
> restore all those VM disks, but we believed it was our own fault for
> letting secondary storage fill up.
>
> Today we had a case where a snapshot failed, and now the disk
> image is corrupted and the VM can't boot. Here is the output of some commands:
>
> -----------------------
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -----------------------
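When `qemu-img` refuses to open an image, you can still read its fixed-size qcow2 header directly. This shows what the image claims about its snapshot table: the snapshot count and the table's offset. The sketch below opens the file read-only and does not validate or repair anything. It assumes the 72-byte header prefix common to qcow2 versions 2 and 3, with all fields big-endian, as described in QEMU's qcow2 format specification. `read_qcow2_header` is a hypothetical helper name.

```python
import struct

# Fixed 72-byte qcow2 header prefix (versions 2 and 3), big-endian:
# magic, version, backing_file_offset, backing_file_size, cluster_bits, size,
# crypt_method, l1_size, l1_table_offset, refcount_table_offset,
# refcount_table_clusters, nb_snapshots, snapshots_offset
QCOW2_HEADER = struct.Struct(">4sIQIIQIIQQIIQ")
QCOW2_MAGIC = b"QFI\xfb"


def read_qcow2_header(path):
    """Return selected qcow2 header fields, reading the file read-only."""
    with open(path, "rb") as f:
        raw = f.read(QCOW2_HEADER.size)
    if len(raw) < QCOW2_HEADER.size:
        raise ValueError("file is too short to hold a qcow2 header")
    fields = QCOW2_HEADER.unpack(raw)
    if fields[0] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return {
        "version": fields[1],
        "cluster_size": 1 << fields[4],
        "virtual_size": fields[5],
        "nb_snapshots": fields[11],
        "snapshots_offset": fields[12],
    }
```

For example, `read_qcow2_header("./184aa458-9d4b-4c1b-a3c6-23d28ea28e80")` on a copy of the image returns these fields. You can compare `nb_snapshots` against the snapshots you expect CloudStack to have created. You can also check that `snapshots_offset` lies within the 73G file.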
>
> We tried restoring to a point before the snapshot failure, but we still
> get strange errors:
>
> ----------------------
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> file format: qcow2
> virtual size: 50G (53687091200 bytes)
> disk size: 73G
> cluster_size: 65536
> Snapshot list:
> ID        TAG                 VM SIZE                DATE       VM CLOCK
> 1         a8fdf99f-8219-4032-a9c8-87a6e09e7f95   3.7G 2018-12-23 11:01:43 3099:35:55.242
> 2         b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd   3.8G 2019-01-06 11:03:16 3431:52:23.942
> Format specific information:
>     compat: 1.1
>     lazy refcounts: false
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> tcmalloc: large alloc 1539750010880 bytes == (nil) @  0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
> No errors were found on the image.
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> Snapshot list:
> ID        TAG                 VM SIZE                DATE       VM CLOCK
> 1         a8fdf99f-8219-4032-a9c8-87a6e09e7f95   3.7G 2018-12-23 11:01:43 3099:35:55.242
> 2         b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd   3.8G 2019-01-06 11:03:16 3431:52:23.942
> --------------------------
>
> Everyone is now extremely hesitant to use snapshots in KVM. We
> tried deleting the snapshots in the restored disk image, but it errors out...
>
>
> Does anyone else have issues with KVM snapshots?  We are considering 
> just disabling this functionality now...
>
> Thanks
> Sean