cloudstack-dev mailing list archives

From Sean Lair <sl...@ippathways.com>
Subject RE: Snapshots on KVM corrupting disk images
Date Thu, 28 Feb 2019 22:01:38 GMT
Hi Ivan, I wanted to respond here and see if you published a PR yet on this.

This is a very scary issue for us, as customers can snapshot their volumes and end up causing
corruption - and they blame us.  It's already happened - luckily we had storage-array-level
snapshots in place as a safety net...

Thanks!!
Sean

-----Original Message-----
From: Ivan Kudryavtsev [mailto:kudryavtsev_ia@bw-sw.com] 
Sent: Sunday, January 27, 2019 7:29 PM
To: users <users@cloudstack.apache.org>; cloudstack-fan <cloudstack-fan@protonmail.com>
Cc: dev <dev@cloudstack.apache.org>
Subject: Re: Snapshots on KVM corrupting disk images

Well, guys. I dug into the CS agent scripts that make volume snapshots and found that there is
no code for suspend/resume, and also no code for the qemu-agent fsfreeze/fsthaw calls. I don't see
any blockers to adding that code, and I will try to add it in the next few days. If tests go well, I'll
publish the PR, which I suppose could be integrated into 4.11.3.
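
A minimal sketch of the freeze/snapshot/thaw ordering described above. It is not the CloudStack agent code and not the proposed PR. It assumes the guest runs qemu-guest-agent, so that libvirt's `virsh domfsfreeze` and `virsh domfsthaw` can reach it. The names `snapshot_with_freeze`, `take_snapshot` and `run` are hypothetical and exist only for illustration.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def snapshot_with_freeze(
    domain: str,
    take_snapshot: Callable[[], None],
    run: Runner = _run,
) -> None:
    """Freeze guest filesystems, take the snapshot, then always thaw.

    Assumption: qemu-guest-agent is running inside the guest. If the
    freeze fails, the exception propagates and no snapshot is attempted.
    """
    run(["virsh", "domfsfreeze", domain])
    try:
        # Hypothetical callback standing in for whatever actually
        # creates the volume snapshot.
        take_snapshot()
    finally:
        # Thaw even when the snapshot fails, so the guest is never
        # left with frozen filesystems.
        run(["virsh", "domfsthaw", domain])
```

The `try`/`finally` is the point of the sketch: a failed snapshot must not leave the guest frozen. Passing `run` as a parameter lets the ordering be checked without a real hypervisor.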

On Mon, 28 Jan 2019 at 2:45, cloudstack-fan
<cloudstack-fan@protonmail.com.invalid> wrote:

> Hello Sean,
>
> It seems that you've encountered the same issue that I've been facing 
> during the last 5-6 years of using ACS with KVM hosts (see this 
> thread, if you're interested in additional details:
> https://mail-archives.apache.org/mod_mbox/cloudstack-users/201807.mbox/browser).
>
> I'd like to state that creating snapshots of a running virtual machine 
> is a bit risky. I've implemented some workarounds in my environment, 
> but I'm still not sure that they are 100% effective.
>
> I have a couple of questions, if you don't mind. What kind of storage 
> do you use, if it's not a secret? Does your storage use XFS as a filesystem?
> Did you see something like this in your log-files?
> [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> Did you see any unusual messages in your log-file when the disaster 
> happened?
>
> I hope things will be well. I wish you good luck and all the best!
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Tuesday, 22 January 2019 18:30, Sean Lair <slair@ippathways.com> wrote:
>
> > Hi all,
> >
> > We have had some instances where VM disks became corrupted when using
> > KVM snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.
> >
> > The first time was when someone mass-enabled scheduled snapshots on a
> > large number of VMs and secondary storage filled up. We had to
> > restore all those VM disks... but we believed it was just our fault for
> > letting secondary storage fill up.
> >
> > Today we had an instance where a snapshot failed, and now the disk image
> > is corrupted and the VM can't boot. Here is the output of some commands:
> >
> >
> > ----------------------------------------------------------------------
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> >
> >
> > ----------------------------------------------------------------------
> >
> > We tried restoring to a point before the snapshot failure, but we still get strange errors:
> >
> >
> > ----------------------------------------------------------------------
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > file format: qcow2
> > virtual size: 50G (53687091200 bytes)
> > disk size: 73G
> > cluster_size: 65536
> > Snapshot list:
> > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> > Format specific information:
> > compat: 1.1
> > lazy refcounts: false
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > tcmalloc: large alloc 1539750010880 bytes == (nil) @ 0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
> > No errors were found on the image.
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > Snapshot list:
> > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> >
> >
> > ----------------------------------------------------------------------
> >
> > Everyone is now extremely hesitant to use snapshots in KVM... We tried
> > deleting the snapshots in the restored disk image, but it errors out...
> >
> > Does anyone else have issues with KVM snapshots? We are considering just
> > disabling this functionality now...
> >
> > Thanks
> > Sean
>
>
>