cloudstack-dev mailing list archives

From: Vladimir Melnik <v.mel...@uplink.ua>
Subject: Re: Snapshots on KVM corrupting disk images
Date: Mon, 04 Mar 2019 12:00:43 GMT

Dear colleagues,

Yes, that was my pull request.

Now I would be very grateful for some kind of help from you.

Please, be so kind as to describe your cases here: https://github.com/apache/cloudstack/pull/3194

Thank you so much!

On Fri, Mar 01, 2019 at 02:00:05PM -0500, Ivan Kudryavtsev wrote:
> Hi, Sean,
> I saw the PR https://github.com/apache/cloudstack/pull/3194
> which seems to cover one of the bugs. I haven't had enough time to dive into
> the code and review the snapshot-related workflows, but it looks like this
> PR does the right thing. I hope it will be added to 4.11.3.
> 
> Thu, 28 Feb 2019 at 17:02, Sean Lair <slair@ippathways.com>:
> 
> > Hi Ivan, I wanted to respond here and see if you published a PR yet on
> > this.
> >
> > This is a very scary issue for us as customers can snapshot their volumes
> > and end up causing corruption - and they blame us.  It's already happened -
> > luckily we had Storage Array level snapshots in place as a safety net...
> >
> > Thanks!!
> > Sean
> >
> > -----Original Message-----
> > From: Ivan Kudryavtsev [mailto:kudryavtsev_ia@bw-sw.com]
> > Sent: Sunday, January 27, 2019 7:29 PM
> > To: users <users@cloudstack.apache.org>; cloudstack-fan <
> > cloudstack-fan@protonmail.com>
> > Cc: dev <dev@cloudstack.apache.org>
> > Subject: Re: Snapshots on KVM corrupting disk images
> >
> > Well, guys, I dived into the CS agent scripts that make volume snapshots and
> > found that there is no code for suspend/resume and no code for the qemu-agent
> > fsfreeze/fsthaw calls. I don't see any blockers to adding that code and will
> > try to add it in the next few days. If the tests go well, I'll publish a PR,
> > which I suppose could be integrated into 4.11.3.
> >
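The freeze/thaw idea described above can be sketched as follows. This is only a minimal illustration, not the code from the CloudStack agent scripts or from the PR: it assumes libvirt's `virsh domfsfreeze` and `virsh domfsthaw` commands are available on the host and that qemu-guest-agent is running inside the guest. The names `frozen_filesystems`, `snapshot_with_freeze`, and the injectable `runner` are hypothetical.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def run(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


@contextmanager
def frozen_filesystems(domain: str, runner: Runner = run) -> Iterator[None]:
    """Keep the guest filesystems frozen for the duration of the with-block.

    Assumption: qemu-guest-agent is running in the guest, otherwise
    `virsh domfsfreeze` fails and nothing is frozen.
    """
    runner(["virsh", "domfsfreeze", domain])
    try:
        yield
    finally:
        # Always thaw, even if the snapshot step raised an exception.
        runner(["virsh", "domfsthaw", domain])


def snapshot_with_freeze(
    domain: str,
    take_snapshot: Callable[[], None],
    runner: Runner = run,
) -> None:
    """Call take_snapshot() while the guest filesystems are frozen."""
    with frozen_filesystems(domain, runner):
        take_snapshot()
```

The `try`/`finally` is the important part: a failed snapshot must not leave the guest frozen. The snapshot step itself is passed in as a callable because how CloudStack takes it is outside the scope of this sketch.
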
> > Mon, 28 Jan 2019, 2:45 cloudstack-fan
> > cloudstack-fan@protonmail.com.invalid:
> >
> > > Hello Sean,
> > >
> > > It seems that you've encountered the same issue that I've been facing
> > > during the last 5-6 years of using ACS with KVM hosts (see this
> > > thread, if you're interested in additional details:
> > > https://mail-archives.apache.org/mod_mbox/cloudstack-users/201807.mbox/browser
> > > ).
> > >
> > > I'd like to state that creating snapshots of a running virtual machine
> > > is a bit risky. I've implemented some workarounds in my environment,
> > > but I'm still not sure that they are 100% effective.
> > >
> > > I have a couple of questions, if you don't mind. What kind of storage
> > > do you use, if it's not a secret? Does your storage use XFS as a
> > > filesystem? Did you see something like this in your log files?
> > > [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> > > [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> > > [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> > > Did you see any unusual messages in your log-file when the disaster
> > > happened?
> > >
> > > I hope things will be well. Wish you good luck and all the best!
> > >
> > >
> > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Tuesday, 22 January 2019 18:30, Sean Lair <slair@ippathways.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > > We have had some instances where VM disks became corrupted when
> > > > > using KVM snapshots. We are running CloudStack 4.9.3 with KVM on
> > > > > CentOS 7.
> > > >
> > > > > The first time was when someone mass-enabled scheduled snapshots on
> > > > > a large number of VMs and secondary storage filled up. We had to
> > > > > restore all those VM disks... But we believed it was just our fault
> > > > > for letting secondary storage fill up.
> > > >
> > > > > Today we had an instance where a snapshot failed, and now the disk
> > > > > image is corrupted and the VM can't boot. Here is the output of some
> > > > > commands:
> > > >
> > > >
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ------------------------------------------------
> > > >
> > > > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > > > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> > > > >
> > > > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > > > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> > > > >
> > > > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > > > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > >
> > > >
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > -----------------------------------------------------------
> > > >
> > > > > We tried restoring to before the snapshot failure, but still have
> > > > > strange errors:
> > > >
> > > >
> > > ----------------------------------------------------------------------
> > > --------------
> > > >
> > > > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > > > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > > >
> > > > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > > > image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > > > file format: qcow2
> > > > > virtual size: 50G (53687091200 bytes)
> > > > > disk size: 73G
> > > > > cluster_size: 65536
> > > > > Snapshot list:
> > > > > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > > > > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> > > > > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> > > > > Format specific information:
> > > > > compat: 1.1
> > > > > lazy refcounts: false
> > > > >
> > > > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > > > tcmalloc: large alloc 1539750010880 bytes == (nil) @ 0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
> > > > > No errors were found on the image.
> > > > >
> > > > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > > > Snapshot list:
> > > > > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > > > > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> > > > > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> > > >
> > > >
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ----------------------------------------------------------------------
> > > ---------------------------------------------------------------
> > > >
> > > > > Everyone is now extremely hesitant to use snapshots in KVM.... We
> > > > > tried deleting the snapshots in the restored disk image, but it
> > > > > errors out...
> > > > >
> > > > > Does anyone else have issues with KVM snapshots? We are considering
> > > > > just disabling this functionality now...
> > > >
> > > > Thanks
> > > > Sean
> > >
> > >
> > >
> >
> 
> 
> -- 
> With best regards, Ivan Kudryavtsev
> Bitworks LLC
> Cell RU: +7-923-414-1515
> Cell USA: +1-201-257-1512
> WWW: http://bitworks.software/ <http://bw-sw.com/>

-- 
V.Melnik
