From: Ivan Kudryavtsev <kudryavtsev_ia@bw-sw.com>
To: users
Cc: dev@cloudstack.apache.org, cloudstack-fan
Date: Fri, 1 Mar 2019 14:00:05 -0500
Subject: Re: Snapshots on KVM corrupting disk images
Mailing list: dev@cloudstack.apache.org
Hi Sean, I saw PR https://github.com/apache/cloudstack/pull/3194, which seems to cover one of the bugs. I haven't had enough time to dive into the code and review the snapshot-related workflows, but this PR looks like it does the right thing. I hope it will be added to 4.11.3.

Thu, 28 Feb 2019 at 17:02, Sean Lair:

> Hi Ivan, I wanted to respond here and see if you have published a PR yet on
> this.
>
> This is a very scary issue for us, as customers can snapshot their volumes
> and end up causing corruption - and they blame us. It has already happened -
> luckily we had storage-array-level snapshots in place as a safety net...
>
> Thanks!!
> Sean
>
> -----Original Message-----
> From: Ivan Kudryavtsev [mailto:kudryavtsev_ia@bw-sw.com]
> Sent: Sunday, January 27, 2019 7:29 PM
> To: users ; cloudstack-fan <cloudstack-fan@protonmail.com>
> Cc: dev
> Subject: Re: Snapshots on KVM corrupting disk images
>
> Well, guys, I dived into the CloudStack agent scripts that make volume
> snapshots and found there is no code for suspend/resume, and also no code
> for the qemu-agent fsfreeze/fsthaw calls. I don't see any blockers to adding
> that code, and I will try to add it in the coming days. If tests go well,
> I'll publish the PR, which I suppose could be integrated into 4.11.3.
>
> Mon, 28 Jan 2019, 2:45 cloudstack-fan <cloudstack-fan@protonmail.com.invalid>:
>
> > Hello Sean,
> >
> > It seems that you've encountered the same issue that I've been facing
> > for the last 5-6 years of using ACS with KVM hosts (see this thread
> > if you're interested in additional details:
> > https://mail-archives.apache.org/mod_mbox/cloudstack-users/201807.mbox/browser).
> >
> > I'd like to state that creating snapshots of a running virtual machine
> > is a bit risky. I've implemented some workarounds in my environment,
> > but I'm still not sure that they are 100% effective.
> >
> > I have a couple of questions, if you don't mind. What kind of storage
> > do you use, if it's not a secret? Does your storage use XFS as a filesystem?
> > Did you see something like this in your log-files?
> >
> > [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> > [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> > [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> >
> > Did you see any unusual messages in your log-file when the disaster
> > happened?
> >
> > I hope things will be well. Wish you good luck and all the best!
> >
> > ------- Original Message -------
> > On Tuesday, 22 January 2019 18:30, Sean Lair wrote:
> >
> > > Hi all,
> > >
> > > We had some instances where VM disks became corrupted when using
> > > KVM snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.
> > >
> > > The first time was when someone mass-enabled scheduled snapshots on
> > > a large number of VMs and secondary storage filled up. We had to
> > > restore all of those VM disks... but we believed it was just our fault
> > > for letting secondary storage fill up.
> > >
> > > Today we had an instance where a snapshot failed, and now the disk
> > > image is corrupted and the VM can't boot.
> > > Here is the output of some commands:
> > >
> > > ----------------------------------------------------------------------
> > >
> > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> > >
> > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> > >
> > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > >
> > > ----------------------------------------------------------------------
> > >
> > > We tried restoring to before the snapshot failure, but we still see
> > > strange errors:
> > >
> > > ----------------------------------------------------------------------
> > >
> > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > >
> > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > file format: qcow2
> > > virtual size: 50G (53687091200 bytes)
> > > disk size: 73G
> > > cluster_size: 65536
> > > Snapshot list:
> > > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> > > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> > > Format specific information:
> > >     compat: 1.1
> > >     lazy refcounts: false
> > >
> > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > tcmalloc: large alloc 1539750010880 bytes == (nil) @ 0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
> > > No errors were found on the image.
> > > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > > Snapshot list:
> > > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> > > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> > >
> > > ----------------------------------------------------------------------
> > >
> > > Everyone is now extremely hesitant to use snapshots in KVM. We tried
> > > deleting the snapshots in the restored disk image, but that errors out
> > > as well.
> > >
> > > Does anyone else have issues with KVM snapshots? We are considering
> > > just disabling this functionality now.
> > >
> > > Thanks
> > > Sean

-- 
With best regards, Ivan Kudryavtsev
Bitworks LLC
Cell RU: +7-923-414-1515
Cell USA: +1-201-257-1512
WWW: http://bitworks.software/
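The freeze-snapshot-thaw sequence Ivan describes (suspend or qemu-agent fsfreeze before the snapshot, resume/fsthaw after) can be sketched with `virsh`. This is a minimal sketch under stated assumptions, not the actual CloudStack agent code or PR #3194: the domain name is made up, the guest must be running the qemu guest agent, and by default the script only echoes the `virsh` commands (set `RUN=virsh` to execute them for real).

```shell
#!/bin/sh
# Hypothetical sketch of a quiesced snapshot. Defaults to a dry run that
# prints the virsh commands; export RUN=virsh to actually execute them.
RUN="${RUN:-echo virsh}"
DOMAIN="i-2-345-VM"        # example domain name, not taken from the thread
SNAP="snap-$(date +%s)"

# Freeze guest filesystems via the qemu guest agent so on-disk state is
# consistent; requires the agent to be running inside the guest.
$RUN domfsfreeze "$DOMAIN"

# Take the snapshot while guest I/O is quiesced.
$RUN snapshot-create-as "$DOMAIN" "$SNAP" --atomic

# Thaw immediately afterwards - in real code this should run even if the
# snapshot step failed, or the guest stays frozen.
$RUN domfsthaw "$DOMAIN"
```

In a production version the thaw call belongs in an error-handling path (e.g. a shell `trap`) so a failed snapshot never leaves the guest frozen.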
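For cleanup attempts like Sean's, internal qcow2 snapshot IDs from `qemu-img snapshot -l` can be extracted and then removed one at a time with `qemu-img snapshot -d <id> <image>` (with the VM stopped, and only on an image whose snapshot table is still readable). A sketch, using the listing from the thread as canned input so it runs without the image file:

```shell
#!/bin/sh
# Canned `qemu-img snapshot -l` output copied from the thread; in real use
# you would capture it with: list=$(qemu-img snapshot -l "$IMG")
list='Snapshot list:
ID        TAG                                  VM SIZE DATE                VM CLOCK
1         a8fdf99f-8219-4032-a9c8-87a6e09e7f95 3.7G    2018-12-23 11:01:43 3099:35:55.242
2         b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd 3.8G    2019-01-06 11:03:16 3431:52:23.942'

# Skip the two header lines; field 1 is the snapshot ID.
ids=$(printf '%s\n' "$list" | awk 'NR > 2 { print $1 }')

# Echo (rather than run) the deletion commands; swap `echo` for the real
# invocation only after verifying the IDs and stopping the VM.
for id in $ids; do
    echo qemu-img snapshot -d "$id" ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
done
```

Note this only helps when the image still opens; on the corrupted volume above, `qemu-img` could not even read the snapshot table ("File too large"), so deletion fails before it starts.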