From dev-return-112398-archive-asf-public=cust-asf.ponee.io@cloudstack.apache.org Wed Jan 23 03:16:53 2019
Mailing-List: contact dev-help@cloudstack.apache.org; run by ezmlm
Reply-To: dev@cloudstack.apache.org
From: Ivan Kudryavtsev
Date: Wed, 23 Jan 2019 09:16:28 +0700
Subject: Re: Snapshots on KVM corrupting disk images
To: dev@cloudstack.apache.org

I've run into situations where the CloudStack + KVM + qcow2 + snapshots
combination led to corrupted images, mostly on 4.3 with NFS, but I thought
CloudStack stopped the VM just before taking the snapshot. At least the
VM's behavior while a VM snapshot is created (freezing) suggests that is
what happens, which is why this case looks strange. In general, though, I
agree that this combination leads to data corruption, especially when the
storage is under I/O pressure. We recommend that our customers avoid
running snapshots on such a setup if possible.

On Wed, 23 Jan 2019 at 05:06, Wei ZHOU wrote:

> Hi Sean,
>
> The (recurring) volume snapshot of running VMs should be disabled in
> CloudStack.
>
> According to some discussions (for example
> https://bugzilla.redhat.com/show_bug.cgi?id=920020), the image might be
> corrupted by concurrent read/write operations during a volume snapshot
> (taken with qemu-img snapshot).
>
> ```
> qcow2 images must not be used in read-write mode from two processes at
> the same time. You can have them opened either by one read-write process
> or by many read-only processes. Having one (paused) read-write process
> (the running VM) and additional read-only processes (copying out a
> snapshot with qemu-img) may happen to work in practice, but you're on
> your own and we won't give support for such attempts.
> ```
>
> The safe way to take a volume snapshot of a running VM is:
> (1) take a VM snapshot (the VM will be paused), and
> (2) create a volume snapshot from that VM snapshot.
>
> -Wei
>
> On Tue, 22 Jan 2019 at 5:30 PM, Sean Lair wrote:
>
> > Hi all,
> >
> > We had some instances where VM disks became corrupted when using KVM
> > snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.
> >
> > The first time was when someone mass-enabled scheduled snapshots on a
> > large number of VMs and secondary storage filled up.
> > We had to restore all
> > those VM disks... but we believed it was just our fault for letting
> > secondary storage fill up.
> >
> > Today we had an instance where a snapshot failed, and now the disk
> > image is corrupted and the VM can't boot. Here is the output of some
> > commands:
> >
> > -----------------------
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check
> > ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80':
> > Could not read snapshots: File too large
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info
> > ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80':
> > Could not read snapshots: File too large
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh
> > ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -rw-r--r--. 1 root root 73G Jan 22 11:04
> > ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -----------------------
> >
> > We tried restoring to before the snapshot failure, but we still see
> > strange errors:
> >
> > ----------------------
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh
> > ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -rw-r--r--. 1 root root 73G Jan 22 11:04
> > ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info
> > ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > file format: qcow2
> > virtual size: 50G (53687091200 bytes)
> > disk size: 73G
> > cluster_size: 65536
> > Snapshot list:
> > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> > Format specific information:
> >     compat: 1.1
> >     lazy refcounts: false
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check
> > ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > tcmalloc: large alloc 1539750010880 bytes == (nil) @ 0x7fb9cbbf7bf3
> > 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541
> > 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05
> > 0x55d16ddd9f7d
> > No errors were found on the image.
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot
> > -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > Snapshot list:
> > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> > --------------------------
> >
> > Everyone is now extremely hesitant to use snapshots in KVM... We tried
> > deleting the snapshots in the restored disk image, but it errors out...
> >
> > Does anyone else have issues with KVM snapshots? We are considering
> > just disabling this functionality now...
> >
> > Thanks
> > Sean

-- 
With best regards,
Ivan Kudryavtsev
Bitworks LLC
Cell RU: +7-923-414-1515
Cell USA: +1-201-257-1512
WWW: http://bitworks.software/
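[Editor's note] The "Could not read snapshots: File too large" failure and the ~1.4 TB tcmalloc allocation above are consistent with a clobbered snapshot table in the qcow2 header: qemu-img reads the snapshot count and table offset from fixed header offsets and sizes its read accordingly, so garbage in those fields produces absurd allocations. A minimal Python sketch of reading those fields follows; it is illustrative only (offsets per the published qcow2 format specification, not CloudStack or QEMU code), and it parses a synthetic header rather than a real image:

```python
import struct

# Offsets follow the qcow2 header layout from the QEMU qcow2 format spec:
#   0: magic (u32), 4: version (u32), 60: nb_snapshots (u32),
#   64: snapshots_offset (u64). All fields are big-endian.
QCOW2_MAGIC = 0x514649FB  # b"QFI\xfb"

def snapshot_fields(header: bytes):
    """Return (version, nb_snapshots, snapshots_offset) from a qcow2 header."""
    magic, version = struct.unpack_from(">II", header, 0)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    (nb_snapshots,) = struct.unpack_from(">I", header, 60)
    (snapshots_offset,) = struct.unpack_from(">Q", header, 64)
    return version, nb_snapshots, snapshots_offset

# Synthetic 72-byte header: version 3 (compat 1.1), two snapshots,
# snapshot table at offset 0x10000 -- mirroring the image discussed above.
hdr = bytearray(72)
struct.pack_into(">II", hdr, 0, QCOW2_MAGIC, 3)
struct.pack_into(">I", hdr, 60, 2)
struct.pack_into(">Q", hdr, 64, 0x10000)

print(snapshot_fields(bytes(hdr)))  # (3, 2, 65536)
```

On a corrupted image, a sanity check like this (an implausibly large nb_snapshots, or a snapshots_offset beyond end-of-file) can distinguish header/snapshot-table damage from data-cluster damage before attempting any repair.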