From: Wei ZHOU
Date: Tue, 22 Jan 2019 23:06:24 +0100
Subject: Re: Snapshots on KVM corrupting disk images
To: dev@cloudstack.apache.org

Hi
Sean,

Volume snapshots (including recurring ones) of running VMs should be disabled in CloudStack.

According to some discussions (for example https://bugzilla.redhat.com/show_bug.cgi?id=920020), the image may be corrupted by concurrent read/write access during a volume snapshot (taken with `qemu-img snapshot`).

```
qcow2 images must not be used in read-write mode from two processes at the same time. You can either have them opened either by one read-write process or by many read-only processes. Having one (paused) read-write process (the running VM) and additional read-only processes (copying out a snapshot with qemu-img) may happen to work in practice, but you're on your own and we won't give support for such attempts.
```

The safe way to take a volume snapshot of a running VM is:
(1) take a VM snapshot (the VM will be paused);
(2) then create a volume snapshot from the VM snapshot.

-Wei

On Tuesday, 22 January 2019 at 5:30 PM, Sean Lair wrote:
> Hi all,
>
> We have had some instances where VM disks became corrupted when using KVM
> snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.
>
> The first time was when someone mass-enabled scheduled snapshots on a large
> number of VMs and secondary storage filled up. We had to restore all
> those VM disks... but we believed it was our own fault for letting
> secondary storage fill up.
>
> Today we had an instance where a snapshot failed, and now the disk image is
> corrupted and the VM can't boot.
> Here is the output of some commands:
>
> -----------------------
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -----------------------
>
> We tried restoring to a point before the snapshot failure, but we still get
> strange errors:
>
> ----------------------
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> file format: qcow2
> virtual size: 50G (53687091200 bytes)
> disk size: 73G
> cluster_size: 65536
> Snapshot list:
> ID        TAG                                   VM SIZE                DATE       VM CLOCK
> 1         a8fdf99f-8219-4032-a9c8-87a6e09e7f95     3.7G 2018-12-23 11:01:43 3099:35:55.242
> 2         b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd     3.8G 2019-01-06 11:03:16 3431:52:23.942
> Format specific information:
>     compat: 1.1
>     lazy refcounts: false
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> tcmalloc: large alloc 1539750010880 bytes == (nil) @ 0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
> No errors were found on the image.
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> Snapshot list:
> ID        TAG                                   VM SIZE                DATE       VM CLOCK
> 1         a8fdf99f-8219-4032-a9c8-87a6e09e7f95     3.7G 2018-12-23 11:01:43 3099:35:55.242
> 2         b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd     3.8G 2019-01-06 11:03:16 3431:52:23.942
> --------------------------
>
> Everyone is now extremely hesitant to use snapshots on KVM. We tried
> deleting the snapshots in the restored disk image, but it errors out.
>
> Does anyone else have issues with KVM snapshots? We are considering just
> disabling this functionality now...
>
> Thanks
> Sean
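The access rule quoted from the Red Hat bug (one read-write opener, or any number of read-only openers, never both at once) can be modeled as a small admission check. This is a hypothetical Python sketch for illustration only: `ImageOpens` and `try_open` are invented names, and this is not how QEMU, libvirt or CloudStack actually track or lock images.

```python
from dataclasses import dataclass


@dataclass
class ImageOpens:
    """Hypothetical model of who currently has one qcow2 image open.

    Not QEMU's locking implementation; it only encodes the quoted rule.
    """
    writers: int = 0  # read-write openers, e.g. the running VM
    readers: int = 0  # read-only openers, e.g. qemu-img copying out a snapshot

    def try_open(self, read_write: bool) -> bool:
        """Admit an opener only if single-writer / many-readers still holds."""
        if read_write:
            # A read-write opener must be the only opener of any kind.
            if self.writers or self.readers:
                return False
            self.writers = 1
            return True
        # Read-only openers may share the image with each other, never with a writer.
        if self.writers:
            return False
        self.readers += 1
        return True

    def close(self, read_write: bool) -> None:
        """Release one previously admitted opener."""
        if read_write:
            if not self.writers:
                raise ValueError("no read-write opener to close")
            self.writers = 0
        else:
            if not self.readers:
                raise ValueError("no read-only opener to close")
            self.readers -= 1


# The scenario from the thread: the VM holds the image read-write,
# then a snapshot copy tries to read the same image concurrently.
image = ImageOpens()
vm_ok = image.try_open(read_write=True)     # running (or paused) VM
copy_ok = image.try_open(read_write=False)  # qemu-img copy-out while the VM holds it
print(vm_ok, copy_ok)  # True False
```

The quoted text says the paused-writer-plus-readers case "may happen to work in practice" but is unsupported; the sketch deliberately rejects it rather than modeling that best-effort behavior.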