From: Alain RODRIGUEZ
Date: Thu, 12 Jan 2017 14:05:03 +0100
Subject: Re: Check snapshot / sstable integrity
To: user@cassandra.apache.org

Hi Jérôme,

About this concern:

> But my Op retains my arm and asks: "Are you sure that the snapshot is safe and will be restored before truncating data we have?"

Make sure to enable snapshot on truncate (auto_snapshot in cassandra.yaml) or take one manually. This way, if the restored dataset is worse than the current one (the one you plan to truncate), you can always roll back this truncate / restore action. You can then tell your "Op" that this is perfectly safe anyway: no data would be lost, even in the worst-case scenario (not considering the downtime that would be induced). Plus, this snapshot is cheap (hard links) and does not need to be moved around or kept once you are sure the old backup fits your needs.
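
To illustrate why such a snapshot is cheap, here is a minimal sketch, assuming Python 3 on a POSIX filesystem. The helper names is_hard_link_of and link_count are hypothetical and not part of Cassandra. A hard-linked snapshot file shares its data blocks with the live sstable, so it takes no extra space until the live file is removed.

```python
import os


def is_hard_link_of(live_file, snapshot_file):
    """Return True when both paths name the same on-disk file (same device and inode)."""
    return os.path.samefile(live_file, snapshot_file)


def link_count(path):
    """Return the number of directory entries that point at this file's data."""
    return os.stat(path).st_nlink
```

A link count of 2 or more on a snapshot file means its data is still shared with at least one other path, typically the live sstable.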

Truncate is definitely the way to go before restoring a backup. Parsing the data to delete it all is not really an option imho.

Then, about the technical question "how to know that a snapshot is clean": it would be good to define "clean". You can make sure the backup is readable, consistent enough, and corresponds to what you want by loading all the sstables into a testing cluster and performing some reads there before doing it in production. You can use, for example, AWS EC2 machines with big EBS volumes attached, or whatever else, and use sstableloader to load the data into it.

If you are just worried about SSTable format validity, I am not aware of any tool that checks whether sstables are well formed, but one might exist or be doable. Another option might be to compute a checksum of each sstable before uploading it elsewhere and make sure it matches once it is downloaded back. Those are the first things that come to my mind.
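
The checksum idea could look like the following minimal sketch, assuming Python 3 on the machine that handles the backup. The helper names (sha256_of, build_manifest, verify_manifest) are hypothetical, and SHA-256 is only one reasonable choice of digest. Note that this only shows the files were not altered in storage or transit, not that the sstables were valid when the snapshot was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of one file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(snapshot_dir):
    """Map every file under snapshot_dir (relative path) to its digest."""
    root = Path(snapshot_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(snapshot_dir, manifest):
    """Return the sorted relative paths that are missing or whose digest differs."""
    root = Path(snapshot_dir)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return sorted(bad)
```

One would run build_manifest before the upload, store the result next to the backup (as JSON, for instance), and run verify_manifest on the downloaded copy before truncating anything; an empty list means every file matched.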

Hope that is helpful. Hopefully, someone else will be able to point you to an existing tool to do this work.

Cheers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-01-12 11:33 GMT+01:00 Jérôme Mainaud <jerome@mainaud.com>:

> Hello,
>
> Is there any tool to test the integrity of a snapshot?
>
> Suppose I have a snapshot-based backup stored in an external low-cost storage system that I want to restore to a database after someone deleted important data by mistake.
>
> Before restoring the files, I will truncate the table to remove the problematic tombstones.
>
> But my Op retains my arm and asks: "Are you sure that the snapshot is safe and will be restored before truncating data we have?"
>
> If this scenario is theoretical, the question is good. How can I verify that a snapshot is clean?
>
> Thank you,
>
> --
> Jérôme Mainaud
> jerome@mainaud.com
