From: Aaron Morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: how to take consistent snapshot?
Date: Fri, 7 Dec 2012 16:34:30 +1300

For background: http://wiki.apache.org/cassandra/Operations?highlight=%28snapshot%29#Consistent_backups

If you snapshot a single node, then yes, there is a chance of inconsistency across CFs.

If you have multiple nodes, the snapshots you take on the later nodes will help. If you use CL QUORUM for reads you *may* be ok (I cannot work it out quickly). If you use CL ALL for reads you will be ok. Or you can use nodetool repair to ensure the data is consistent.

I doubt that even using repair would give you a provable guarantee, though. Anyone?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 7:56 AM, Andrey Ilinykh <ailinykh@gmail.com> wrote:

> Hello, everybody!
> I have a production cluster with incremental backup on, and I want to clone it (create a test one). I don't understand one thing: each column family gets flushed (and copied to backup storage) independently, which means the total snapshot is inconsistent. If I restore from such a snapshot, I have a totally useless system. To be more specific, let's say I have two CFs, one serving as an index for the other. Every time I update one CF I update the index CF. There is a good chance that all replicas flush the index CF first. Then I move it into backup storage, restore, and get a CF which has pointers to non-existent data in the other CF. What is the way to avoid this situation?
>
> Thank you,
>   Andrey
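[Editor's note: the flush-then-snapshot approach the thread discusses can be sketched roughly as below. This is a dry run only: the host names are placeholders, the loop just prints the `ssh`/`nodetool` commands it would issue, and flushing first merely narrows (does not eliminate) the window in which CFs diverge.]

```shell
# Hypothetical cluster nodes -- replace with your own.
NODES="cass1 cass2 cass3"
# A common tag so each node's snapshot can be matched up later.
TAG="pre-clone-snap"

for node in $NODES; do
  # Flush memtables so recent writes for every CF reach SSTables on disk,
  # then snapshot all keyspaces on that node under the shared tag.
  echo "ssh $node nodetool flush"
  echo "ssh $node nodetool snapshot -t $TAG"
done
```

Remove the `echo`s to actually run the commands. Even then, as noted above, snapshots taken at slightly different times on different replicas are only eventually reconcilable via read CL or `nodetool repair`.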