Date: Thu, 28 Apr 2011 14:15:25 -0700
From: Adrian Cockcroft
To: user@cassandra.apache.org
Subject: Re: best way to backup

Netflix has also gone down this path. We run a regular full backup to S3 of a compressed tar, and we have scripts that restore everything into the right place on a different cluster (it needs the same node count). We also pick up the SSTables as they are created and drop them in S3.

Whatever you do, make sure you have a regular process to restore the data and verify that it contains what you think it should...

Adrian

On Thu, Apr 28, 2011 at 1:35 PM, Jeremy Hanna wrote:
> One thing we're looking at doing is watching the Cassandra data directory and backing up the SSTables to S3 when they are created. Some guys at SimpleGeo started tablesnap, which does this:
> https://github.com/simplegeo/tablesnap
>
> For every SSTable it pushes to S3, it also uploads a JSON file listing the current files in the directory, so you know what to restore in that event (as far as I understand).
>
> On Apr 28, 2011, at 2:53 PM, William Oberman wrote:
>
>> Even with N nodes for redundancy, I still want to have backups. I'm an Amazon person, so naturally I'm thinking S3. Reading over the docs and messing with nodetool, it looks like each new snapshot contains the previous snapshot as a subset (and I've read how Cassandra uses hard links to avoid excessive disk use). When does that pattern break down?
>>
>> I'm basically debating whether I can do an "rsync"-like backup, or whether I should do a compressed tar backup. And I obviously want multiple points in time. S3 does allow file versioning if a file or file name is changed/reused over time (this only matters in the rsync case). My only concerns with compressed tars are that I'll need free space to create the archive, and I get no "delta" space savings on the backup. (The former is solved by not letting the disk space get so low and/or adding more nodes to bring down the space used; the latter is solved by S3 being really cheap anyway.)
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue, First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) oberman@civicscience.com
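A minimal sketch of the "pick up SSTables as they are created" approach, in Python. It is not tablesnap's code: instead of reacting to filesystem events it simply polls the data directory, and `list_sstables`, `backup_new_sstables`, the injected `upload(key, data)` callable and the `.manifest.json` key suffix are all hypothetical names. It also assumes finished SSTable components end in `.db`, in-progress files carry `-tmp-` in their name, and snapshots live in directories named `snapshots`; check those against your Cassandra version before relying on them.

```python
import json
import os
from typing import Callable, List, Set

# Hypothetical uploader signature: upload(key, data). In practice this might
# wrap an S3 client; it is injected here so the sketch runs without AWS.
Uploader = Callable[[str, bytes], None]


def list_sstables(data_dir: str) -> Set[str]:
    """Relative paths of finished SSTable component files under data_dir.

    Assumptions: components end in ".db", in-progress files contain "-tmp-"
    in their name, and snapshot directories are named "snapshots".
    """
    found: Set[str] = set()
    for root, dirs, files in os.walk(data_dir):
        # Prune snapshot dirs in place: they hold hard links to the same files.
        dirs[:] = [d for d in dirs if d != "snapshots"]
        for name in files:
            if name.endswith(".db") and "-tmp-" not in name:
                found.add(os.path.relpath(os.path.join(root, name), data_dir))
    return found


def backup_new_sstables(data_dir: str, uploaded: Set[str], upload: Uploader) -> List[str]:
    """Upload SSTables not uploaded before, each followed by a JSON manifest.

    `uploaded` is updated in place; the newly uploaded paths are returned.
    """
    current = list_sstables(data_dir)
    new_files = sorted(current - uploaded)
    for rel in new_files:
        with open(os.path.join(data_dir, rel), "rb") as f:
            upload(rel, f.read())  # a real tool would stream large files
        # Record which files existed at this moment, so a restore can tell
        # which set of SSTables belongs together.
        manifest = {"uploaded": rel, "current_files": sorted(current)}
        upload(rel + ".manifest.json", json.dumps(manifest).encode("utf-8"))
        uploaded.add(rel)
    return new_files
```

In use, one might call `backup_new_sstables(data_dir, uploaded, upload)` in a loop with a sleep between passes, with `upload` wrapping whatever S3 client is at hand. Because an SSTable is never modified after it is written, each file only has to be uploaded once, which is where this approach gets its "delta" savings over repeated full tars.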
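For the compressed-tar route, one way to sidestep the free-space concern is to stream the archive instead of building it on local disk first. The sketch below uses only the Python standard library; `stream_snapshot_tar` is a hypothetical name, and it assumes you already know the path of a snapshot directory (for example one produced by `nodetool snapshot`; where snapshots land depends on your version and configuration).

```python
import tarfile
from typing import BinaryIO


def stream_snapshot_tar(snapshot_dir: str, out: BinaryIO, arcname: str = "snapshot") -> None:
    """Write snapshot_dir as a gzip-compressed tar directly to `out`.

    Mode "w|gz" produces a non-seekable stream, so `out` can be a pipe or
    any writable stream; no temporary archive has to fit on local disk.
    `out` is left open for the caller.
    """
    with tarfile.open(fileobj=out, mode="w|gz") as tar:
        # add() recurses into the directory by default.
        tar.add(snapshot_dir, arcname=arcname)
```

For example, passing `sys.stdout.buffer` as `out` lets the archive be piped into a separate upload step. Restoring is the reverse (`tarfile.open(..., mode="r|gz")`), and that restore path is exactly the part worth exercising on a regular schedule.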