Subject: Re: best way to backup
From: Jeremy Hanna
Date: Thu, 28 Apr 2011 15:35:42 -0500
To: user@cassandra.apache.org

One thing we're looking at doing is watching the Cassandra data directory and backing up the sstables to S3 when they are created. Some guys at SimpleGeo started tablesnap, which does this:
https://github.com/simplegeo/tablesnap

For every sstable that it pushes to S3, it also copies a JSON file listing the files currently in the directory, so you know what to restore if you ever need to (as far as I understand).

On Apr 28, 2011, at 2:53 PM, William Oberman wrote:

> Even with N nodes for redundancy, I still want to have backups. I'm an Amazon person, so naturally I'm thinking S3. Reading over the docs, and messing with nodetool, it looks like each new snapshot contains the previous snapshot as a subset (and I've read how Cassandra uses hard links to avoid excessive disk use). When does that pattern break down?
>
> I'm basically debating if I can do an "rsync"-like backup, or if I should do a compressed tar backup. And I obviously want multiple points in time.
> S3 does allow file versioning, if a file or file name is changed/reused over time (only matters in the rsync case). My only concerns with compressed tars are that I'll have to have free space to create the archive and that I get no "delta" space savings on the backup (the former is solved by not allowing the disk space to get so low and/or adding more nodes to bring down the space; the latter is solved by S3 being really cheap anyway).
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue, First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) oberman@civicscience.com
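
To make the manifest idea from the top of this message concrete, here is a minimal sketch in Python. It is not tablesnap's code: the function names, the {"files": [...]} manifest layout, and the sstable file names are all hypothetical, and the actual S3 upload is left out. It only shows how a JSON listing of the directory can be built, and how an rsync-like backup could use it to pick out the files it has not yet copied.

```python
import json
import os
import tempfile


def build_manifest(data_dir):
    """Return a JSON string listing the regular files currently in data_dir.

    The {"files": [...]} layout is an assumption made for this sketch.
    """
    names = sorted(
        name for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    )
    return json.dumps({"files": names})


def files_to_upload(manifest_json, already_uploaded):
    """Return the manifest entries that are not yet present in the backup."""
    listed = json.loads(manifest_json)["files"]
    return [name for name in listed if name not in already_uploaded]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        # Hypothetical sstable file names, created empty for the demo.
        for name in ("ks-cf-1-Data.db", "ks-cf-2-Data.db"):
            open(os.path.join(data_dir, name), "w").close()

        manifest = build_manifest(data_dir)
        print(manifest)
        # Pretend the first file was already backed up earlier.
        print(files_to_upload(manifest, {"ks-cf-1-Data.db"}))
```

Restoring to a point in time would then presumably mean fetching one manifest and downloading exactly the files it names.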