Date: Tue, 4 May 2010 22:21:26 -0400
Subject: Export to another cassandra cluster
From: Joost Ouwerkerk
To: user@cassandra.apache.org

I want to export data from one cassandra cluster (production) to another (development). This is not a case of replication, because I just want a snapshot, not a continuous synchronization. I guess my options include 'nodetool snapshot' and 'sstable2json'.
In our case, however, the development cluster has 10 nodes whereas the production cluster has 40. What's the recommended strategy for getting the data in one column family from one cluster to the other?