From: David Koblas
Date: Mon, 05 Mar 2012 07:42:42 -0800
To: user@cassandra.apache.org
Subject: Adding a second datacenter

Everything that I've read about data centers focuses on setting things up at
the beginning of time. I have the following situation:

10 machines in a datacenter (DC1), with a replication factor of 2.

I want to set up a second data center (DC2) with the following configuration:

20 machines with a replication factor of 4.

What I've found is that when I start adding machines, the first machine to join the network attempts to replicate all of the data from DC1 and fills up its disk drive. I've tried setting the storage_options to give DC2 a replication factor of 0; I can then bring up all 20 machines in DC2, but I start getting a huge number of read errors on reads from DC1.

Is there a simple cookbook on how to add a second DC? I'm currently trying to set the replication factor to 1 and do a repair, but that doesn't feel like the right approach.

Thanks,
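
P.S. A rough back-of-the-envelope sketch of why the first DC2 machine fills its disk. It assumes evenly balanced tokens and that a data center with fewer machines than its replication factor places one replica on every machine it has. The function name `fraction_per_node` is only illustrative and is not part of any Cassandra tool.

```python
def fraction_per_node(nodes: int, replication_factor: int) -> float:
    """Approximate share of the full dataset held by each node in one data center.

    Assumes evenly balanced tokens (an idealization) and that a data center
    cannot hold more replicas of a row than it has nodes.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    replicas = min(replication_factor, nodes)
    return replicas / nodes


if __name__ == "__main__":
    # DC1 as described: 10 machines, replication factor 2.
    print("DC1 node share:", fraction_per_node(10, 2))
    # DC2 while only its first machine has joined, replication factor 4.
    print("First DC2 node share:", fraction_per_node(1, 4))
    # DC2 once all 20 machines are up, replication factor 4.
    print("Full DC2 node share:", fraction_per_node(20, 4))
```

Under these assumptions a DC1 machine holds roughly a fifth of the data, the lone first DC2 machine is asked to hold all of it, and each DC2 machine drops back to roughly a fifth once all 20 are up.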