Subject: Re: Pyramid Organization of Data
From: Patrick Julien
To: Jonathan Ellis
Cc: user@cassandra.apache.org
Date: Wed, 13 Apr 2011 18:17:20 -0400

We have successfully implemented, at scale, the approach you described
here. I'm wondering what we can do about deleting data, however.

The way I see it, we have considerably more storage capacity in NY than
in the other sites. With this technique, it occurs to me that deletes
made at the non-NY sites would be replicated back to NY.

Is there a way to tell NY not to tombstone rows?

The ideas I have so far:

- Set GCGracePeriod to be much higher in NY than in the other sites.
  This way we can still get to tombstoned rows well beyond their disk
  life in the other sites.
- A variant on this solution is to set a TTL on rows in the non-NY
  sites and, again, set GCGracePeriod to be considerably higher in NY.
- Break this up into multiple clusters and do one write from the client
  to its 'local' cluster and one write to the NY cluster.

On Fri, Apr 8, 2011 at 7:15 PM, Jonathan Ellis wrote:
> No, I'm suggesting you have a Tokyo keyspace that gets replicated as
> {Tokyo: 2, NYC: 1}, a London keyspace that gets replicated to
> {London: 2, NYC: 1}, for example.
>
> On Fri, Apr 8, 2011 at 5:59 PM, Patrick Julien wrote:
>> I'm familiar with this material. I hadn't thought of it from this
>> angle but I believe what you're suggesting is that the different data
>> centers would hold a different properties file for node discovery
>> instead of using auto-discovery.
>>
>> So Tokyo, and others, would have a configuration that makes it
>> oblivious to the non New York data centers.
>> New York would have a configuration that would give it knowledge of no
>> other data center.
>>
>> Would that work? Wouldn't the NY data center wonder where these other
>> writes are coming from?
>>
>> On Fri, Apr 8, 2011 at 6:38 PM, Jonathan Ellis wrote:
>>> On Fri, Apr 8, 2011 at 12:17 PM, Patrick Julien wrote:
>>>> The problem is this: we would like the historical data from Tokyo to
>>>> stay in Tokyo and only be replicated to New York. The one in London
>>>> to be in London and only be replicated to New York and so on for all
>>>> data centers.
>>>>
>>>> Is this currently possible with Cassandra? I believe we would need to
>>>> run multiple clusters and migrate data manually from data centers to
>>>> North America to achieve this. Any suggestions would also be
>>>> welcomed.
>>>
>>> NetworkTopologyStrategy allows configuring replicas per-keyspace,
>>> per-datacenter:
>>> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
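
For reference, the per-keyspace layout suggested in the thread, {Tokyo: 2, NYC: 1} and {London: 2, NYC: 1}, can be sketched as a small Python model. This is only an illustration and not Cassandra configuration syntax: the keyspace names "Tokyo" and "London", the REPLICATION dictionary, and the two helper functions are hypothetical. It shows why NYC ends up holding every keyspace, and therefore also receives deletes issued against any of them, while each regional site holds only its own.

```python
# Per-keyspace, per-datacenter replica counts, following the layout
# suggested in the thread. Keyspace names are illustrative assumptions;
# the datacenter names (Tokyo, London, NYC) come from the thread.
REPLICATION = {
    "Tokyo": {"Tokyo": 2, "NYC": 1},
    "London": {"London": 2, "NYC": 1},
}


def datacenters_for(keyspace):
    """Return the datacenters holding at least one replica of a keyspace."""
    return sorted(dc for dc, count in REPLICATION[keyspace].items() if count > 0)


def keyspaces_in(datacenter):
    """Return the keyspaces that place at least one replica in a datacenter."""
    return sorted(
        ks for ks, options in REPLICATION.items() if options.get(datacenter, 0) > 0
    )


if __name__ == "__main__":
    # Every write, including a delete, to a keyspace is sent to each
    # datacenter listed for that keyspace.
    for dc in ("Tokyo", "London", "NYC"):
        print(dc, "holds", keyspaces_in(dc))
```

In this model, Tokyo and London never see each other's data, and NYC is the only site listed for both keyspaces.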