From: Adrian Cockcroft
To: Patrick Julien
CC: "user@cassandra.apache.org"
Date: Thu, 14 Apr 2011 13:47:26 -0700
Subject: Re: Pyramid Organization of Data

What you are asking for breaks the eventual consistency model, so you need
to create a separate cluster in NYC that collects the same updates but has
a much longer setting to timeout the data for deletion, or doesn't get the
deletes.

One way is to have a trigger on writes on your pyramid nodes in NY that
copies data over to the long-term analysis cluster. The two clusters won't
be eventually consistent in the presence of failures, but with RF=3 you
will get up to three triggers for each write, so you get three chances to
get the copy done.

Adrian
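A minimal client-side sketch of the copy-on-write idea above (the mirroring
is done by the writer rather than by a server-side trigger). It assumes a
pycassa-style Python client; the host names, the 'Events' keyspace and the
'History' column family are placeholders, not anything from this thread:

    # Sketch only: every write goes to the local satellite cluster with a TTL
    # and is mirrored, without a TTL, to the long-term NYC analysis cluster.
    import pycassa

    satellite_pool = pycassa.ConnectionPool('Events', server_list=['tokyo-cass1:9160'])
    archive_pool = pycassa.ConnectionPool('Events', server_list=['nyc-cass1:9160'])

    satellite_cf = pycassa.ColumnFamily(satellite_pool, 'History')
    archive_cf = pycassa.ColumnFamily(archive_pool, 'History')

    SEVEN_DAYS = 7 * 24 * 3600

    def mirrored_write(row_key, columns):
        # The satellite copy expires on its own after seven days, so no explicit
        # delete (and no tombstone) ever needs to be issued there. The archive
        # copy carries no TTL and is kept indefinitely.
        satellite_cf.insert(row_key, columns, ttl=SEVEN_DAYS)
        archive_cf.insert(row_key, columns)

    mirrored_write('user:42', {'event': 'login', 'ts': '2011-04-14T13:47:00Z'})

If the archive write fails, the two clusters drift apart, which is the
consistency caveat above; a retry queue or a periodic reconciliation job
would be needed to close that gap.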
On Apr 14, 2011, at 10:18 AM, "Patrick Julien" wrote:

> Thanks for your input Adrian, we've pretty much settled on this too.
> What I'm trying to figure out is how we do deletes.
>
> We want to do deletes in the satellites because:
>
> a) we'll run out of disk space very quickly with the amount of data we have
> b) we don't need more than 3 days worth of history in the satellites,
> we're currently planning for 7 days of capacity
>
> However, the deletes will get replicated back to NY. In NY, we don't
> want that; we want to run hadoop/pig over all that data dating back
> several months/years. Even if we set the replication factor of the
> satellites to 1 and NY to 3, we'll run out of space very quickly in
> the satellites.
>
>
> On Thu, Apr 14, 2011 at 11:23 AM, Adrian Cockcroft wrote:
>> We have similar requirements for wide-area backup/archive at Netflix.
>> I think what you want is a replica with RF of at least 3 in NY for all the
>> satellites; then each satellite could have a lower RF, but if you want safe
>> local quorum I would use 3 everywhere.
>> Then NY is the sum of all the satellites, so that makes most use of the disk
>> space.
>> For archival storage I suggest you use snapshots in NY and save compressed
>> tar files of each keyspace in NY. We've been working on this to allow full
>> and incremental backup and restore from our EC2-hosted Cassandra clusters
>> to/from S3. Full backup/restore works fine; incremental and per-keyspace
>> restore is being worked on.
>> Adrian
>>
>> From: Patrick Julien
>> Reply-To: "user@cassandra.apache.org"
>> Date: Thu, 14 Apr 2011 05:38:54 -0700
>> To: "user@cassandra.apache.org"
>> Subject: Re: Pyramid Organization of Data
>>
>> Thanks, I'm still working the problem, so anything I find out I will post
>> here.
>>
>> Yes, you're right, that is the question I am asking.
>>
>> No, adding more storage is not a solution since New York would have several
>> hundred times more storage.
>>
>> On Apr 14, 2011 6:38 AM, "aaron morton" wrote:
>>> I think your question is "NY is the archive; after a certain amount of
>>> time we want to delete the row from the original DC but keep it in the
>>> archive in NY."
>>>
>>> Once you delete a row, it's deleted as far as the client is concerned.
>>> GCGraceSeconds is only concerned with when the tombstone marker can be
>>> removed. If NY has a replica of a row from Tokyo and the row is deleted in
>>> either DC, it will be deleted in the other DC as well.
>>>
>>> Some thoughts...
>>> 1) Add more storage in the satellite DCs, then tilt your chair to
>>> celebrate a job well done :)
>>> 2) Run two clusters as you say.
>>> 3) Just thinking out loud, and I know this does not work now. Would it be
>>> possible to support per-CF strategy options, so an archive CF only
>>> replicates to NY? Can think of possible problems with repair and
>>> LOCAL_QUORUM; out of interest, what else would it break?
>>>
>>> Hope that helps.
>>> Aaron
>>>
>>>
>>> On 14 Apr 2011, at 10:17, Patrick Julien wrote:
>>>
>>>> We have been successful in implementing, at scale, the comments you
>>>> posted here. I'm wondering what we can do about deleting data,
>>>> however.
>>>>
>>>> The way I see it, we have considerably more storage capacity in NY,
>>>> but not in the other sites. Using this technique here, it occurs to
>>>> me that we would replicate non-NY deleted rows back to NY. Is there a
>>>> way to tell NY not to tombstone rows?
>>>>
>>>> The ideas I have so far:
>>>>
>>>> - Set GCGracePeriod to be much higher in NY than in the other sites.
>>>> This way we can get to tombstoned rows well beyond their disk life in
>>>> the other sites.
>>>> - A variant on this solution is to set a TTL on rows in non-NY sites
>>>> and, again, set the GCGracePeriod to be considerably higher in NY.
>>>> - Break this up into multiple clusters and do one write from the client
>>>> to its 'local' cluster and one write to the NY cluster.
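A rough sketch of the first two ideas in that list, assuming pycassa's
SystemManager (keyword arguments vary by version); hosts, keyspace and
column family names are placeholders. Two caveats: gc_grace_seconds is a
per-column-family schema setting, so giving NY a different value than the
satellites implies separate clusters (the third idea) rather than one
multi-DC cluster, and, as Aaron notes above, a deleted row stays invisible
to clients no matter how long its tombstone is retained, so this really
pairs with TTL-based expiry rather than explicit deletes:

    # Sketch only: shorter tombstone grace period at the satellites, much
    # longer in NY, with satellite writes carrying a TTL (as in the
    # mirrored_write() sketch earlier in this thread).
    from pycassa.system_manager import SystemManager

    THIRTY_DAYS = 30 * 24 * 3600
    ONE_YEAR = 365 * 24 * 3600

    tokyo = SystemManager('tokyo-cass1:9160')
    tokyo.alter_column_family('Events', 'History', gc_grace_seconds=THIRTY_DAYS)
    tokyo.close()

    nyc = SystemManager('nyc-cass1:9160')
    nyc.alter_column_family('Events', 'History', gc_grace_seconds=ONE_YEAR)
    nyc.close()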
>>>>
>>>> On Fri, Apr 8, 2011 at 7:15 PM, Jonathan Ellis wrote:
>>>>> No, I'm suggesting you have a Tokyo keyspace that gets replicated as
>>>>> {Tokyo: 2, NYC: 1}, a London keyspace that gets replicated as
>>>>> {London: 2, NYC: 1}, for example.
>>>>>
>>>>> On Fri, Apr 8, 2011 at 5:59 PM, Patrick Julien wrote:
>>>>>> I'm familiar with this material. I hadn't thought of it from this
>>>>>> angle, but I believe what you're suggesting is that the different data
>>>>>> centers would each hold a different properties file for node discovery
>>>>>> instead of using auto-discovery.
>>>>>>
>>>>>> So Tokyo, and the others, would have a configuration that makes them
>>>>>> oblivious to the non-New York data centers.
>>>>>> New York would have a configuration that gives it knowledge of no
>>>>>> other data center.
>>>>>>
>>>>>> Would that work? Wouldn't the NY data center wonder where these other
>>>>>> writes are coming from?
>>>>>>
>>>>>> On Fri, Apr 8, 2011 at 6:38 PM, Jonathan Ellis wrote:
>>>>>>> On Fri, Apr 8, 2011 at 12:17 PM, Patrick Julien wrote:
>>>>>>>> The problem is this: we would like the historical data from Tokyo to
>>>>>>>> stay in Tokyo and only be replicated to New York, the one in London
>>>>>>>> to stay in London and only be replicated to New York, and so on for
>>>>>>>> all data centers.
>>>>>>>>
>>>>>>>> Is this currently possible with Cassandra? I believe we would need to
>>>>>>>> run multiple clusters and migrate data manually from the data centers
>>>>>>>> to North America to achieve this. Any suggestions would also be
>>>>>>>> welcome.
>>>>>>>
>>>>>>> NetworkTopologyStrategy allows configuring replicas per-keyspace,
>>>>>>> per-datacenter:
>>>>>>>
>>>>>>> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
>>>>>>>
>>>>>>> --
>>>>>>> Jonathan Ellis
>>>>>>> Project Chair, Apache Cassandra
>>>>>>> co-founder of DataStax, the source for professional Cassandra support
>>>>>>> http://www.datastax.com
>>>>>
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of DataStax, the source for professional Cassandra support
>>>>> http://www.datastax.com
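For reference, the per-keyspace layout Jonathan describes could be declared
along these lines, again assuming pycassa's SystemManager (the exact
create_keyspace() signature differs between pycassa versions) and
data-center names that match whatever snitch the cluster is configured
with; the keyspace names are placeholders:

    # Sketch only: one keyspace per satellite, each kept mostly local with a
    # single replica in NYC, so NYC ends up holding the union of all of them.
    from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY

    sys_mgr = SystemManager('nyc-cass1:9160')  # any node in the cluster will do

    for dc in ('Tokyo', 'London'):
        sys_mgr.create_keyspace(
            '%s_events' % dc,
            replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
            strategy_options={dc: '2', 'NYC': '1'})

    sys_mgr.close()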