cassandra-user mailing list archives

From Patrick Julien <pjul...@gmail.com>
Subject Re: Pyramid Organization of Data
Date Wed, 13 Apr 2011 22:17:20 GMT
We have successfully implemented, at scale, the approach you described
below. I'm wondering what we can do about deleting data, however.

The way I see it, we have considerably more storage capacity in NY
than at the other sites. With this technique, it occurs to me that
deletes of rows made at the non-NY sites would also be replicated back
to NY. Is there a way to tell NY not to tombstone rows?
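To make the concern concrete, here is a minimal Python model of the per-keyspace, per-datacenter layout from Jonathan's suggestion further down the thread ({Tokyo: 2, NYC: 1} and {London: 2, NYC: 1}). It is a sketch, not Cassandra's API: the keyspace names `tokyo_ks` and `london_ks` and both helper functions are hypothetical. It only shows that because NYC appears in every replication map, NYC holds replicas of every regional keyspace, so any mutation to one of them, a delete included, is sent to NYC as well.

```python
# Hypothetical replication maps modeled on the layout suggested in the
# thread: each regional keyspace keeps 2 replicas at home and 1 in NYC.
REPLICATION = {
    "tokyo_ks": {"Tokyo": 2, "NYC": 1},
    "london_ks": {"London": 2, "NYC": 1},
}


def datacenters_receiving(keyspace, replication=REPLICATION):
    """Return the datacenters holding replicas of `keyspace` in this model.

    In Cassandra, every mutation to a keyspace (deletes, i.e. tombstones,
    included) is sent to the replicas in each of these datacenters.
    """
    return sorted(dc for dc, rf in replication[keyspace].items() if rf > 0)


def datacenters_in_every_keyspace(replication=REPLICATION):
    """Return the datacenters that hold replicas of every keyspace."""
    sets = [set(datacenters_receiving(ks, replication)) for ks in replication]
    return set.intersection(*sets)


for ks in REPLICATION:
    print(ks, datacenters_receiving(ks))
print(datacenters_in_every_keyspace())  # {'NYC'}
```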

The ideas I have so far:

- Set GCGracePeriod much higher in NY than at the other sites. This
way, tombstoned rows would remain reachable in NY well after they have
been purged from disk at the other sites.
- A variant of this: set a TTL on rows at the non-NY sites and, again,
set GCGracePeriod considerably higher in NY.
- Break this up into multiple clusters and have the client do one write
to its 'local' cluster and one write to the NY cluster.
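The third idea can be sketched in a few lines of Python. This is a minimal model under stated assumptions: `InMemoryCluster` is a hypothetical stand-in for a client connection to one independent cluster, and `PyramidWriter` is a hypothetical client-side wrapper; neither is part of any Cassandra driver. Inserts go to both the local cluster and the NY cluster, while deletes go only to the local one, so NY keeps the row.

```python
class InMemoryCluster:
    """Hypothetical stand-in for one independent cluster (not a real client)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert(self, key, value):
        self.rows[key] = value

    def delete(self, key):
        self.rows.pop(key, None)


class PyramidWriter:
    """Hypothetical client-side dual writer.

    Inserts go to the local cluster and to the NY archive cluster;
    deletes go to the local cluster only, so NY keeps the full history.
    """

    def __init__(self, local, archive):
        self.local = local
        self.archive = archive

    def insert(self, key, value):
        # Two independent writes; neither cluster replicates to the other.
        self.local.insert(key, value)
        self.archive.insert(key, value)

    def delete(self, key):
        # Deliberately not forwarded to the archive cluster.
        self.local.delete(key)


tokyo = InMemoryCluster("Tokyo")
ny = InMemoryCluster("NY")
writer = PyramidWriter(tokyo, ny)
writer.insert("order:1", "payload")
writer.delete("order:1")
print("order:1" in tokyo.rows, "order:1" in ny.rows)  # False True
```

One cost of this design: with two separate clusters, the client is responsible for keeping them in step. If the second write fails, the clusters diverge unless the client retries or reconciles later.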



On Fri, Apr 8, 2011 at 7:15 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> No, I'm suggesting you have, for example, a Tokyo keyspace that gets
> replicated as {Tokyo: 2, NYC: 1} and a London keyspace that gets
> replicated as {London: 2, NYC: 1}.
>
> On Fri, Apr 8, 2011 at 5:59 PM, Patrick Julien <pjulien@gmail.com> wrote:
>> I'm familiar with this material. I hadn't thought of it from this
>> angle, but I believe what you're suggesting is that the different data
>> centers would each hold a different properties file for node discovery
>> instead of using auto-discovery.
>>
>> So Tokyo, and the others, would have a configuration that makes them
>> oblivious to the non-New York data centers.
>> New York would have a configuration that gives it no knowledge of any
>> other data center.
>>
>> Would that work?  Wouldn't the NY data center wonder where these other
>> writes are coming from?
>>
>> On Fri, Apr 8, 2011 at 6:38 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> On Fri, Apr 8, 2011 at 12:17 PM, Patrick Julien <pjulien@gmail.com> wrote:
>>>> The problem is this: we would like the historical data from Tokyo to
>>>> stay in Tokyo and only be replicated to New York, the data from London
>>>> to stay in London and only be replicated to New York, and so on for
>>>> all data centers.
>>>>
>>>> Is this currently possible with Cassandra? I believe we would need to
>>>> run multiple clusters and manually migrate data from the other data
>>>> centers to North America to achieve this. Any suggestions would be
>>>> welcome.
>>>
>>> NetworkTopologyStrategy allows configuring replicas per keyspace,
>>> per datacenter:
>>> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>>
>
