incubator-cassandra-user mailing list archives

From Patrick Julien <pjul...@gmail.com>
Subject Re: Pyramid Organization of Data
Date Thu, 14 Apr 2011 12:38:54 GMT
Thanks. I'm still working on the problem, so I will post anything I find
out here.

Yes, you're right, that is the question I am asking.

No, adding more storage is not a solution, since New York would have several
hundred times more storage.
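
For anyone following along, the delete behaviour described in the quoted reply below can be illustrated with a toy model. This is only a sketch and not how Cassandra is implemented: the `Replica` class, `delete_everywhere`, and the per-replica `gc_grace_seconds` value are all hypothetical names made up for illustration (giving each replica its own grace period just mirrors the idea floated earlier in the thread). It suggests why a longer grace period in NY would not keep the row readable there: the grace period only decides when the tombstone marker is purged.

```python
class Replica:
    """Toy stand-in for one data center's copy of the data (hypothetical)."""

    def __init__(self, dc, gc_grace_seconds):
        self.dc = dc
        self.gc_grace_seconds = gc_grace_seconds
        self.rows = {}        # key -> value, the live data
        self.tombstones = {}  # key -> time at which the row was deleted

    def write(self, key, value):
        self.rows[key] = value

    def delete(self, key, now):
        # The row disappears for readers immediately; only a marker remains.
        self.rows.pop(key, None)
        self.tombstones[key] = now

    def read(self, key):
        return self.rows.get(key)

    def compact(self, now):
        # The grace period only controls when the marker itself is dropped.
        expired = [k for k, deleted_at in self.tombstones.items()
                   if now - deleted_at >= self.gc_grace_seconds]
        for k in expired:
            del self.tombstones[k]


def delete_everywhere(replicas, key, now):
    """A delete is replicated to every data center holding the row."""
    for replica in replicas:
        replica.delete(key, now)


tokyo = Replica("Tokyo", gc_grace_seconds=10)
ny = Replica("NYC", gc_grace_seconds=1000)
for replica in (tokyo, ny):
    replica.write("row1", "historical data")

delete_everywhere([tokyo, ny], "row1", now=0)
tokyo.compact(now=20)
ny.compact(now=20)
```

In this model NY still holds the tombstone after Tokyo has purged its own, but `ny.read("row1")` returns `None` in both cases, which is the situation the archive use case is trying to avoid.
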
On Apr 14, 2011 6:38 AM, "aaron morton" <aaron@thelastpickle.com> wrote:
> I think your question is "NY is the archive, after a certain amount of
> time we want to delete the row from the original DC but keep it in the
> archive in NY."
>
> Once you delete a row, it's deleted as far as the client is concerned.
> GCGraceSeconds is only concerned with when the tombstone marker can be
> removed. If NY has a replica of a row from Tokyo and the row is deleted in
> either DC, it will be deleted in the other DC as well.
>
> Some thoughts...
> 1) Add more storage in the satellite DCs, then tilt your chair to
> celebrate a job well done :)
> 2) Run two clusters as you say.
> 3) Just thinking out loud, and I know this does not work now. Would it be
> possible to support per-CF strategy options, so an archive CF only
> replicates to NY? I can think of possible problems with repair and
> LOCAL_QUORUM; out of interest, what else would it break?
>
> Hope that helps.
> Aaron
>
>
>
> On 14 Apr 2011, at 10:17, Patrick Julien wrote:
>
>> We have been successful in implementing, at scale, the comments you
>> posted here. I'm wondering what we can do about deleting data
>> however.
>>
>> The way I see it, we have considerably more storage capacity in NY,
>> but not in the other sites. Using this technique here, it occurs to
>> me that we would replicate non-NY deleted rows back to NY. Is there a
>> way to tell NY not to tombstone rows?
>>
>> The ideas I have so far:
>>
>> - Set GCGracePeriod to be much higher in NY than in the other sites.
>> This way we can get to tombstone'd rows well beyond their disk life in
>> other sites.
>> - A variant on this solution is to set the TTL on rows in non-NY sites
>> and, again, set the GCGracePeriod to be considerably higher in NY.
>> - Break this up into multiple clusters and do one write from the client
>> to its 'local' cluster and one write to the NY cluster.
>>
>>
>>
>> On Fri, Apr 8, 2011 at 7:15 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> No, I'm suggesting you have a Tokyo keyspace that gets replicated as
>>> {Tokyo: 2, NYC:1}, a London keyspace that gets replicated to {London:
>>> 2, NYC: 1}, for example.
>>>
>>> On Fri, Apr 8, 2011 at 5:59 PM, Patrick Julien <pjulien@gmail.com> wrote:
>>>> I'm familiar with this material. I hadn't thought of it from this
>>>> angle but I believe what you're suggesting is that the different data
>>>> centers would hold a different properties file for node discovery
>>>> instead of using auto-discovery.
>>>>
>>>> So Tokyo, and others, would have a configuration that makes it
>>>> oblivious to the non-New York data centers.
>>>> New York would have a configuration that would give it knowledge of no
>>>> other data center.
>>>>
>>>> Would that work? Wouldn't the NY data center wonder where these other
>>>> writes are coming from?
>>>>
>>>> On Fri, Apr 8, 2011 at 6:38 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>> On Fri, Apr 8, 2011 at 12:17 PM, Patrick Julien <pjulien@gmail.com> wrote:
>>>>>> The problem is this: we would like the historical data from Tokyo to
>>>>>> stay in Tokyo and only be replicated to New York, the data in London
>>>>>> to stay in London and only be replicated to New York, and so on for
>>>>>> all data centers.
>>>>>>
>>>>>> Is this currently possible with Cassandra? I believe we would need to
>>>>>> run multiple clusters and migrate data manually from the data centers
>>>>>> to North America to achieve this. Any suggestions would also be
>>>>>> welcome.
>>>>>
>>>>> NetworkTopologyStrategy allows configuring replicas per-keyspace,
>>>>> per-datacenter:
>>>>>
>>>>> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
>>>>>
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of DataStax, the source for professional Cassandra support
>>>>> http://www.datastax.com
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>
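
As a rough illustration of the per-keyspace layout suggested earlier in the thread ({Tokyo: 2, NYC: 1}, {London: 2, NYC: 1}), here is a small sketch that builds keyspace definitions with NetworkTopologyStrategy. Several things are assumptions: the `CREATE KEYSPACE ... WITH replication = {...}` syntax is the CQL form from later Cassandra versions than the ones discussed here, the keyspace names `tokyo` and `london` are hypothetical, and the data center names must match whatever the cluster's snitch actually reports.

```python
def keyspace_cql(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with per-datacenter replica counts.

    `replicas_per_dc` maps a data center name to its replica count.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    return f"CREATE KEYSPACE {name} WITH replication = {{{', '.join(parts)}}};"


# Hypothetical layout: each region keeps two local replicas plus one in NYC,
# and no region's keyspace is replicated to any other region.
LAYOUT = {
    "tokyo": {"Tokyo": 2, "NYC": 1},
    "london": {"London": 2, "NYC": 1},
}

statements = [keyspace_cql(name, dcs) for name, dcs in LAYOUT.items()]
```

With this layout NYC holds one replica of every region's keyspace while Tokyo and London never see each other's data. It does not address the delete question above, since a delete in a keyspace still applies to its NYC replica.
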
