incubator-cassandra-user mailing list archives

From Todd Fast <t...@digitalexistence.com>
Subject Re: abusing cassandra's multi DC abilities
Date Tue, 25 Feb 2014 04:49:05 GMT
Hi Jonathan--

First, best wishes for success with your platform.

Frankly, I think the architecture you described is only going to cause
you major trouble. I'm left wondering why you don't either use something
like XMPP (of which several implementations can handle this kind of
federated scenario) or simply have internal (REST) APIs to send a message
from the backend in one DC to the backend in another DC.

There are a bunch of ways to approach this problem. You could also use
Redis pub/sub (though it is a bit brittle), SQS, or any number of other
approaches that would be simpler and more robust than what you described.
I'd urge you to seriously consider another approach.
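
As a rough sketch of the internal-API option (not a definitive design): the
backend in one DC builds a small event after its write succeeds, and the
backend in the other DC handles it idempotently. Every name here
(build_event, EventReceiver, the "reindex" and "bust_cache" actions) is
hypothetical, and the transport is left out entirely; the payload could
travel as an HTTP POST body, an SQS message, or a pub/sub message.

```python
import json
import uuid


def build_event(doc_id, action, source_dc):
    """Build the payload the sending DC would ship after a successful write."""
    return json.dumps({
        "event_id": str(uuid.uuid4()),  # lets the receiver drop duplicates
        "doc_id": doc_id,
        "action": action,               # hypothetical: "reindex" or "bust_cache"
        "source_dc": source_dc,
    })


class EventReceiver:
    """Runs in the receiving DC; safe to call more than once per event."""

    def __init__(self, reindex, bust_cache):
        # The two callables stand in for the real Elasticsearch/Redis work.
        self._handlers = {"reindex": reindex, "bust_cache": bust_cache}
        # In-memory only; a real service would persist or expire these ids.
        self._seen = set()

    def handle(self, payload):
        event = json.loads(payload)
        if event["event_id"] in self._seen:
            return False  # duplicate delivery, nothing to do
        self._handlers[event["action"]](event["doc_id"])
        self._seen.add(event["event_id"])
        return True
```

Note that the event can arrive before Cassandra has replicated the row it
refers to, so the receiving side would probably need to retry a reindex
whose read comes back stale or empty.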

Best,
Todd

On Saturday, February 22, 2014, Jonathan Haddad <jon@jonhaddad.com> wrote:

> Upfront TLDR: We want to do stuff (reindex documents, bust cache) when
> changed data from DC1 shows up in DC2.
>
> Full Story:
> We're planning on adding data centers throughout the US.  Our platform is
> used for business communications.  Each DC currently utilizes
> Elasticsearch and Redis.  A message can be sent from one user to another, and the
> intent is that it would be seen in near-real-time.  This means that 2
> people may be using different data centers, and the messages need to
> propagate from one to the other.
>
> On the plus side, we know we get this with Cassandra (fist pump) but the
> other pieces, not so much.  Even if they did work, there are all sorts of
> race conditions that could pop up from having different pieces of our
> architecture communicating over different channels.  From this, we've
> arrived at the idea that since Cassandra is the authoritative data source,
> we might be able to trigger events in DC2 based on activity coming through
> either the commit log or some other means.  One idea was to use a CF with a
> low gc time as a means of transporting messages between DCs, and watching
> the commit logs for deletes to that CF in order to know when we need to do
> things like reindex a document (or a new document), bust cache, etc.
> Facebook did something similar with their modifications to MySQL to
> include cache keys in the replication log.
>
> Assuming this is sane, I'd want to avoid having the same event register on
> 3 servers, thus registering 3 items in the queue when only one should be
> there.  So, for any piece of data replicated from the other DC, I'd need a
> way to determine if it was supposed to actually trigger the event or not.
> (Maybe it looks at the token and determines if the current server falls in
> the token range?)  Or is there a better way?
>
> So, my questions to all ye Cassandra users:
>
> 1. Is this even sane?
> 2. Is anyone doing it?
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>
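
On the token-range idea in the quoted message, here is a minimal sketch of
how a node might decide whether it is the one replica that should fire an
event. It assumes a toy ring with one token per node and an MD5-based token
function; this is not Cassandra's actual partitioner or replication
strategy, and ToyRing, token_for, and should_fire are hypothetical names.

```python
import bisect
import hashlib


def token_for(value):
    """Toy token: first 8 bytes of MD5 as an integer (not Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ToyRing:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; one token per node for simplicity.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def primary_for(self, key):
        """Return the node owning the first token >= the key's token, wrapping around."""
        idx = bisect.bisect_left(self._tokens, token_for(key))
        return self._ring[idx % len(self._ring)][1]

    def should_fire(self, local_node, key):
        # Only the primary owner enqueues the event; other replicas stay quiet.
        return self.primary_for(key) == local_node
```

With this rule each key fires on exactly one node, but if that node is down
when the replicated write arrives, nothing fires at all, so some fallback
would still be needed.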
