incubator-cassandra-user mailing list archives

From Yudong Gao <>
Subject Re: Location-aware replication based on objects' access pattern
Date Wed, 06 Apr 2011 16:40:47 GMT
On Wed, Apr 6, 2011 at 3:55 AM, Sasha Dolgy <> wrote:
> I had been asked this question from a strategy point of view, and
> referenced how appears to handle this.
> <assumption>
> Specific region data is stored on a ring in that region.  While based
> in the middle east, my profile was kept on the middle
> east part of ... when I moved back to europe, updated my
> city, my profile shifted from the middle east to europe ...
> </assumption>
> would it not be easier to manage multiple rings (one in each required
> geographic region) to suit the location aware use case?  This way you
> can grow out that region as necessary and invest less into the regions
> that aren't as busy ...
> would mean your application needs to be aware of the different regions
> and where data exists ... or make some initial assumptions as to where
> to find data ...
> - 1 ring for apac
> - 1 ring for europe
> - 1 ring for americas
> - 1 global ring (with nodes present in each region)
> the global ring maintains reference data on which ring a guid exists ...
> I've been playing with this concept on AWS ... the amount of data I
> have isn't significant, so I may not have run into problems that will
> occur when i get to large amounts of data ...

This is interesting. But how do you design the global ring so that it
does not become a bottleneck? For example, if a client needs to access
data in the US ring but first has to talk to a Europe node to get the
reference data, that will not be efficient.
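The two-step lookup described above can be sketched roughly as follows. This is only an illustration, not a real Cassandra client: the ring names, endpoints, and the in-process cache are all assumptions. A client-side cache (or a region hint from the application) is one common way to avoid the round trip to a possibly remote global-ring node on every request.

```python
# Hedged sketch of the guid -> regional-ring lookup. All names and
# endpoints here are hypothetical, for illustration only.
REGIONAL_RINGS = {
    'apac': 'cassandra-apac.example.com',
    'europe': 'cassandra-eu.example.com',
    'americas': 'cassandra-us.example.com',
}

# The global ring stores only reference data: guid -> region.
GLOBAL_RING = {'user-123': 'americas'}

def locate(guid, hint=None):
    """Resolve which regional ring holds a guid.

    A client-side cache (or a region 'hint' from the application)
    avoids consulting the global ring, and with it a possibly
    remote node, on every request."""
    cache = locate.cache
    if guid in cache:
        return cache[guid]
    region = hint if hint in REGIONAL_RINGS else GLOBAL_RING.get(guid)
    if region is not None:
        cache[guid] = region
    return region

locate.cache = {}
print(locate('user-123'))   # -> americas (from the global ring)
print(locate('user-123'))   # -> americas (served from the cache)
```

With a cache in front of it, the global ring is only on the critical path for cold lookups, which blunts the cross-region latency concern somewhat but does not eliminate it.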

Another potential problem is that the data is not synchronized among
the rings: if one data center goes down, the data stored there will be
lost. One way around this may be to use NetworkTopologyStrategy. For
example, with RF=3 for the ring in Europe, we can place 2 replicas in
Europe and 1 replica in America.
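To make the 2-plus-1 idea concrete, here is a simplified model of per-data-center replica placement. This is a sketch, not Cassandra's actual NetworkTopologyStrategy implementation (which walks the ring per data center and also tries to spread replicas across racks); it only shows how an RF budget like {'EU': 2, 'US': 1} gets satisfied by walking the ring clockwise from the key's token. The node and DC names are invented.

```python
# Simplified model of per-DC replica placement, assuming an
# RF budget per data center, e.g. {'EU': 2, 'US': 1}.
from bisect import bisect_right

def replicas_for(token, ring, rf_per_dc):
    """ring: sorted list of (token, node, dc) tuples.

    Walk clockwise from the key's token, taking each node whose
    data center still has unmet quota, until all quotas are met."""
    quota = dict(rf_per_dc)
    tokens = [t for t, _, _ in ring]
    start = bisect_right(tokens, token) % len(ring)
    chosen = []
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if quota.get(dc, 0) > 0:
            chosen.append(node)
            quota[dc] -= 1
    return chosen

# Hypothetical 4-node ring spanning two data centers.
ring = [(0, 'eu1', 'EU'), (25, 'us1', 'US'),
        (50, 'eu2', 'EU'), (75, 'us2', 'US')]
print(replicas_for(30, ring, {'EU': 2, 'US': 1}))
# -> ['eu2', 'us2', 'eu1']: 2 replicas in EU, 1 in US
```

The key point is that the cross-DC replica exists even though the ring is "owned" by one region, so losing the Europe data center no longer loses the data outright.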



> -sd
> On Wed, Apr 6, 2011 at 9:26 AM, Jonathan Colby <> wrote:
>> good to see a discussion on this.
>> This also has practical use for business continuity where you can control that the
clients in a given data center first write replicas to its own data center, then to the other
data center for backup.  If I understand correctly, a write takes the token into account
first, then the replication strategy decides where the replicas go.   I would like to see
the the first writes to be based on "location" instead of token -   whether that is accomplished
by manipulating the key or some other mechanism.
>> That way, if you do suffer the loss of a data center,  the clients are guaranteed
to meet quorum on the nodes in its own data center  (given  a mirrored architecture across
2 data centers).
>> We have 2 data centers.  If one goes down we have the problem that quorum cannot
be satisfied for half of the reads.
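The quorum problem with a mirrored two-data-center setup can be seen with a little arithmetic. The RF=4 split (2 replicas per data center) below is an assumption for illustration, not taken from the thread:

```python
# Quorum in Cassandra is a strict majority of the replicas.
def quorum(rf):
    return rf // 2 + 1

# Assume a mirrored layout: RF=4 total, 2 replicas in each of 2 DCs.
rf_total, replicas_per_dc = 4, 2

print(quorum(rf_total))                       # 3 replicas needed
print(replicas_per_dc >= quorum(rf_total))    # False: one surviving
# DC holds only 2 replicas, so QUORUM cannot be met after a DC loss.
```

With any even split across exactly two data centers, the surviving half is never a strict majority, which is exactly the failure mode described above.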
