incubator-cassandra-user mailing list archives

From Benjamin>
Subject Re: Deployment on AWS and replication strategies
Date Sun, 04 Apr 2010 03:12:07 GMT
On Sat, Apr 3, 2010 at 3:41 PM, Mike Gallamore
<> wrote:
> Useful things that nodes could advertise:
> data-centre they are in,

This is what the snitches do.
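As a rough illustration of what a snitch does, it answers topology questions (which data centre, which rack) about an endpoint. The class below is a standalone, hypothetical sketch: `OctetSnitch` and its methods are not a Cassandra API. It assumes an IPv4 convention where the second octet names the data centre and the third names the rack. That is the scheme the IP-based snitch of this era was generally described as using, but check the source for your version. A deployment on AWS would need whatever mapping matches its own addressing.

```java
// Hypothetical, standalone illustration of a snitch; not Cassandra's API.
// Assumption: IPv4 addresses where octet 2 names the data centre and octet 3 the rack.
class OctetSnitch {
    // Parse a dotted-quad IPv4 string into its four octets.
    static int[] octets(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        int[] result = new int[4];
        for (int i = 0; i < 4; i++) {
            result[i] = Integer.parseInt(parts[i]);
            if (result[i] < 0 || result[i] > 255) {
                throw new IllegalArgumentException("octet out of range: " + ipv4);
            }
        }
        return result;
    }

    // Data centre id = second octet (assumed convention).
    static int dataCentreOf(String ipv4) {
        return octets(ipv4)[1];
    }

    // Rack id = third octet (assumed convention).
    static int rackOf(String ipv4) {
        return octets(ipv4)[2];
    }

    static boolean sameDataCentre(String a, String b) {
        return dataCentreOf(a) == dataCentreOf(b);
    }

    // Racks are only comparable inside the same data centre.
    static boolean sameRack(String a, String b) {
        return sameDataCentre(a, b) && rackOf(a) == rackOf(b);
    }

    public static void main(String[] args) {
        System.out.println(sameDataCentre("10.1.5.20", "10.1.9.3")); // same second octet
        System.out.println(sameRack("10.1.5.20", "10.2.5.20"));      // different data centres
    }
}
```

Because the topology is derived from addresses that rarely change, nothing here has to be gossiped repeatedly. That is the contrast with the fast-changing load figures discussed next.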

> performance info: mem, CPU etc (these could be used to more intelligently decide how
to partition the data that the new node gets for example)

Not convinced this is useful as it changes rapidly, so either causes
lots of gossip or is always out of date.  Better to use a real
monitoring system.

> geographical info


> perhaps a preferred hash range not just a token (and presumably everything else would
automatically rebalance itself to make that happen)

Unclear what this would do.

> P.S. The last two could be useful for someone if they had their data in Cassandra but
it was more relevant to a particular geography. Think of something like Craigslist. Having
the data for San Francisco listings just happen to bootstrap over to a datacenter
on the east coast wouldn't be very efficient. But having two completely separate datastores
might not be the simplest design either. It would be nice to just tell the datastore where
the info is most relevant and have it make intelligent choices of where to store things for

Or just set the token specifically for each node you bootstrap.
Starting a node and crossing your fingers on its token selection is a
recipe for interesting times :)
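Choosing tokens by hand is simple arithmetic when the ring should be evenly balanced. This sketch assumes the RandomPartitioner, whose tokens lie in [0, 2^127). Under that assumption, node i of N gets token floor(i * 2^127 / N). The class and method names are hypothetical. Each value would then be set as that node's initial token (the `InitialToken` element in 0.6-era `storage-conf.xml`) before the node is started.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: evenly spaced tokens, assuming a RandomPartitioner-style
// token space of [0, 2^127).
class TokenCalculator {
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Token for node i of nodeCount is floor(i * 2^127 / nodeCount).
    static List<BigInteger> evenTokens(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        BigInteger n = BigInteger.valueOf(nodeCount);
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(RING_SIZE.multiply(BigInteger.valueOf(i)).divide(n));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<BigInteger> tokens = evenTokens(4);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println("node " + i + ": " + tokens.get(i));
        }
    }
}
```

Deciding these values up front is what avoids the "crossing your fingers" case. For geography-sensitive data, you also choose which physical node owns which slice of the ring.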

>  In my case we are making a reputation system. It would be nice if we had a way to make
sure that at least one replica of the data stayed on the customer's machine and one or more
copies over on our servers. I don't know how to do that, and the reverse would be important
too: making sure other customers' data doesn't get replicated to another customer's node. I guess
rather than a ring topology I'd like to try to get a star: "everything in the center + location
specific info at the points". An option would be to use different datastores at both ends
and push updates over to the central store, which would be Cassandra, but that isn't as transparent
as just having Cassandra nodes everywhere and just have the replication happen in a smart

This is what placement strategies do.  Have a look at the
RackAwareStrategy, for example.
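To make the idea concrete, here is a simplified, standalone sketch of data-centre and rack-aware replica selection. It is not RackAwareStrategy's actual source. The ordering is an assumption based on how that strategy was commonly described. The first replica goes on the token owner. The next goes on the first node clockwise in a different data centre. The one after that goes on a different rack in the owner's data centre. Any remaining replicas fall on the next nodes clockwise. `RackAwarePlacementSketch`, `Node`, and `replicasFor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified, hypothetical sketch of rack/data-centre-aware replica placement.
// Not Cassandra's RackAwareStrategy source; the ordering below is an assumption.
class RackAwarePlacementSketch {
    // A node's position on the ring plus its topology labels.
    static final class Node {
        final String name;
        final long token;
        final String dataCentre;
        final String rack;

        Node(String name, long token, String dataCentre, String rack) {
            this.name = name;
            this.token = token;
            this.dataCentre = dataCentre;
            this.rack = rack;
        }
    }

    // ring must be sorted by ascending token.
    static List<Node> replicasFor(List<Node> ring, long keyToken, int replicationFactor) {
        List<Node> chosen = new ArrayList<>();
        if (ring.isEmpty() || replicationFactor <= 0) {
            return chosen;
        }
        // Primary replica: first node whose token is >= the key's token, wrapping around.
        int start = 0;
        while (start < ring.size() && ring.get(start).token < keyToken) {
            start++;
        }
        if (start == ring.size()) {
            start = 0;
        }
        Node primary = ring.get(start);
        chosen.add(primary);

        // Second: the first node clockwise in a different data centre, if one exists.
        if (chosen.size() < replicationFactor) {
            addFirstMatching(ring, start, chosen, n -> !n.dataCentre.equals(primary.dataCentre));
        }
        // Third: a different rack in the primary's data centre, if one exists.
        if (chosen.size() < replicationFactor) {
            addFirstMatching(ring, start, chosen,
                n -> n.dataCentre.equals(primary.dataCentre) && !n.rack.equals(primary.rack));
        }
        // Any remaining replicas: the next unused nodes clockwise.
        for (int i = 1; i < ring.size() && chosen.size() < replicationFactor; i++) {
            Node next = ring.get((start + i) % ring.size());
            if (!chosen.contains(next)) {
                chosen.add(next);
            }
        }
        return chosen;
    }

    // Walk clockwise from start and add the first unused node that satisfies wanted.
    static void addFirstMatching(List<Node> ring, int start, List<Node> chosen, Predicate<Node> wanted) {
        for (int i = 1; i < ring.size(); i++) {
            Node candidate = ring.get((start + i) % ring.size());
            if (!chosen.contains(candidate) && wanted.test(candidate)) {
                chosen.add(candidate);
                return;
            }
        }
    }

    static List<String> names(List<Node> nodes) {
        List<String> result = new ArrayList<>();
        for (Node n : nodes) {
            result.add(n.name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> ring = List.of(
            new Node("A", 0, "east", "r1"),
            new Node("B", 100, "east", "r1"),
            new Node("C", 200, "west", "r1"),
            new Node("D", 300, "east", "r2"));
        System.out.println(names(replicasFor(ring, 50, 3))); // prints [B, C, D]
    }
}
```

For the star layout described in the quoted message, the same shape could in principle be reused with different predicates. For example, "node belongs to this customer" could pick one replica and "node is in the central data centre" could pick the rest. Whether and how a custom strategy plugs into a given Cassandra version is something to confirm against its source.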

