cassandra-user mailing list archives

From Andreas Rudolph
Subject How to control location of data?
Date Tue, 10 Jan 2012 08:52:50 GMT

We're evaluating Cassandra for our storage needs. One of the key benefits we see is the online
replication of data, which is an easy way to share data across nodes. However, we need to
control precisely which group of nodes stores specific parts of a keyspace (columns/column
families), and we're having trouble understanding the documentation. Could anyone help us
find answers to the following questions?

What does the term "replica" mean? If a key is stored on exactly three nodes in a cluster,
is it correct to say that there are three replicas of that key, or are there just two
replicas (copies) and one original?

What is the relationship between the Cassandra concepts "Partitioner" and "Replica Placement
Strategy"? According to the documentation on the DataStax web site and the architecture
internals on the Cassandra Wiki, the first storage location of a key (and its associated data)
is determined by the "Partitioner", whereas additional storage locations are defined by the
"Replica Placement Strategy". Could I completely redefine how nodes are selected to store a
key just by implementing my own subclass of AbstractReplicationStrategy and configuring that
subclass for the keyspace?

How can I prevent the "Partitioner" from being consulted at all when determining which node
stores a key first?

Is a keyspace always distributed across the whole cluster? Is it possible to configure Cassandra
so that more or less freely chosen parts of a keyspace (columns) are stored on arbitrarily
chosen nodes?
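
To make the second question concrete, the reading of the documentation described above can be
sketched as a toy model in Python. This is only an illustration under stated assumptions, not
Cassandra's code: the four-node ring, the node names, the 0..255 token range, and the functions
token_for and replicas_for are all hypothetical.

```python
import hashlib
from bisect import bisect_left

# Hypothetical four-node ring: token -> node name (toy values, not real Cassandra tokens).
RING = {0: "node-a", 64: "node-b", 128: "node-c", 192: "node-d"}


def token_for(key):
    """Toy 'partitioner': hash the key onto the token range 0..255."""
    return hashlib.md5(key.encode("utf-8")).digest()[0]


def replicas_for(key, replication_factor=3):
    """Toy 'placement strategy': pick the first node whose token is >= the
    key's token (wrapping around the ring), then the next nodes clockwise."""
    tokens = sorted(RING)
    start = bisect_left(tokens, token_for(key)) % len(tokens)
    return [RING[tokens[(start + i) % len(tokens)]] for i in range(replication_factor)]
```

In this model, replicas_for("some-key") returns three distinct node names, and the hash alone
fixes where the walk around the ring starts. The questions above amount to asking whether both
steps, or only the second one, can be replaced by custom logic.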

Any tips would be much appreciated :-)