cassandra-user mailing list archives

From Andreas Rudolph <>
Subject Re: AW: How to control location of data?
Date Tue, 10 Jan 2012 14:05:32 GMT

Thank you for your last reply. I'm still wondering if I got you right...

> ... 
> A partitioner decides into which partition a piece of data belongs
Does your statement imply that the partitioner makes no decisions at all about the (physical) storage location? Or, put another way: what do you mean by "partition"?

To quote: "... AbstractReplicationStrategy controls what nodes get secondary, tertiary, etc. replicas of each key range. Primary replica is always determined by the token ring (...)"
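
To make the division of labour in the quoted description concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not Cassandra's code: `token_for`, `Ring` and `simple_strategy` are hypothetical names, the MD5-based token only loosely imitates a hash partitioner, and real strategies also take racks and data centers into account. In this model the partitioner's only job is to map a key to a token (a position on the ring); the strategy then walks the ring from that token and picks every replica, the first one included.

```python
import bisect
import hashlib

# Toy "partitioner": maps a key to a token on the ring.
# Hypothetical, loosely modelled on a hash partitioner; not Cassandra's API.
def token_for(key: str) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2**127)

class Ring:
    """Nodes placed on a token ring (toy model)."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort nodes by token so the ring can be walked clockwise.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def walk_from(self, token: int):
        """Yield nodes clockwise, starting at the first node whose token >= token."""
        start = bisect.bisect_left(self.tokens, token) % len(self.entries)
        for i in range(len(self.entries)):
            yield self.entries[(start + i) % len(self.entries)][1]

# Toy "simple" replication strategy: the first node reached on the ring
# plus the next rf - 1 nodes clockwise. The strategy chooses ALL replicas;
# the partitioner only supplied the starting token.
def simple_strategy(ring: Ring, token: int, rf: int) -> list[str]:
    return list(ring.walk_from(token))[:rf]
```

For example, with nodes A, B, C and D at tokens 0, 2^125, 2^126 and 3·2^125, `simple_strategy(ring, 10, 2)` returns `["B", "C"]`, and `simple_strategy(ring, token_for("user:42"), 3)` returns three node names. Read this way, a "partition" is the ring range a token falls into, not a physical node; which nodes hold that range is up to the strategy.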

> ... 
> You can select different placement strategies and partitioners for different keyspaces, thereby choosing known data to be stored on known hosts.
> This is, however, discouraged for various reasons, e.g. you need a lot of knowledge about your data to keep the cluster balanced. What is your use case for this requirement? There is probably a more suitable solution.

What we want is to partition the cluster with respect to key spaces. That is, we want to establish an association between nodes and key spaces such that a node of the cluster holds data from a key space if and only if that node is a *member* of that key space.

To our knowledge, Cassandra has no built-in way to specify such a membership relation. We therefore considered implementing our own replica placement strategy, until we began to suspect that the partitioner would have to be replaced as well to accomplish the task.
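
The membership idea can be sketched as a placement function that leaves the partitioner alone and only filters nodes. The following is a hypothetical toy in Python (`membership_replicas` and its parameters are illustrative names, not part of Cassandra's API): it walks the token ring clockwise from the key's token, exactly as the partitioner dictates, but skips nodes that are not members of the key space.

```python
import bisect

# Hypothetical "membership" placement (toy model, not Cassandra's API).
# The partitioner is unchanged: it still supplies key_token. Only the
# choice of nodes is restricted to members of the key space.
def membership_replicas(node_tokens, members, key_token, rf):
    """Return up to rf member nodes, in clockwise ring order from key_token.

    node_tokens: dict mapping node name -> token (the ring)
    members: set of nodes allowed to hold this key space's data
    """
    entries = sorted((tok, node) for node, tok in node_tokens.items())
    tokens = [tok for tok, _ in entries]
    start = bisect.bisect_left(tokens, key_token)
    replicas = []
    for i in range(len(entries)):
        node = entries[(start + i) % len(entries)][1]
        if node in members:          # non-members are skipped, not counted
            replicas.append(node)
            if len(replicas) == rf:
                break
    return replicas                  # fewer than rf if there are too few members
```

With nodes A to D at tokens 0, 100, 200 and 300 and members `{"B", "D"}`, a key at token 150 is placed on `["D", "B"]` for `rf=2`. In this toy, a member's share of the data depends on the size of the ring gap before it, which illustrates why balancing such a setup takes detailed knowledge of the data. Whether Cassandra's own NetworkTopologyStrategy, which takes a replica count per data center, can express the same grouping by treating each member group as a data center is worth checking against the documentation for your version.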

Do you have any ideas?

> From: Andreas Rudolph []
> Sent: Tuesday, 10 January 2012 09:53
> To:
> Subject: How to control location of data?
> Hi!
> We're evaluating Cassandra for our storage needs. One of the key benefits we see is online replication of the data, i.e. an easy way to share data across nodes. However, we need to control precisely which group of nodes stores specific parts of a key space (columns/column families). We are having trouble understanding the documentation. Could anyone help us find answers to our questions?
> ·  What does the term "replica" mean? If a key is stored on exactly three nodes in a cluster, is it correct to say that there are three replicas of that key, or are there just two replicas (copies) and one original?
> ·  What is the relationship between the Cassandra concepts "Partitioner" and "Replica Placement Strategy"? According to documentation on the DataStax web site and the architecture internals on the Cassandra Wiki, the first storage location of a key (and its associated data) is determined by the "Partitioner", whereas additional storage locations are defined by the "Replica Placement Strategy". I'm wondering whether I could completely redefine how nodes are selected to store a key just by implementing my own subclass of AbstractReplicationStrategy and configuring that subclass for the key space.
> ·  How can I prevent the "Partitioner" from being consulted at all to determine which node stores a key first?
> ·  Is a key space always distributed across the whole cluster? Is it possible to configure Cassandra so that more or less freely chosen parts of a key space (columns) are stored on arbitrarily chosen nodes?
> Any tips would be very appreciated :-)
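
On the terminology question: in Cassandra the replication factor counts all copies, so a key stored on three nodes has three replicas, and none of them is an "original". The toy write path below illustrates this. `replicas_for` and `write` are hypothetical names, and the ring walk is simplified (no racks or data centers).

```python
import bisect

def replicas_for(node_tokens, key_token, rf):
    """Toy placement: first node clockwise from key_token, then the next rf - 1 nodes."""
    entries = sorted((tok, node) for node, tok in node_tokens.items())
    start = bisect.bisect_left([tok for tok, _ in entries], key_token)
    count = min(rf, len(entries))
    return [entries[(start + i) % len(entries)][1] for i in range(count)]

def write(stores, node_tokens, key_token, key, value, rf=3):
    """Send the same value to every replica; no copy is marked as the original."""
    for node in replicas_for(node_tokens, key_token, rf):
        stores.setdefault(node, {})[key] = value
```

With nodes A to D at tokens 0, 100, 200 and 300, writing key `"k"` at token 150 with `rf=3` stores identical copies on C, D and A. The "primary" replica C is simply the first node in ring order; it holds the same data as the other two.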
