cassandra-user mailing list archives

From Richard Lowe <richard.l...@arkivum.com>
Subject RE: default required in cassandra-topology.properties?
Date Fri, 20 Apr 2012 07:32:59 GMT
As far as I know, it's not possible to leave the replication factor undefined: if you do,
Cassandra will default to RF=1 with SimpleStrategy.

The topology is local to each node, so unless all your nodes have the same topology file,
each of them may have a different idea of the topology of the cluster.

I'm not sure what you're trying to achieve here, so I'll give an example.

Say you have two datacenters, DC1 and DC2. It's perfectly possible for nodes in DC1 to have
a topology file that only mentions DC1 nodes and nodes in DC2 to have a topology file that
only mentions DC2 nodes. You can then define one keyspace with strategy options DC1: 3 and
another with DC2: 3 and this should work fine.

However, if you had a keyspace with strategy options DC1: 3, DC2: 3 then you would AFAIK never
be able to write to that keyspace, because none of the nodes know enough about the topology;
they can address either DC1 or DC2, but not both.

If there were a third type of node that had topology defined for both DC1 and DC2 then these
nodes would then be able to update the DC1+DC2 keyspace, even though DC1-only and DC2-only
nodes would not.
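
To make the example above concrete, here is a toy model in Python. It is only a sketch of
the reasoning in this message, not Cassandra's actual replica placement or write path: it
assumes a node can use a keyspace only if its local topology view contains at least RF
nodes in every datacenter named in the strategy options. The addresses and the function
names (replicas_available, can_place) are made up for illustration.

```python
def replicas_available(local_view, strategy_options):
    """For each DC in strategy_options, report whether the local view has >= RF nodes there.

    local_view: {address: datacenter} as seen by one node (hypothetical model)
    strategy_options: {datacenter: replication_factor}
    """
    counts = {}
    for dc in local_view.values():
        counts[dc] = counts.get(dc, 0) + 1
    return {dc: counts.get(dc, 0) >= rf for dc, rf in strategy_options.items()}


def can_place(local_view, strategy_options):
    """True only if every datacenter in the strategy options can be satisfied."""
    return all(replicas_available(local_view, strategy_options).values())


# Three hypothetical nodes per datacenter.
dc1_only = {"10.0.0.%d" % i: "DC1" for i in range(1, 4)}
dc2_only = {"10.1.0.%d" % i: "DC2" for i in range(1, 4)}
both = {**dc1_only, **dc2_only}

print(can_place(dc1_only, {"DC1": 3}))            # DC1-only view, DC1-only keyspace
print(can_place(dc1_only, {"DC1": 3, "DC2": 3}))  # DC1-only view, DC1+DC2 keyspace
print(can_place(both, {"DC1": 3, "DC2": 3}))      # full view, DC1+DC2 keyspace
```

Under this simplified model, only the node with the combined view can satisfy the DC1+DC2
keyspace, which matches the behaviour I'd expect from the three node types described above.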

So if there is a clear segregation in your data then splitting the topology may be OK, but
if not then you will likely find that you can't update the keyspace unless a node has sufficient
knowledge of the topology.

Depending on your use case a simpler alternative may be to just run two clusters instead of
trying to define the shape of a single one through topology definitions. I think what you're
talking about here is on the edge of what Cassandra is designed to do; it works best when
all nodes are uniform and have the same understanding about the cluster.

Richard


From: Bill Au [mailto:bill.w.au@gmail.com]
Sent: 19 April 2012 19:58
To: user@cassandra.apache.org
Subject: Re: default required in cassandra-topology.properties?

I had thought that the topology file is used for replica placement only, such that for the
token range that the unknown node is responsible for, data is still read and written there.
It just won't be replicated, since the replication factor is not defined.

Bill
On Thu, Apr 19, 2012 at 1:18 PM, Richard Lowe <richard.lowe@arkivum.com> wrote:
Yes it is possible. Put the following as the last line of your topology file:

default=unknown:unknown

So long as you don't have any DC or rack with this name your local node will not be able to
address any nodes that aren't explicitly given in its topology file.
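
As a rough illustration of how the default entry acts as a fallback, here is a small Python
sketch. It is not Cassandra's snitch code; it only assumes the address=DC:RACK line format
of the topology file, and the addresses, SAMPLE text and function names are hypothetical.

```python
def parse_topology(text):
    """Parse topology-file style lines into {key: (datacenter, rack)}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        dc, _, rack = value.strip().partition(":")
        mapping[key.strip()] = (dc.strip(), rack.strip())
    return mapping


def locate(mapping, address):
    """Return (datacenter, rack) for an address, falling back to the 'default' entry."""
    return mapping.get(address, mapping.get("default"))


# Hypothetical file contents, for illustration only.
SAMPLE = """\
10.0.0.1=DC1:RAC1
10.0.0.2=DC1:RAC2
default=unknown:unknown
"""

topology = parse_topology(SAMPLE)
print(locate(topology, "10.0.0.1"))  # listed node
print(locate(topology, "10.9.9.9"))  # unlisted node falls back to the default entry
```

Any node not listed in the file ends up in the 'unknown' datacenter, which no keyspace
should name in its strategy options.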

However, bear in mind that, whilst Cassandra won't try to use replication factor to store to
these 'unknown' nodes, their token may mean that the 'natural' home for a row is on a node
that is not addressable. This can create holes in your dataset and create situations where
data can 'disappear' because the row's token places the data on a particular node but the
coordinator can't contact that node to get at the data.

Careful use of replication factor and NetworkTopologyStrategy can help with this, but you
should make sure that a node really doesn't need to contact the unknown nodes before marking
them as such.


Richard


From: Bill Au [mailto:bill.w.au@gmail.com]
Sent: 19 April 2012 17:16
To: user@cassandra.apache.org
Subject: default required in cassandra-topology.properties?

All the examples of cassandra-topology.properties that I have seen have a default entry assigning
unknown nodes to a specific data center and rack.  Is it possible to have Cassandra ignore
unknown nodes for the purpose of replication?

Bill

