incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@yakaz.com>
Subject Re: understanding the cassandra storage scaling
Date Thu, 09 Dec 2010 11:06:25 GMT
> This helps a little but unfortunately I'm still a bit fuzzy for me.  So is it
> not true that each node contains all the data in the cluster?

Not at all. Basically, each node is responsible for only a part of the data (a
range, really). But for each piece of data you can choose how many nodes hold
it; this is the Replication Factor (RF).

For instance, if you choose to have RF=1, then each piece of data will be on
exactly one node (this is usually a bad idea since it offers very weak
durability guarantees but nevertheless, it can be done).

If you choose RF=3, each piece of data is on 3 nodes (independently of the
number of nodes your cluster has). You can have all data on all nodes, but for
that you'll have to choose RF=#{nodes in the cluster}. But this is a very
degenerate case.
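
To make the effect of RF concrete, here is a toy sketch in Python. It is not
Cassandra's actual code: the Ring class, the token() helper, the node names and
the "walk clockwise to the next nodes" placement rule are all assumptions made
for illustration. Each node gets a position (token) on a ring, a key is hashed
onto the same ring, and the key is stored on the first node at or after its
position plus the next RF-1 nodes.

```python
import bisect
import hashlib


def token(value):
    """Hash a string to a ring position (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical toy ring: each node owns the range ending at its token."""

    def __init__(self, node_names):
        # Sort nodes by token so the list order is the ring order.
        self.entries = sorted((token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key, rf):
        """Return the names of the rf nodes that hold a copy of key."""
        n = len(self.entries)
        if not 1 <= rf <= n:
            raise ValueError("rf must be between 1 and the number of nodes")
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token(key)) % n
        # Assumed placement rule: that node plus the next rf-1 on the ring.
        return [self.entries[(start + i) % n][1] for i in range(rf)]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4", "node5"])
    for rf in (1, 3, 5):
        print(rf, ring.replicas("user:42", rf))
```

With 5 nodes, RF=1 yields one node, RF=3 yields three distinct nodes, and only
RF=5 puts the key on every node, which is the degenerate case mentioned above.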

> how does my query get directed to the right node?

Each node in the cluster knows the ranges of data that every other node holds.
I suggest you watch the first video linked on this page:
  http://wiki.apache.org/cassandra/ArticlesAndPresentations
It explains this and more.
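
As a rough illustration of the routing idea, the Python sketch below gives
every node the same map of the ring, so a client can contact any node and that
node forwards the request to the owner. This is a simplified model and not
Cassandra's implementation: the Node class, build_cluster(), the MD5-based
token() helper and the in-memory dicts are hypothetical, and it uses RF=1 to
keep the example short.

```python
import bisect
import hashlib


def token(value):
    """Hash a string to a ring position (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Node:
    """Hypothetical node: stores its own data and knows the whole ring."""

    def __init__(self, name):
        self.name = name
        self.token = token(name)
        self.data = {}   # keys this node actually stores
        self.ring = []   # sorted (token, Node) pairs, identical on every node

    def owner_of(self, key):
        """Find the node whose range covers key, using the local ring map."""
        tokens = [t for t, _ in self.ring]
        i = bisect.bisect_left(tokens, token(key)) % len(self.ring)
        return self.ring[i][1]

    def put(self, key, value):
        # Any node can accept the write; it forwards it to the owner.
        self.owner_of(key).data[key] = value

    def get(self, key):
        # Any node can accept the read; it asks the owner for the value.
        return self.owner_of(key).data.get(key)


def build_cluster(names):
    """Create the nodes and hand each one the same view of the ring."""
    nodes = [Node(name) for name in names]
    ring = sorted(((n.token, n) for n in nodes), key=lambda entry: entry[0])
    for n in nodes:
        n.ring = ring
    return nodes


if __name__ == "__main__":
    nodes = build_cluster(["node1", "node2", "node3", "node4"])
    nodes[0].put("user:42", "Alice")   # write through one node
    print(nodes[3].get("user:42"))     # read through another node
```

The value is stored on a single node, yet a read through any node finds it,
because every node computes the same owner from the same ring map.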

--
Sylvain
