cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "FAQ" by PeterSchuller
Date Fri, 25 Mar 2011 23:01:50 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by PeterSchuller.
The comment on this change is: Add "How does Cassandra decide which nodes have what data?".
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=106&rev2=107

--------------------------------------------------

   * [[#cleaning_compacted_tables|I compacted, so why did space used not decrease?]]
   * [[#mmap|Why does top report that Cassandra is using a lot more memory than the Java heap
max?]]
   * [[#jna|I'm getting java.io.IOException: Cannot run program "ln" when trying to snapshot
or update a keyspace]]
+  * [[#replicaplacement|How does Cassandra decide which nodes have what data?]]
  <<Anchor(cant_listen_on_ip_any)>>
  
  == Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)? ==
@@ -402, +403 @@

  == I'm getting java.io.IOException: Cannot run program "ln" when trying to snapshot or update
a keyspace ==
  Updating a keyspace first takes a snapshot. This involves creating hard links to the existing
SSTables, but Java has no native way to create hard links, so it must fork "ln". When forking,
the system must have as much memory free as the parent process is using, even though the child
will not use it all.  Because the Java process is large, this is problematic.  The solution is to
install [[http://jna.java.net/|Java Native Access]] so it can create the hard links itself.
  
+ <<Anchor(replicaplacement)>>
+ 
+ == How does Cassandra decide which nodes have what data? ==
+ 
+ The set of nodes (a single node, or several) responsible for any given piece of data is
determined by:
+ 
+  * The row key (data is partitioned on row key)
+  * The replication factor (decides ''how many'' nodes are in the replica
set for a given row)
+  * The replication strategy (decides ''which'' nodes are part of said replica
set)
+ 
+ In the case of SimpleStrategy, replicas are placed on succeeding nodes in the ring.
The first node is determined by the partitioner and the row key, and the remaining replicas
are placed on the nodes that follow it in the ring. In the case of NetworkTopologyStrategy,
placement is affected by data center and rack awareness, and depends on how nodes in different
racks or data centers are placed in the ring.
+ 
+ It is important to understand that Cassandra ''does not'' alter the replica
set for a given row key based on changing characteristics like current load, which nodes are
up or down, or which node your client happens to talk to.
+ 
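The SimpleStrategy placement described above can be illustrated with a minimal sketch. This is not Cassandra's implementation: the function names, the MD5-based token, and the four-node ring are all assumptions made for illustration. It shows only the idea that the first replica is chosen from the row key's position on the ring and the remaining replicas are the succeeding nodes.

```python
import bisect
import hashlib


def token_for(row_key: bytes) -> int:
    """Map a row key to a ring position (illustrative MD5-based hash, an assumption)."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")


def simple_strategy_replicas(ring, row_key, replication_factor):
    """Return the replica set for row_key.

    ring is a list of (token, node_name) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if no such node exists.
    start = bisect.bisect_left(tokens, token_for(row_key)) % len(ring)
    # A replica set cannot contain more nodes than the ring has.
    count = min(replication_factor, len(ring))
    # Remaining replicas: the nodes that follow the first one in the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node ring with evenly spaced tokens.
RING_SIZE = 2 ** 128
ring = [(i * RING_SIZE // 4, f"node{i}") for i in range(4)]

replicas = simple_strategy_replicas(ring, b"user:42", 3)
print(replicas)
```

The function takes only the ring, the row key, and the replication factor as inputs. Nothing about load, node liveness, or the coordinating node enters the calculation, so the same key always yields the same replica set for a given ring.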
