cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "FAQ" by JonathanEllis
Date Mon, 14 Jun 2010 13:22:58 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by JonathanEllis.
The comment on this change is: mutations against a single key are atomic.
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=73&rev2=74

--------------------------------------------------

  = Frequently asked questions =
+  *
-  * [[#cant_listen_on_ip_any|Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)?]]
+  [[#cant_listen_on_ip_any|Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)?]]
+ 
+  *
-  * [[#ports|What ports does Cassandra use?]]
+  [[#ports|What ports does Cassandra use?]]
+ 
+  *
-  * [[#slows_down_after_lotso_inserts|Why does Cassandra slow down after doing a lot of inserts?]]
+  [[#slows_down_after_lotso_inserts|Why does Cassandra slow down after doing a lot of inserts?]]
+ 
+  *
-  * [[#existing_data_when_adding_new_nodes|What happens to existing data in my cluster when
I add new nodes?]]
+  [[#existing_data_when_adding_new_nodes|What happens to existing data in my cluster when
I add new nodes?]]
+ 
+  *
-  * [[#modify_cf_config|Can I add/remove/rename Column Families on a working cluster?]]
+  [[#modify_cf_config|Can I add/remove/rename Column Families on a working cluster?]]
+ 
+  *
-  * [[#node_clients_connect_to|Does it matter which node a Thrift client connects to?]]
+  [[#node_clients_connect_to|Does it matter which node a Thrift client connects to?]]
+ 
+  *
-  * [[#what_kind_of_hardware_should_i_use|What kind of hardware should I run Cassandra on?]]
+  [[#what_kind_of_hardware_should_i_use|What kind of hardware should I run Cassandra on?]]
+ 
+  *
-  * [[#architecture|What are SSTables and Memtables?]]
+  [[#architecture|What are SSTables and Memtables?]]
+ 
+  *
-  * [[#working_with_timeuuid_in_java|Why is it so hard to work with TimeUUIDType in Java?]]
+  [[#working_with_timeuuid_in_java|Why is it so hard to work with TimeUUIDType in Java?]]
+ 
+  *
-  * [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays the same.
What gives?]]
+  [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays the same. What
gives?]]
+ 
+  *
-  * [[#reads_slower_writes|Why are reads slower than writes?]]
+  [[#reads_slower_writes|Why are reads slower than writes?]]
+ 
+  *
-  * [[#cloned|Why does nodeprobe ring only show one entry, even though my nodes logged that
they see each other joining the ring?]]
+  [[#cloned|Why does nodeprobe ring only show one entry, even though my nodes logged that
they see each other joining the ring?]]
+ 
+  *
-  * [[#range_ghosts|Why do deleted keys show up during range scans?]]
+  [[#range_ghosts|Why do deleted keys show up during range scans?]]
+ 
+  *
-  * [[#change_replication|Can I change the ReplicationFactor on a live cluster?]]
+  [[#change_replication|Can I change the ReplicationFactor on a live cluster?]]
+ 
+  *
-  * [[#large_file_and_blob_storage|Can I store large files or BLOBs in Cassandra?]]
+  [[#large_file_and_blob_storage|Can I store large files or BLOBs in Cassandra?]]
+ 
+  *
-  * [[#jmx_localhost_refused|Nodetool says "Connection refused to host: 127.0.1.1", for any
remote host. What gives?]]
+  [[#jmx_localhost_refused|Nodetool says "Connection refused to host: 127.0.1.1", for any
remote host. What gives?]]
+ 
+  *
-  * [[#iter_world|How can I iterate over all the rows in a ColumnFamily?]]
+  [[#iter_world|How can I iterate over all the rows in a ColumnFamily?]]
+ 
+  *
-  * [[#no_keyspaces|Why were none of the keyspaces described in storage-conf.xml loaded?]]
+  [[#no_keyspaces|Why were none of the keyspaces described in storage-conf.xml loaded?]]
+ 
+  *
-  * [[#gui|Is there a GUI admin tool for Cassandra?]]
+  [[#gui|Is there a GUI admin tool for Cassandra?]]
+ 
+  *
-  * [[#a_long_is_exactly_8_bytes|Insert operation throws InvalidRequestException with message
"A long is exactly 8 bytes"]]
+  [[#a_long_is_exactly_8_bytes|Insert operation throws InvalidRequestException with message
"A long is exactly 8 bytes"]]
+ 
+  *
-  * [[#clustername_mismatch|Cassandra says "ClusterName mismatch: oldClusterName != newClusterName"
and refuses to start]]
+  [[#clustername_mismatch|Cassandra says "ClusterName mismatch: oldClusterName != newClusterName"
and refuses to start]]
+ 
+  *
-  * [[#batch_mutate_atomic|Are batch_mutate operations atomic?]]
+  [[#batch_mutate_atomic|Are batch_mutate operations atomic?]]
+ 
  
  <<Anchor(cant_listen_on_ip_any)>>
  
@@ -80, +124 @@

  
   1. You can maintain a list of contact nodes (all or a subset of the nodes in the cluster),
and configure your clients to choose among them.
   1. Use round-robin DNS and create a record that points to a set of contact nodes (recommended).
+  1.
-  1. Use the `get_string_property("token map")` RPC to obtain an up-to-date list of the
nodes in the cluster and cycle through them.
+  Use the `get_string_property("token map")` RPC to obtain an up-to-date list of the
nodes in the cluster and cycle through them.
+ 
   1. Deploy a load-balancer, proxy, etc.
  
  <<Anchor(what_kind_of_hardware_should_i_use)>>
@@ -203, +249 @@

  == Can I change the ReplicationFactor on a live cluster? ==
  Yes, but it will require restarting and running repair manually to change the replica count
of existing data.
  
+  *
-  * Alter the ReplicationFactor for the desired keyspace(s) in the storage configuration
on each node in the cluster.
+  Alter the ReplicationFactor for the desired keyspace(s) in the storage configuration on
each node in the cluster.
+ 
   * Restart cassandra on each node in the cluster
  
  If you're reducing the ReplicationFactor:
@@ -221, +269 @@

  
   * The main limitation on a column and super column size is that all the data for a single
key and column must fit (on disk) on a single machine(node) in the cluster.  Because keys
alone are used to determine the nodes responsible for replicating their data, the amount of
data associated with a single key has this upper bound. This is an inherent limitation of
the distribution model.
  
+  *
-  * When large columns are created and retrieved, that column's data is loaded into RAM, which
can get resource intensive quickly. Consider loading 200 rows with columns that store
10 MB image files each into RAM. That small result set would consume about 2 GB of RAM. Clearly,
as more and more large columns are loaded, RAM would start to get consumed quickly. This
can be worked around, but will take some upfront planning and testing to get a workable solution
for most applications. You can find more information regarding this behavior here: [[MemtableThresholds|memtables]],
and a possible solution in 0.7 here: [[https://issues.apache.org/jira/browse/CASSANDRA-16|CASSANDRA-16]].
+  When large columns are created and retrieved, that column's data is loaded into RAM, which
can get resource intensive quickly. Consider loading 200 rows with columns that store
10 MB image files each into RAM. That small result set would consume about 2 GB of RAM. Clearly,
as more and more large columns are loaded, RAM would start to get consumed quickly. This
can be worked around, but will take some upfront planning and testing to get a workable solution
for most applications. You can find more information regarding this behavior here: [[MemtableThresholds|memtables]],
and a possible solution in 0.7 here: [[https://issues.apache.org/jira/browse/CASSANDRA-16|CASSANDRA-16]].
  
+ 
+  *
-  * Please refer to the notes in the Cassandra limitations section for more information:
[[CassandraLimitations|Cassandra Limitations]]
+  Please refer to the notes in the Cassandra limitations section for more information: [[CassandraLimitations|Cassandra
Limitations]]
+ 
  
  <<Anchor(jmx_localhost_refused)>>
  
@@ -289, +341 @@

  <<Anchor(batch_mutate_atomic)>>
  
  == Are batch_mutate operations atomic? ==
- No.  [[API#batch_mutate|batch_mutate]] is a way to group many operations into a single call
in order to save on the cost of network round-trips.  If `batch_mutate` fails in the middle
of its list of mutations, no rollback occurs and the mutations that have already been applied
stay applied. The client should typically retry the mutation.
+ As a special case, mutations against a single key are atomic, but more generally no.  [[API#batch_mutate|batch_mutate]]
allows grouping operations on many keys into a single call in order to save on the cost of
network round-trips.  If `batch_mutate` fails in the middle of its list of mutations, no rollback
occurs and the mutations that have already been applied stay applied. The client should typically
retry the `batch_mutate` operation.
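
The retry advice above can be sketched roughly as follows. This is only an illustration, not part of the Cassandra API: `apply_batch` is a hypothetical callable standing in for whatever your Thrift client exposes for `batch_mutate`, and `retry_batch`, `max_attempts`, and `delay_seconds` are names made up for this example. It assumes the mutations carry fixed client-supplied timestamps, so that reapplying the ones that already succeeded is harmless.

```python
import time


def retry_batch(apply_batch, mutations, max_attempts=3, delay_seconds=0.0):
    """Resend the whole batch until one attempt succeeds.

    apply_batch is a hypothetical stand-in for a client's batch_mutate
    call. Because no rollback occurs on failure, the full batch is sent
    again each time, including mutations that were already applied.
    Returns the number of the attempt that succeeded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            apply_batch(mutations)
            return attempt
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A caller would pass its own function, for example `retry_batch(send_to_cluster, mutations)`, where `send_to_cluster` wraps the actual client call.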
  
