Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
From: Apache Wiki
To: Apache Wiki
Date: Mon, 14 Jun 2010 13:22:58 -0000
Message-ID: <20100614132258.16921.10570@eos.apache.org>
Subject: [Cassandra Wiki] Update of "FAQ" by JonathanEllis

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by JonathanEllis.
The comment on this change is: mutations against a single key are atomic.
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=73&rev2=74

--------------------------------------------------
  = Frequently asked questions =
+  *
-  * [[#cant_listen_on_ip_any|Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)?]]
+ [[#cant_listen_on_ip_any|Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)?]]
+
+  *
-  * [[#ports|What ports does Cassandra use?]]
+ [[#ports|What ports does Cassandra use?]]
+
+  *
-  * [[#slows_down_after_lotso_inserts|Why does Cassandra slow down after doing a lot of inserts?]]
+ [[#slows_down_after_lotso_inserts|Why does Cassandra slow down after doing a lot of inserts?]]
+
+  *
-  * [[#existing_data_when_adding_new_nodes|What happens to existing data in my cluster when I add new nodes?]]
+ [[#existing_data_when_adding_new_nodes|What happens to existing data in my cluster when I add new nodes?]]
+
+  *
-  * [[#modify_cf_config|Can I add/remove/rename Column Families on a working cluster?]]
+ [[#modify_cf_config|Can I add/remove/rename Column Families on a working cluster?]]
+
+  *
-  * [[#node_clients_connect_to|Does it matter which node a Thrift client connects to?]]
+ [[#node_clients_connect_to|Does it matter which node a Thrift client connects to?]]
+
+  *
-  * [[#what_kind_of_hardware_should_i_use|What kind of hardware should I run Cassandra on?]]
+ [[#what_kind_of_hardware_should_i_use|What kind of hardware should I run Cassandra on?]]
+
+  *
-  * [[#architecture|What are SSTables and Memtables?]]
+ [[#architecture|What are SSTables and Memtables?]]
+
+  *
-  * [[#working_with_timeuuid_in_java|Why is it so hard to work with TimeUUIDType in Java?]]
+ [[#working_with_timeuuid_in_java|Why is it so hard to work with TimeUUIDType in Java?]]
+
+  *
-  * [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays the same. What gives?]]
+ [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays the same. What gives?]]
+
+  *
-  * [[#reads_slower_writes|Why are reads slower than writes?]]
+ [[#reads_slower_writes|Why are reads slower than writes?]]
+
+  *
-  * [[#cloned|Why does nodeprobe ring only show one entry, even though my nodes logged that they see each other joining the ring?]]
+ [[#cloned|Why does nodeprobe ring only show one entry, even though my nodes logged that they see each other joining the ring?]]
+
+  *
-  * [[#range_ghosts|Why do deleted keys show up during range scans?]]
+ [[#range_ghosts|Why do deleted keys show up during range scans?]]
+
+  *
-  * [[#change_replication|Can I change the ReplicationFactor on a live cluster?]]
+ [[#change_replication|Can I change the ReplicationFactor on a live cluster?]]
+
+  *
-  * [[#large_file_and_blob_storage|Can I store large files or BLOBs in Cassandra?]]
+ [[#large_file_and_blob_storage|Can I store large files or BLOBs in Cassandra?]]
+
+  *
-  * [[#jmx_localhost_refused|Nodetool says "Connection refused to host: 127.0.1.1", for any remote host. What gives?]]
+ [[#jmx_localhost_refused|Nodetool says "Connection refused to host: 127.0.1.1", for any remote host. What gives?]]
+
+  *
-  * [[#iter_world|How can I iterate over all the rows in a ColumnFamily?]]
+ [[#iter_world|How can I iterate over all the rows in a ColumnFamily?]]
+
+  *
-  * [[#no_keyspaces|Why were none of the keyspaces described in storage-conf.xml loaded?]]
+ [[#no_keyspaces|Why were none of the keyspaces described in storage-conf.xml loaded?]]
+
+  *
-  * [[#gui|Is there a GUI admin tool for Cassandra?]]
+ [[#gui|Is there a GUI admin tool for Cassandra?]]
+
+  *
-  * [[#a_long_is_exactly_8_bytes|Insert operation throws InvalidRequestException with message "A long is exactly 8 bytes"]]
+ [[#a_long_is_exactly_8_bytes|Insert operation throws InvalidRequestException with message "A long is exactly 8 bytes"]]
+
+  *
-  * [[#clustername_mismatch|Cassandra says "ClusterName mismatch: oldClusterName != newClusterName" and refuses to start]]
+ [[#clustername_mismatch|Cassandra says "ClusterName mismatch: oldClusterName != newClusterName" and refuses to start]]
+
+  *
-  * [[#batch_mutate_atomic|Are batch_mutate operations atomic?]]
+ [[#batch_mutate_atomic|Are batch_mutate operations atomic?]]
+

  <>

@@ -80, +124 @@

   1. You can maintain a list of contact nodes (all or a subset of the nodes in the cluster), and configure your clients to choose among them.
   1. Use round-robin DNS and create a record that points to a set of contact nodes (recommended).
+  1.
-  1. Use the `get_string_property("token map")` RPC to obtain an update-to-date list of the nodes in the cluster and cycle through them.
+ Use the `get_string_property("token map")` RPC to obtain an update-to-date list of the nodes in the cluster and cycle through them.
+
   1. Deploy a load-balancer, proxy, etc.

  <>
@@ -203, +249 @@
  == Can I change the ReplicationFactor on a live cluster? ==
  Yes, but it will require restarting and running repair manually to change the replica count of existing data.

+  *
-  * Alter the ReplicationFactor for the desired keyspace(s) in the storage configuration on each node in the cluster.
+ Alter the ReplicationFactor for the desired keyspace(s) in the storage configuration on each node in the cluster.
+
   * Restart cassandra on each node in the cluster

  If you're reducing the ReplicationFactor:
@@ -221, +269 @@

   * The main limitation on a column and super column size is that all the data for a single key and column must fit (on disk) on a single machine(node) in the cluster. Because keys alone are used to determine the nodes responsible for replicating their data, the amount of data associated with a single key has this upper bound. This is an inherent limitation of the distribution model.

+  *
-  * When large columns are created and retrieved, that columns data is loaded into RAM which can get resource intensive quickly. Consider, loading 200 rows with columns that store 10Mb image files each into RAM. That small result set would consume about 2Gb of RAM. Clearly as more and more large columns are loaded, RAM would start to get consumed quickly. This can be worked around, but will take some upfront planning and testing to get a workable solution for most applications. You can find more information regarding this behavior here: [[MemtableThresholds|memtables]], and a possible solution in 0.7 here: [[https://issues.apache.org/jira/browse/CASSANDRA-16|CASSANDRA-16]].
+ When large columns are created and retrieved, that columns data is loaded into RAM which can get resource intensive quickly. Consider, loading 200 rows with columns that store 10Mb image files each into RAM. That small result set would consume about 2Gb of RAM. Clearly as more and more large columns are loaded, RAM would start to get consumed quickly. This can be worked around, but will take some upfront planning and testing to get a workable solution for most applications. You can find more information regarding this behavior here: [[MemtableThresholds|memtables]], and a possible solution in 0.7 here: [[https://issues.apache.org/jira/browse/CASSANDRA-16|CASSANDRA-16]].
+
+
+  *
-  * Please refer to the notes in the Cassandra limitations section for more information: [[CassandraLimitations|Cassandra Limitations]]
+ Please refer to the notes in the Cassandra limitations section for more information: [[CassandraLimitations|Cassandra Limitations]]
+

  <>

@@ -289, +341 @@
  <>

  == Are batch_mutate operations atomic? ==
- No. [[API#batch_mutate|batch_mutate]] is a way to group many operations into a single call in order to save on the cost of network round-trips. If `batch_mutate` fails in the middle of its list of mutations, no rollback occurs and the mutations that have already been applied stay applied. The client should typically retry the mutation.
+ As a special case, mutations against a single key are atomic, but more generally no. [[API#batch_mutate|batch_mutate]] allows grouping operations on many keys into a single call in order to save on the cost of network round-trips. If `batch_mutate` fails in the middle of its list of mutations, no rollback occurs and the mutations that have already been applied stay applied. The client should typically retry the `batch_mutate` operation.
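
Editor's note on the batch_mutate answer above: since a failed `batch_mutate` is not rolled back and the advice is to retry the whole operation, a client-side retry loop might look roughly like the sketch below. This is only an illustration under stated assumptions. `retry_batch`, `send_batch`, `attempts`, and `delay_seconds` are hypothetical names that do not come from the wiki page or from any Cassandra client library; `send_batch` stands in for whatever function in your client actually issues the `batch_mutate` call.

```python
import time


def retry_batch(send_batch, mutations, attempts=3, delay_seconds=0.0):
    """Re-send the whole batch until it succeeds or attempts run out.

    send_batch is a hypothetical callable wrapping the client's
    batch_mutate call; it is assumed to raise an exception on failure.
    Returns the number of the attempt that succeeded (1-based).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # No rollback happens on failure, so the full batch is re-sent.
            send_batch(mutations)
            return attempt
        except Exception as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A real client would probably catch only the timeout and unavailable errors its library raises rather than every `Exception`, and would likely add a backoff between attempts.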