cassandra-commits mailing list archives

From jbel...@apache.org
Subject svn commit: r782110 - /incubator/cassandra/branches/cassandra-0.3/BUGS.txt
Date Fri, 05 Jun 2009 19:31:39 GMT
Author: jbellis
Date: Fri Jun  5 19:31:38 2009
New Revision: 782110

URL: http://svn.apache.org/viewvc?rev=782110&view=rev
Log:
add BUGS.txt.  patch by jbellis and Eric Evans for CASSANDRA-216

Added:
    incubator/cassandra/branches/cassandra-0.3/BUGS.txt

Added: incubator/cassandra/branches/cassandra-0.3/BUGS.txt
URL: http://svn.apache.org/viewvc/incubator/cassandra/branches/cassandra-0.3/BUGS.txt?rev=782110&view=auto
==============================================================================
--- incubator/cassandra/branches/cassandra-0.3/BUGS.txt (added)
+++ incubator/cassandra/branches/cassandra-0.3/BUGS.txt Fri Jun  5 19:31:38 2009
@@ -0,0 +1,50 @@
+We consider 0.3 most appropriate for someone who wants to evaluate
+Cassandra without dealing with the highly variable degree of stability
+that a nightly build offers.  Here are the known issues you should
+be most concerned about:
+
+ 1. With enough and large enough keys in a ColumnFamily, Cassandra will
+    run out of memory trying to perform compactions (data file merges).
+    The size of what is stored in memory is (S + 16) * (N + M) where S
+    is the size of the key (usually 2 bytes per character), N is the
+    number of keys, and M is the map overhead (which can be
+    guesstimated at around 32 bytes per key).
+
+    So, if you have 10-character keys and 1GB of headroom in your heap
+    space for compaction, you can expect to store about 17M keys
+    before running into problems.
+
+    See https://issues.apache.org/jira/browse/CASSANDRA-208
+
+ 2. Because fixing #1 requires a data file format change, 0.4 will not
+    be binary-compatible with 0.3 data files.  A client-side upgrade
+    can be done relatively easily with the following algorithm:
+
+      for key in old_client.get_key_range(everything):
+          columns = old_client.get_slice(key, all columns)   # or get_slice_super
+          new_client.batch_insert(key, columns)              # or batch_insert_super
+
+    The inner loop can be trivially parallelized for speed.
+
+ 3. Commitlog does not fsync before reporting a write successful.
+    Using blocking writes mitigates this to some degree, since all
+    nodes that were part of the write quorum would have to fail
+    before sync for data to be lost.
+
+    See https://issues.apache.org/jira/browse/CASSANDRA-182
+
+
+Additionally, row size (that is, all the data associated with a single
+key in a given ColumnFamily) is limited by available memory, for
+two reasons:
+
+ 1. get_slice offsets are not indexed.  Every time you do a get_slice,
+    Cassandra has to deserialize the entire ColumnFamily row into
+    memory.  (This is already fixed in trunk.)
+
+    See https://issues.apache.org/jira/browse/CASSANDRA-172
+
+ 2. Compaction deserializes each row before merging.
+
+    See https://issues.apache.org/jira/browse/CASSANDRA-16
+    
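[Editor's note: the sketches below are illustrations added alongside the archived commit; they are not part of the committed BUGS.txt.]

The capacity estimate in item 1 can be sanity-checked with a few lines of Python. The byte counts are the guesses stated in the text (2 bytes per key character, 16 bytes fixed overhead, ~32 bytes map overhead per key), not measured values, and the result lands in the same ballpark as the text's "about 17M keys":

```python
# Rough capacity check for item 1's compaction memory formula.
# Assumed per-key cost (from the text): S key bytes + 16 fixed + ~32 map overhead.
def max_keys(key_chars, headroom_bytes, bytes_per_char=2, fixed=16, map_overhead=32):
    per_key = key_chars * bytes_per_char + fixed + map_overhead
    return headroom_bytes // per_key

# 10-character keys with 1 GB of compaction headroom in the heap:
print(max_keys(10, 10**9))
```

Shrinking the map overhead or the key length moves the ceiling up proportionally, since the cost is strictly per-key.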
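The upgrade loop in item 2 can be sketched more concretely. The client class and method names below are stand-ins that mirror the pseudocode, not the actual 0.3/0.4 Thrift API:

```python
# Sketch of the item-2 client-side upgrade: read every key from the old
# cluster, re-insert into the new one.  FakeClient is a hypothetical
# stand-in for a real Thrift connection.
class FakeClient:
    def __init__(self, data=None):
        self.data = dict(data or {})

    def get_key_range(self):           # "everything"
        return list(self.data.keys())

    def get_slice(self, key):          # all columns for one key
        return self.data[key]

    def batch_insert(self, key, columns):
        self.data[key] = columns

old_client = FakeClient({"k1": {"col": b"v1"}, "k2": {"col": b"v2"}})
new_client = FakeClient()

for key in old_client.get_key_range():
    columns = old_client.get_slice(key)     # or get_slice_super for supercolumns
    new_client.batch_insert(key, columns)   # or batch_insert_super
```

As the text notes, the per-key read/insert in the loop body is independent work, so it can be handed to a thread pool for speed.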
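For item 3, the exposure window depends on how many replicas must all fail before any of them syncs. Assuming the usual majority definition of a write quorum (floor(RF/2) + 1):

```python
# Item 3: with blocking quorum writes, data is lost only if every node
# in the write quorum fails before the commitlog is synced.
# Quorum assumed to be a majority of the replication factor.
def quorum_size(replication_factor):
    return replication_factor // 2 + 1

# With replication factor 3, two simultaneous pre-sync failures are needed:
print(quorum_size(3))
```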