cassandra-commits mailing list archives

From: Apache Wiki <wikidi...@apache.org>
Subject: [Cassandra Wiki] Update of "CassandraLimitations" by JonathanEllis
Date: Tue, 28 Jul 2009 19:56:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The following page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/CassandraLimitations

------------------------------------------------------------------------------
  From easiest to fix to hardest:
  
   * Cassandra's compaction code currently deserializes an entire row (per columnfamily) at a time, so all the data for a given columnfamily/key pair must fit in memory.  Fixing this is relatively easy, since columns are stored in order on disk: there is no inherent reason to deserialize a row at a time, other than that being easier with the current encapsulation of functionality.  (A sketch of the incremental alternative follows this item.)
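
A minimal sketch of that incremental alternative, assuming hypothetical Column and stream types (these are not Cassandra's actual classes): because each input file stores a row's columns in name order, compaction could heap-merge the inputs and emit one column at a time, bounding memory by the number of inputs rather than by the row size.  Real compaction would also reconcile same-named columns by timestamp, which is omitted here.

{{{
import java.util.*;

// Hypothetical type: a column read incrementally from one on-disk file.
class Column {
    final String name; final byte[] value; final long timestamp;
    Column(String name, byte[] value, long timestamp) {
        this.name = name; this.value = value; this.timestamp = timestamp;
    }
}

class CompactionSketch {
    // Merge N name-ordered column streams into one, a column at a time.
    static Iterator<Column> merge(List<Iterator<Column>> streams) {
        PriorityQueue<PeekingIterator> heap = new PriorityQueue<>(
            Comparator.comparing((PeekingIterator p) -> p.peek().name));
        for (Iterator<Column> s : streams)
            if (s.hasNext()) heap.add(new PeekingIterator(s));
        return new Iterator<Column>() {
            public boolean hasNext() { return !heap.isEmpty(); }
            public Column next() {
                PeekingIterator top = heap.poll();
                Column c = top.next();
                if (top.hasNext()) heap.add(top); // re-queue for its next column
                return c; // caller appends this column to the compacted output
            }
        };
    }

    // Small wrapper so the heap can order streams by their next column name.
    static class PeekingIterator implements Iterator<Column> {
        private final Iterator<Column> it;
        private Column head;
        PeekingIterator(Iterator<Column> it) { this.it = it; head = it.next(); }
        Column peek() { return head; }
        public boolean hasNext() { return head != null; }
        public Column next() {
            Column c = head;
            head = it.hasNext() ? it.next() : null;
            return c;
        }
    }
}
}}}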
-  * Cassandra does not currently fsync the commitlog before acking a write.  Most of the time this is Good Enough when you are writing to multiple replicas, since the odds of all replicas dying before the data actually hits the disk are slim, but the truly paranoid will want real fsync-before-ack.  Adding fsync would be just a few lines (in CommitLog, naturally), but we want to do it without killing performance, so what we want is an Executor that fsyncs after writing batches of commitlog entries and then asynchronously notifies the write threads.  (A sketch of this group-commit idea follows.)
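
A rough sketch of that batched-fsync idea, with hypothetical names (this is not the actual CommitLog code): write threads append an entry and receive a future to await before acking; a single scheduled task drains the pending futures, issues one fsync that covers all of their writes, and then completes them.

{{{
import java.io.*;
import java.util.*;
import java.util.concurrent.*;

// Hypothetical group-commit log: one fsync acknowledges a whole batch of writes.
class GroupCommitLog {
    private final FileOutputStream log;
    private final ConcurrentLinkedQueue<CompletableFuture<Void>> pending =
        new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService syncer =
        Executors.newSingleThreadScheduledExecutor();

    GroupCommitLog(File f, long syncIntervalMs) throws IOException {
        log = new FileOutputStream(f, true);
        syncer.scheduleWithFixedDelay(this::syncBatch,
            syncIntervalMs, syncIntervalMs, TimeUnit.MILLISECONDS);
    }

    // Write threads call this, then await the future before acking the client.
    synchronized CompletableFuture<Void> append(byte[] entry) throws IOException {
        log.write(entry);                  // buffered by the OS, not yet durable
        CompletableFuture<Void> ack = new CompletableFuture<>();
        pending.add(ack);
        return ack;
    }

    private void syncBatch() {
        // Snapshot first: every future drained here had its write issued before
        // the fsync below, so completing it after the fsync is safe.
        List<CompletableFuture<Void>> batch = new ArrayList<>();
        for (CompletableFuture<Void> f; (f = pending.poll()) != null; )
            batch.add(f);
        if (batch.isEmpty()) return;
        try {
            log.getFD().sync();            // one fsync covers the whole batch
            batch.forEach(f -> f.complete(null));
        } catch (IOException e) {
            batch.forEach(f -> f.completeExceptionally(e));
        }
    }
}
}}}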
   * Cassandra has two levels of indexes: key and column.  But in super columnfamilies there is a third level, subcolumns; these are not indexed, and any request for a subcolumn deserializes _all_ the subcolumns in that supercolumn.  So you want to avoid a data model that requires large numbers of subcolumns.  This can be fixed; the core classes involved are SuperColumn and SequenceFile.  (An illustration of the scan cost follows.)
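
To make the scan cost concrete, here is an illustrative decoder over a hypothetical serialization layout (a count, then name/length/value per subcolumn); it is not SuperColumn's actual format.  Without a subcolumn index, a read decodes entries in order until it finds the requested name, so work grows with the size of the supercolumn; an index would allow seeking straight to the wanted subcolumn's offset.

{{{
import java.io.*;

class SubColumnScan {
    // Linear scan: every preceding subcolumn is decoded even when it is not
    // the one requested.
    static byte[] getSubColumn(DataInputStream in, String wanted) throws IOException {
        int count = in.readInt();              // number of subcolumns
        for (int i = 0; i < count; i++) {
            String name = in.readUTF();
            int len = in.readInt();
            byte[] value = new byte[len];
            in.readFully(value);
            if (name.equals(wanted)) return value;
        }
        return null;                           // not present in this supercolumn
    }
}
}}}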
-  * Cassandra's public API is based on Thrift, which offers no streaming abilities -- any value written or fetched has to fit in memory.  This is inherent to Thrift's design; I don't see it changing.  So (as with traditional RDBMSes) you're better off storing large blobs in the filesystem, with a machine:path pointer in Cassandra, than storing the blobs in Cassandra directly.
+  * Cassandra's public API is based on Thrift, which offers no streaming abilities -- any value written or fetched has to fit in memory.  This is inherent to Thrift's design; I don't see it changing.  So adding large object support to Cassandra would need a special API that manually splits large objects into pieces.  Jonathan Ellis sketched out one approach in https://issues.apache.org/jira/browse/CASSANDRA-265.  (A rough client-side chunking sketch follows.)
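
As a rough illustration of what such an API implies on the client side (hypothetical names and layout, not the design from CASSANDRA-265): split the value into fixed-size pieces and store each piece as its own column under the object's key, so no single Thrift message ever has to hold the whole blob.

{{{
import java.io.*;
import java.util.*;

class BlobChunker {
    static final int CHUNK_SIZE = 1 << 20;     // 1 MB per column value

    // Split a large value into columns chunk-00000, chunk-00001, ... to be
    // inserted individually; a reader reassembles them in column-name order.
    static List<Map.Entry<String, byte[]>> split(InputStream blob) throws IOException {
        List<Map.Entry<String, byte[]>> chunks = new ArrayList<>();
        byte[] buf = new byte[CHUNK_SIZE];
        int n, i = 0;
        while ((n = blob.readNBytes(buf, 0, CHUNK_SIZE)) > 0)
            chunks.add(Map.entry(String.format("chunk-%05d", i++), Arrays.copyOf(buf, n)));
        return chunks;
    }
}
}}}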
  
+ == Obsolete Limitations ==
+  * Prior to version 0.4, Cassandra did not fsync the commitlog before acking a write.  Most of the time this was Good Enough when writing to multiple replicas, since the odds of all replicas dying before the data actually hit the disk were slim, but the truly paranoid will want real fsync-before-ack.  This is now an option.
+ 
