cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "CassandraLimitations" by JonathanEllis
Date Thu, 15 Jul 2010 04:03:10 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "CassandraLimitations" page has been changed by JonathanEllis.
The comment on this change is: update compaction and thrift-oom for 0.7.
http://wiki.apache.org/cassandra/CassandraLimitations?action=diff&rev1=11&rev2=12

--------------------------------------------------

  = Limitations =
  == Inherent in the design ==
- The main limitation on column and supercolumn size is that all data for a single key and
column must fit (on disk) on a single machine in the cluster. Because keys alone are used
to determine the nodes responsible for replicating their data, the amount of data associated
with a single key has this upper bound. This is an inherent limitation of the distribution
model.
+ 
+ == Stuff that isn't likely to change ==
+  * All data for a single row must fit (on disk) on a single machine in the cluster. Because
row keys alone are used to determine the nodes responsible for replicating their data, the
amount of data associated with a single key has this upper bound.  (A toy sketch of this
key-based placement follows the list.)
+  * A single column value may not be larger than 2GB.
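To illustrate why this bound exists, here is a toy placement sketch (class and method names are
made up for this example; this is not Cassandra's actual partitioner or replication code): the
row key alone is hashed to a token, and the token alone picks the replica set, so everything
stored under a key lands on the same few nodes no matter how large the row grows.

{{{
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy sketch: replica placement depends only on the row key, so all data stored
// under that key lives on (and must fit on) the same small set of nodes.
public class PlacementSketch {
    private final SortedMap<BigInteger, String> ring = new TreeMap<>();  // token -> node
    private final int replicationFactor;

    public PlacementSketch(int replicationFactor) { this.replicationFactor = replicationFactor; }

    public void addNode(String name, BigInteger token) { ring.put(token, name); }

    // Hash the row key to a token (MD5, in the spirit of RandomPartitioner).
    static BigInteger token(String rowKey) throws NoSuchAlgorithmException {
        byte[] d = MessageDigest.getInstance("MD5").digest(rowKey.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, d);
    }

    // Walk clockwise from the key's token and take the next RF distinct nodes.
    public List<String> replicasFor(String rowKey) throws NoSuchAlgorithmException {
        List<String> replicas = new ArrayList<>();
        for (String node : ring.tailMap(token(rowKey)).values()) {
            if (replicas.size() == replicationFactor) break;
            replicas.add(node);
        }
        for (String node : ring.values()) {            // wrap around the ring if needed
            if (replicas.size() == replicationFactor) break;
            if (!replicas.contains(node)) replicas.add(node);
        }
        return replicas;
    }
}
}}}

Because the token depends only on the key, adding more data to a row never changes which machines
hold it -- hence the per-row size bound above.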
  
  == Artifacts of the current code base ==
-  * The byte[] size of a value can't be more than 2^31-1.
-  * Cassandra's compaction code currently deserializes an entire row (per columnfamily) at
a time.  So all the data from a given columnfamily/key pair must fit in memory.  Fixing this
is relatively easy since columns are stored in-order on disk so there is really no reason
you have to deserialize row-at-a-time except that that is easier with the current encapsulation
of functionality.  This will be fixed in https://issues.apache.org/jira/browse/CASSANDRA-16
-    * A related limitation is that an entire row cannot be larger than 2^31-1 bytes, since
the length of rows is serialized to disk using an integer.
   * Cassandra has two levels of indexes: key and column.  But in super columnfamilies there
is a third level of subcolumns; these are not indexed, and any request for a subcolumn deserializes
_all_ the subcolumns in that supercolumn.  So you want to avoid a data model that requires
large numbers of subcolumns.  https://issues.apache.org/jira/browse/CASSANDRA-598 is open
to remove this limitation.  (A conceptual sketch of this read cost follows the list.)
   * <<Anchor(streaming)>>Cassandra's public API is based on Thrift, which offers
no streaming abilities -- any value written or fetched has to fit in memory.  This is inherent
to Thrift's design and is therefore unlikely to change.  So adding large object support to
Cassandra would need a special API that manually splits the large objects up into pieces.  A
potential approach is described in http://issues.apache.org/jira/browse/CASSANDRA-265.  As
a workaround in the meantime, you can manually split files into chunks of whatever size you
are comfortable with -- at least one person is using 64MB -- and make a file correspond
to a row, with the chunks as column values.  (A chunking sketch follows the list.)
-  * Thrift will crash Cassandra if you send random or malicious data to it.  This makes exposing
the Cassandra port directly to the outside internet a Bad Idea.  See http://issues.apache.org/jira/browse/CASSANDRA-475
and http://issues.apache.org/jira/browse/THRIFT-601 for details.
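Regarding the sub-column indexing item above, here is a purely conceptual sketch (this is not
Cassandra's on-disk format; the blob layout is invented for illustration) of why a request for
one subcolumn pays for all of them when there is no third-level index:

{{{
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Conceptual sketch only: without a per-subcolumn index, fetching one subcolumn
// still means walking the whole serialized supercolumn that was read off disk.
public class SuperColumnReadSketch {
    // Assumed blob layout for this sketch: [count][nameUTF, valueLen, valueBytes]...
    static byte[] getSubcolumn(byte[] superColumnBlob, String wanted) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(superColumnBlob));
        int count = in.readInt();
        byte[] result = null;
        for (int i = 0; i < count; i++) {             // no index: linear scan over every subcolumn
            String name = in.readUTF();
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            if (name.equals(wanted)) result = value;  // the rest of the blob is already in memory anyway
        }
        return result;
    }
}
}}}

With a real per-subcolumn index a read could seek straight to the wanted entry; without one, the
cost of any subcolumn read grows with the total number of subcolumns in that supercolumn.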
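And for the Thrift/streaming item, a minimal sketch of the chunking workaround (the ChunkSink
callback and the "chunk-NNNNN" naming scheme are illustrative, not a Cassandra convention; the
actual Thrift insert per chunk is left to the caller):

{{{
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Sketch of the workaround described above: one file per row, fixed-size chunks as
// separate column values, so no single Thrift request carries the whole file.
public class FileChunker {
    static final int CHUNK_SIZE = 64 * 1024 * 1024;   // 64MB, as mentioned above

    // Illustrative callback; a real caller would do one Thrift insert per chunk,
    // using the file name as the row key and the chunk name as the column name.
    interface ChunkSink { void store(String columnName, byte[] value) throws IOException; }

    static void chunk(File file, ChunkSink sink) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            byte[] buf = new byte[CHUNK_SIZE];
            int n, i = 0;
            // InputStream.readNBytes (Java 9+) fills the buffer until EOF, so every
            // chunk except the last is exactly CHUNK_SIZE bytes.
            while ((n = in.readNBytes(buf, 0, CHUNK_SIZE)) > 0) {
                sink.store(String.format("chunk-%05d", i++), Arrays.copyOf(buf, n));
            }
        }
    }
}
}}}

Zero-padded chunk names keep the columns in order under a bytewise comparator, and reading the
file back is just a column slice over that row, one chunk-sized value at a time.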
  
  == Obsolete Limitations ==
+  * Prior to version 0.7, Cassandra's compaction code deserialized an entire row (per columnfamily)
at a time.  So all the data from a given columnfamily/key pair had to fit in memory, or 2GB,
whichever was smaller (since the length of the row was serialized as a Java int).
+  * Prior to version 0.7, Thrift would crash Cassandra if you sent random or malicious data
to it.  This made exposing the Cassandra port directly to the outside internet a Bad Idea.
   * Prior to version 0.4, Cassandra did not fsync the commitlog before acking a write.  Most
of the time this is Good Enough when you are writing to multiple replicas, since the odds are
slim of all replicas dying before the data actually hits the disk, but the truly paranoid
will want real fsync-before-ack.  This is now an option (see the configuration example below).
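For reference, the fsync-before-ack choice is controlled by the commitlog sync settings.  The
snippet below shows the 0.7-era cassandra.yaml names (earlier releases expose the same choice
through CommitLogSync in storage-conf.xml); treat the exact values as an example and check the
defaults shipped with your release:

{{{
# Durable mode: group writes for a short window, fsync the commitlog, then ack.
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 50

# Default mode: ack immediately and fsync the log in the background every period.
# commitlog_sync: periodic
# commitlog_sync_period_in_ms: 10000
}}}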
  
