cassandra-commits mailing list archives

From Apache Wiki <>
Subject [Cassandra Wiki] Update of "CassandraLimitations" by JonathanEllis
Date Mon, 31 Aug 2009 20:58:57 GMT
The following page has been changed by JonathanEllis:

  = Limitations =
- From easiest to fix to hardest:
+ == Inherent in the design ==
+ The main limitation on column and supercolumn size is that all data for a single key and
column must fit (on disk) on a single machine in the cluster. Because keys alone are used
to determine the nodes responsible for replicating their data, the amount of data associated
with a single key has this upper bound. This is an inherent limitation of the distribution model.
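The placement argument can be made concrete with a minimal Java sketch of key-only replica placement on a token ring. The sketch makes two assumptions. First, the token is an MD5 hash of the key. Second, each key is replicated to the next N nodes clockwise. Both are loosely modeled on a hashing partitioner. The class and method names (`KeyPlacementSketch`, `replicasFor`) are hypothetical and are not Cassandra's API. The column name never enters the computation, which is why every column of a key lands on the same nodes.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: replica placement depends only on the row key. */
public class KeyPlacementSketch {
    // Node tokens mapped to node names, kept sorted by token (the "ring").
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();
    private final int replicationFactor;

    public KeyPlacementSketch(int replicationFactor) {
        this.replicationFactor = replicationFactor;
    }

    public void addNode(String name, BigInteger token) {
        ring.put(token, name);
    }

    /** Assumed token function: the MD5 digest of the key as a non-negative integer. */
    static BigInteger tokenFor(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Replicas for a key: the first node whose token is >= the key's token,
     * then the following nodes clockwise. Note there is no column parameter.
     */
    public List<String> replicasFor(String key) {
        BigInteger token = tokenFor(key);
        // Walk the ring from the ceiling entry, wrapping around to the start.
        List<String> ordered = new ArrayList<>(ring.tailMap(token, true).values());
        ordered.addAll(ring.headMap(token, false).values());
        int n = Math.min(replicationFactor, ordered.size());
        return new ArrayList<>(ordered.subList(0, n));
    }

    public static void main(String[] args) {
        KeyPlacementSketch cluster = new KeyPlacementSketch(2);
        cluster.addNode("A", BigInteger.ZERO);
        cluster.addNode("B", BigInteger.ONE.shiftLeft(126));
        cluster.addNode("C", BigInteger.ONE.shiftLeft(127));
        // Every column of "user:42" is stored on these same two nodes.
        System.out.println(cluster.replicasFor("user:42"));
    }
}
```

Since the replica set is a function of the key alone, one key's data can never be spread beyond its replicas. This is the upper bound the paragraph describes.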
+ == Artifacts of the current code base ==
   * Cassandra's compaction code currently deserializes an entire row (per columnfamily) at
a time, so all the data for a given columnfamily/key pair must fit in memory.  Fixing this
is relatively easy: columns are stored in order on disk, so there is no inherent need to
deserialize a whole row at a time. Doing so is simply easier with the current encapsulation
of functionality.
   * Cassandra has two levels of indexes: key and column.  But in super columnfamilies there
is a third level of subcolumns; these are not indexed, and any request for a subcolumn deserializes
_all_ the subcolumns in that supercolumn.  So you want to avoid a data model that requires
large numbers of subcolumns.  This can be fixed; the core classes involved are SuperColumn
and SequenceFile.
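The compaction bullet says that sorted on-disk columns make it possible to compact without holding a whole row in memory. A k-way merge shows how. The sketch is hypothetical: `StreamingRowMerge` and `Column` are illustrative names, not Cassandra classes. It assumes that each input yields one row's columns already sorted by name and that duplicate names are resolved by keeping the newest timestamp. Tombstones and serialization are ignored.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of column-at-a-time compaction of a single row. */
public class StreamingRowMerge {
    /** Simplified on-disk column: name, timestamp, value. */
    public record Column(String name, long timestamp, String value) {}

    /** Cursor over one input's columns for the row, sorted by name. */
    private static final class Cursor {
        final Iterator<Column> it;
        Column head;

        Cursor(Iterator<Column> it) {
            this.it = it;
            advance();
        }

        void advance() {
            head = it.hasNext() ? it.next() : null;
        }
    }

    /**
     * Merges name-sorted column streams while holding at most one column per input.
     * When several inputs contain the same name, the newest timestamp wins.
     */
    public static void merge(List<Iterator<Column>> inputs, Consumer<Column> out) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.name().compareTo(b.head.name()));
        for (Iterator<Column> in : inputs) {
            Cursor c = new Cursor(in);
            if (c.head != null) heap.add(c);
        }
        while (!heap.isEmpty()) {
            String name = heap.peek().head.name();
            Column winner = null;
            // Pull every cursor positioned at this column name and keep the newest version.
            while (!heap.isEmpty() && heap.peek().head.name().equals(name)) {
                Cursor c = heap.poll();
                if (winner == null || c.head.timestamp() > winner.timestamp()) {
                    winner = c.head;
                }
                c.advance();
                if (c.head != null) heap.add(c);
            }
            out.accept(winner); // emitted immediately; the row is never materialized
        }
    }

    public static void main(String[] args) {
        List<Column> older = List.of(new Column("a", 1, "x"), new Column("c", 1, "y"));
        List<Column> newer = List.of(new Column("a", 2, "z"), new Column("b", 2, "w"));
        List<Column> merged = new ArrayList<>();
        merge(List.of(older.iterator(), newer.iterator()), merged::add);
        merged.forEach(System.out::println);
    }
}
```

Memory use grows with the number of inputs being compacted, not with the number of columns in the row. This is why the bullet calls the fix relatively easy.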
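The subcolumn bullet notes that subcolumns are unindexed. One possible fix is a sparse block index over a supercolumn's sorted subcolumns, analogous to the existing column-level index. The sketch below is hypothetical and is not the SuperColumn/SequenceFile code:

* the in-memory lists stand in for on-disk data;
* `blockSize` is an assumed tuning parameter;
* `lastScanned()` only approximates how many subcolumns a read would deserialize.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: a sparse block index over one supercolumn's sorted subcolumns. */
public class SubcolumnIndexSketch {
    private final List<String> names;   // subcolumn names, sorted (stand-in for disk)
    private final List<String> values;  // values parallel to names
    private final int blockSize;        // subcolumns per indexed block (assumed)
    private final List<String> blockFirstNames = new ArrayList<>();
    private int scanned;                // subcolumns touched by the last lookup

    public SubcolumnIndexSketch(List<String> sortedNames, List<String> values, int blockSize) {
        this.names = sortedNames;
        this.values = values;
        this.blockSize = blockSize;
        // Index entry = first subcolumn name of each block.
        for (int i = 0; i < sortedNames.size(); i += blockSize) {
            blockFirstNames.add(sortedNames.get(i));
        }
    }

    public int lastScanned() {
        return scanned;
    }

    /** Unindexed read: every subcolumn is deserialized, as the bullet describes. */
    public String getUnindexed(String name) {
        String found = null;
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equals(name)) found = values.get(i);
        }
        scanned = names.size();
        return found;
    }

    /** Indexed read: binary-search the block index, then scan only one block. */
    public String getIndexed(String name) {
        int pos = Collections.binarySearch(blockFirstNames, name);
        // Exact hit -> that block; otherwise the block just before the insertion point.
        int block = pos >= 0 ? pos : -pos - 2;
        scanned = 0;
        if (block < 0) return null; // name sorts before every subcolumn
        int start = block * blockSize;
        int end = Math.min(start + blockSize, names.size());
        for (int i = start; i < end; i++) {
            scanned++;
            if (names.get(i).equals(name)) return values.get(i);
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            names.add(String.format("c%02d", i));
            values.add("v" + i);
        }
        SubcolumnIndexSketch sc = new SubcolumnIndexSketch(names, values, 10);
        sc.getUnindexed("c57");
        System.out.println("unindexed scanned: " + sc.lastScanned());
        sc.getIndexed("c57");
        System.out.println("indexed scanned: " + sc.lastScanned());
    }
}
```

With 100 subcolumns and blocks of 10, an indexed lookup of `c57` touches 8 subcolumns instead of 100. The cost is one small index per supercolumn.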
