cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "DataModel" by JonathanEllis
Date Mon, 11 May 2009 21:44:26 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The following page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/DataModel

------------------------------------------------------------------------------
  
  The basic unit of access control within Cassandra is a Column Family. A table in Cassandra is
made up of one or more column families. A row in a table is uniquely identified by its
key. The key is a string and can be of any size. The number of column families and the
name of each column family must currently be fixed at the time the cluster is started. There
is no limit on the number of column families, but it is expected that there will be relatively
few of them. A column family can be one of two types: Simple or Super. Columns within both
types are created dynamically, and there is no limit on their number. Columns are
constructs that are uniquely identified by a name, a value and a user-defined timestamp.
The number of columns contained in a column family can be very large, and it can
also vary per key. For instance, key K1 could have 1024 columns/supercolumns while key K2 could
have 64 columns/supercolumns. Supercolumns are constructs that have a name and an unbounded
number of columns associated with them. The number of supercolumns
associated with any column family may be very large. They exhibit the same characteristics
as columns. The columns can be sorted by name or by time, and this can be set explicitly
in the configuration file for any given column family.
  
- The main limitation on column and supercolumn size is that all data for a single key must
fit on a single machine in the cluster.  Because keys alone are used to determine the nodes
responsible for replicating their data, the amount of data associated with a single key has
this upper bound.
+ The main limitation on column and supercolumn size is that all data for a single key must
fit (on disk) on a single machine in the cluster.  Because keys alone are used to determine
the nodes responsible for replicating their data, the amount of data associated with a single
key has this upper bound.  This is an inherent limitation of the distribution model.
+ 
+ Currently Cassandra also has the limitation that in the worst case, data for a key / ColumnFamily
pair will all be deserialized into memory for a read request.  (But never for writes!)  This
will be fixed in a future release.
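
To make the structure described above concrete, here is a minimal Python sketch of
the model: string row keys, Simple and Super column families, and columns that carry a
name, a value and a user-defined timestamp. It is an illustration only, not Cassandra's
implementation. The class names, the insert method and the node_for_key helper are all
hypothetical, and the MD5-modulo placement is an assumed stand-in that only shows that
placement depends on the key alone.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Column:
    name: str
    value: bytes
    timestamp: int  # user-defined, as described in the text


@dataclass
class SuperColumn:
    name: str
    columns: Dict[str, Column] = field(default_factory=dict)


@dataclass
class ColumnFamily:
    """Hypothetical in-memory model of one column family."""
    name: str
    is_super: bool = False
    # row key -> entry name -> Column (Simple) or SuperColumn (Super)
    rows: Dict[str, Dict[str, object]] = field(default_factory=dict)

    def insert(self, key: str, column: Column,
               super_name: Optional[str] = None) -> None:
        row = self.rows.setdefault(key, {})
        if self.is_super:
            if super_name is None:
                raise ValueError("Super column family needs a supercolumn name")
            sc = row.setdefault(super_name, SuperColumn(super_name))
            sc.columns[column.name] = column
        else:
            if super_name is not None:
                raise ValueError("Simple column family has no supercolumns")
            row[column.name] = column


def node_for_key(key: str, nodes: List[str]) -> str:
    """Assumed placement rule: only the key decides the node, so all
    data stored under one key ends up on the same machine."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]


if __name__ == "__main__":
    users = ColumnFamily("Users")
    users.insert("K1", Column("email", b"k1@example.com", 1))
    users.insert("K1", Column("name", b"Kay", 2))
    users.insert("K2", Column("email", b"k2@example.com", 1))
    # The number of columns varies per key.
    print({k: len(v) for k, v in users.rows.items()})
    print(node_for_key("K1", ["node-a", "node-b", "node-c"]))
```

Because node_for_key looks only at the key, adding more columns under K1 never spreads
K1 across machines, which is why the data for a single key is bounded by what one
machine can hold.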
  
  = More Detail =
  
