cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "DataModelv2" by StaffanEricsson
Date Tue, 09 Mar 2010 15:34:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "DataModelv2" page has been changed by StaffanEricsson.
http://wiki.apache.org/cassandra/DataModelv2?action=diff&rev1=9&rev2=10

--------------------------------------------------

- ## page was copied from DataModel
  = Introduction =
  
  Cassandra has a data model that can most easily be thought of as a four or five dimensional
hash.
  
  The basic concepts are:
-  * Cluster: the machines (nodes) in a logical Cassandra instance.  Clusters can contain
multiple keyspaces.
+  * Cluster: a number of nodes (servers) in a logical Cassandra instance.  Clusters can contain
multiple keyspaces.
   * Keyspace: a namespace for !ColumnFamilies, typically one per application.
   * !ColumnFamilies contain multiple columns, each of which has a name, value, and a timestamp,
and which are referenced by row keys.
   * !SuperColumns can be thought of as columns that themselves have subcolumns.
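
One way to picture the "four or five dimensional hash" is as nested dictionaries. The following is a minimal Python sketch, assuming plain in-memory dicts stand in for Cassandra's storage. The keyspace name `Keyspace1` and the helper `lookup` are hypothetical; the row key, column family names, and column names are borrowed from the examples on this page.

```python
# Nested dicts as a stand-in for: keyspace -> row key -> column family -> column.
# A super column family adds one more level (the SuperColumn) before the column.
data = {
    "Keyspace1": {                          # keyspace (hypothetical name)
        "mccv": {                           # row key
            "Users": {                      # standard column family: four levels
                "emailAddress": "foo@bar.com",
            },
            "Tags": {                       # super column family: five levels
                "cassandra": {              # SuperColumn
                    "jira": "http://issues.apache.org/jira/browse/CASSANDRA",
                },
            },
        },
    },
}


def lookup(store, *path):
    """Walk the nested dicts along `path` and return whatever is found there."""
    node = store
    for part in path:
        node = node[part]
    return node


four_d = lookup(data, "Keyspace1", "mccv", "Users", "emailAddress")
five_d = lookup(data, "Keyspace1", "mccv", "Tags", "cassandra", "jira")
```

The only difference between the two lookups is the extra SuperColumn step, which is where the "four or five" in the description comes from.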
@@ -34, +33 @@

  }
  }}}
  
- All values are supplied by the client, including the 'timestamp'.  This means that clocks
on the clients should be synchronized (synchronizing the Cassandra servers' clocks is useful too),
as these timestamps are used for conflict resolution.  In many cases the 'timestamp' is not
used in client applications, and it becomes convenient to think of a column as a name/value
pair. For the remainder of this document, 'timestamps' will be elided for readability.  It
is also worth noting the name and value are binary values, although in many applications they
are UTF8 serialized strings.
+ All values are supplied by the client, including the 'timestamp'.  This means that clocks
on the clients should be synchronized (synchronizing the Cassandra servers' clocks is useful too),
as these timestamps are used for conflict resolution.  In many cases the 'timestamp' is not
used in client applications, and it becomes convenient to think of a column as a name/value
pair. For the remainder of this document, 'timestamps' will be mostly elided for readability.
 It is also worth noting the name and value are binary values, although in many applications
they are UTF8 serialized strings.
  
- Timestamps can be anything you like, but milliseconds since 1970 is a convention, as returned
by System.currentTimeMillis() in Java. Whatever you use, it must be consistent across the application
otherwise earlier changes may overwrite newer ones.
+ Timestamps can be any number you like, but milliseconds since 1970 is a convention, as returned
by System.currentTimeMillis() in Java. Whatever you use, it must be consistent across the application
otherwise earlier changes may overwrite newer ones.
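
To illustrate why inconsistent timestamps let earlier changes overwrite newer ones, here is a minimal Python sketch of timestamp-based conflict resolution. It is not Cassandra's implementation: `now_millis` and `resolve` are hypothetical helpers, and keeping the existing version on a tie is an assumption made only for this sketch.

```python
import time


def now_millis():
    """Milliseconds since 1970, the conventional timestamp unit."""
    return int(time.time() * 1000)


def resolve(current, incoming):
    """Keep whichever column version carries the higher client-supplied timestamp.

    Ties keep the current version (an assumption of this sketch).
    """
    if current is None or incoming["timestamp"] > current["timestamp"]:
        return incoming
    return current


older = {"name": "emailAddress", "value": "old@bar.com", "timestamp": 1000}
newer = {"name": "emailAddress", "value": "foo@bar.com", "timestamp": 2000}

# Arrival order does not matter: the higher timestamp wins either way.
first_then_second = resolve(older, newer)
second_then_first = resolve(newer, older)

# A client whose clock runs behind issues a later write with a lower timestamp,
# and that write is discarded in favour of the stored version.
stale = {"name": "emailAddress", "value": "lost@bar.com", "timestamp": 1500}
winner = resolve(newer, stale)
```

Because the comparison looks only at the numbers the clients supply, a client with a lagging clock, or one using a different unit such as seconds, can lose writes without any error being reported.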
  
  = Column Families =
  
- A column family is a container for columns, analogous to the table in a relational system.
 You define column families in your storage-conf.xml file, and cannot modify them (or add
new column families) without restarting your Cassandra process.  A column family holds an
ordered list of columns, which you can reference by the column name.
+ A column family is a container for columns.  You define column families in your storage-conf.xml
file, and cannot modify them (or add new column families) without restarting your Cassandra
process.  A column family holds an ordered list of columns, which you can reference by the
column name.
  
  Column families have a configurable ordering applied to the columns within each row, which
affects the behavior of the get_slice call in the thrift API.  Out of the box ordering implementations
include ASCII, UTF-8, Long, and UUID (lexical or time).
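
The effect of the configured ordering on a slice can be sketched in Python as follows. This is an illustration only: `slice_columns` is a hypothetical helper, not the Thrift `get_slice` signature, and using `int` as a sort key merely approximates the idea of a Long ordering.

```python
def slice_columns(row, start, finish, sort_key=None):
    """Return (name, value) pairs with names in [start, finish], in comparator order."""
    key = sort_key or (lambda name: name)
    ordered = sorted(row.items(), key=lambda item: key(item[0]))
    return [(n, v) for n, v in ordered if key(start) <= key(n) <= key(finish)]


row = {"9": "a", "10": "b", "100": "c", "2": "d"}

# Text ordering compares names character by character, so "100" sorts before "2".
as_text = slice_columns(row, "10", "2")

# Numeric ordering compares names as integers, so 9 falls between 2 and 10.
as_long = slice_columns(row, "2", "10", sort_key=int)
```

The same row returns different columns, in a different order, depending on the ordering chosen for the column family, so the ordering should match how the application intends to slice.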
  
@@ -53, +52 @@

  A JSON representation of the key -> column families -> column structure is
  {{{
  {
-    "mccv":{
+    "mccv":{  //Key
-       "Users":{
+       "Users":{ //Column Family
-          "emailAddress":{"name":"emailAddress", "value":"foo@bar.com"},
+          "emailAddress":{"name":"emailAddress", "value":"foo@bar.com", "timestamp":"1234567890"},
//Column
-          "webSite":{"name":"webSite", "value":"http://bar.com"}
+          "webSite":{"name":"webSite", "value":"http://bar.com", "timestamp":"1234567890"}
 //Column
        },
-       "Stats":{
-          "visits":{"name":"visits", "value":"243"}
+       "Stats":{ //Column Family
+          "visits":{"name":"visits", "value":"243", "timestamp":"1234567890"}  //Column
        }
     },
-    "user2":{
-       "Users":{
+    "matt":{ //Key
+       "Users":{ //Column Family
-          "emailAddress":{"name":"emailAddress", "value":"user2@bar.com"},
+          "emailAddress":{"name":"emailAddress", "value":"user2@bar.com", "timestamp":"1234567890"},
 //Column
-          "twitter":{"name":"twitter", "value":"user2"}
+          "twitter":{"name":"twitter", "value":"user2", "timestamp":"1234567890"} //Column
        }
     }
  }
  }}}
  
- Note that the key "mccv" identifies data in two different column families, "Users" and "Stats".
This does not imply that data from these column families is related.  The semantics of having
data for the same key in two different column families is entirely up to the application.
 Also note that within the "Users" column family, "mccv" and "user2" have different column
names defined.  This is perfectly valid in Cassandra.  In fact there may be a virtually unlimited
set of column names defined, which leads to fairly common use of the column name as a piece
of runtime populated data.  This is unusual in storage systems, particularly if you're coming
from the RDBMS world.
+ Note that the key "mccv" identifies data in two different column families, "Users" and "Stats".
This does not imply that data from these column families is related.  The semantics of having
data for the same key in two different column families is entirely up to the application.
 Also note that within the "Users" column family, "mccv" and "matt" have different column
names defined.  This is perfectly valid in Cassandra.  In fact there may be a virtually unlimited
set of column names defined, which leads to fairly common use of the column name as a piece
of runtime populated data.  This is unusual in storage systems, particularly if you're coming
from the RDBMS world.
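
Both points, rows with different column names and column names used as runtime data, can be sketched with plain Python dicts. The `users` data mirrors the JSON above with timestamps elided; the `visits` row and its page paths are hypothetical and exist only for illustration.

```python
# Two rows of the "Users" column family, with timestamps elided.
users = {
    "mccv": {"emailAddress": "foo@bar.com", "webSite": "http://bar.com"},
    "matt": {"emailAddress": "user2@bar.com", "twitter": "user2"},
}

# Rows in the same column family need not share column names.
only_mccv = sorted(set(users["mccv"]) - set(users["matt"]))
only_matt = sorted(set(users["matt"]) - set(users["mccv"]))

# Column names as runtime data: in this hypothetical row, each column name is
# a page path discovered at runtime and each value is a hit count.
visits = {}
for page in ["/home", "/about", "/home"]:
    visits[page] = visits.get(page, 0) + 1
```

No schema change is needed when a new page path appears; the row simply gains a column whose name carries the data.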
  
  = Keyspaces =
  
@@ -86, +85 @@

  A JSON description of this layout:
  {{{
  {
-   "mccv": {
+   "mccv": { //Key
-     "Tags": {
-       "cassandra": {
+     "Tags": { //column family 
+       "cassandra": { //SuperColumn
-         "incubator": {"incubator": "http://incubator.apache.org/cassandra/"},
+         "incubator": {"incubator": "http://incubator.apache.org/cassandra/", "timestamp":"1234567890"},
//Column
-         "jira": {"jira": "http://issues.apache.org/jira/browse/CASSANDRA"}
+         "jira": {"jira": "http://issues.apache.org/jira/browse/CASSANDRA", "timestamp":"1234567890"}
//Column
        },
-       "thrift": {
+       "thrift": { //SuperColumn
-         "jira": {"jira": "http://issues.apache.org/jira/browse/THRIFT"}
+         "jira": {"jira": "http://issues.apache.org/jira/browse/THRIFT", "timestamp":"1234567890"}
//Column
        }
      }  
    }
