cassandra-user mailing list archives

From "Stu Hood" <stuart.h...@rackspace.com>
Subject RE: How does Cassandra store data physically?
Date Wed, 01 Jul 2009 21:36:04 GMT
There is no such thing as a column or supercolumn that is not contained in a ColumnFamily.
The ColumnFamily is the structure that is stored together on disk.

A supercolumn is not what you think it is: supercolumns are like regular columns, except they
contain other columns, and you can have an almost infinite number of supercolumns within a
SuperColumnFamily.

A ColumnFamily is laid out on disk as a sequence of values sorted by key, then by
(super)column name (or column timestamp), then by subcolumn name/timestamp. Reading
contiguous keys from the ColumnFamily is therefore very fast, but to get a single column
name from multiple keys, Cassandra still needs to seek to the next interesting column on disk.
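
To make the sort order concrete, here is a toy Python sketch. It is not Cassandra's code:
the row keys, column names, and both helper functions are made up for illustration, and an
in-memory list stands in for the file on disk. It only shows why a key range is one
contiguous run while a single column across many keys is scattered.

```python
from bisect import bisect_left, bisect_right

# Hypothetical cells: (row key, supercolumn name, subcolumn name, value).
cells = [
    ("user2", "address", "city", "Austin"),
    ("user1", "address", "zip", "78201"),
    ("user1", "address", "city", "San Antonio"),
    ("user3", "address", "city", "Dallas"),
    ("user2", "phone", "home", "555-0102"),
    ("user1", "phone", "home", "555-0101"),
]

# Sort by key, then supercolumn name, then subcolumn name,
# mirroring the ordering described above.
layout = sorted(cells, key=lambda cell: cell[:3])


def positions_for_key_range(layout, start_key, end_key):
    """Positions of all cells whose row key is in [start_key, end_key]."""
    keys = [cell[0] for cell in layout]
    return list(range(bisect_left(keys, start_key), bisect_right(keys, end_key)))


def positions_for_column(layout, supercolumn, subcolumn):
    """Positions of one (supercolumn, subcolumn) pair across every row key."""
    return [
        i
        for i, cell in enumerate(layout)
        if cell[1] == supercolumn and cell[2] == subcolumn
    ]


if __name__ == "__main__":
    # A key range is a single contiguous slice of the layout.
    print(positions_for_key_range(layout, "user1", "user2"))
    # One column across all keys lands at scattered positions.
    print(positions_for_column(layout, "address", "city"))
```

In this toy layout the key range maps to one unbroken run of positions, while the
address/city column shows up once per key with other cells in between, which is where the
extra seeks come from.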

There is no concept of 'blocks' in the Cassandra representation, because it does not use a
B-Tree to store data. There is an index for each ColumnFamily on disk that allows Cassandra
to seek directly to a key in the sorted file.
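
As a rough illustration of the index idea, here is a small Python sketch. The actual index
format is not described here, so everything in it is an assumption: a plain dict mapping
each key to a byte offset, a tab-separated record per key, and an in-memory buffer standing
in for the sorted data file. The function names are hypothetical.

```python
import io


def write_sorted(rows):
    """Write rows in key order; return (data buffer, {key: byte offset})."""
    data = io.BytesIO()
    index = {}
    for key in sorted(rows):
        # Remember where this key's record starts before writing it.
        index[key] = data.tell()
        data.write(f"{key}\t{rows[key]}\n".encode("utf-8"))
    return data, index


def read_key(data, index, key):
    """Seek straight to the record for key, or return None if it is absent."""
    offset = index.get(key)
    if offset is None:
        return None
    data.seek(offset)
    record = data.readline().decode("utf-8").rstrip("\n")
    _stored_key, value = record.split("\t", 1)
    return value


# Hypothetical rows, deliberately inserted out of key order.
rows = {"user3": "Dallas", "user1": "San Antonio", "user2": "Austin"}
data, index = write_sorted(rows)

if __name__ == "__main__":
    print(read_key(data, index, "user2"))
```

The lookup does one seek to the recorded offset and reads forward from there, rather than
scanning the file from the start.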

Please see http://wiki.apache.org/cassandra/DataModel

Thanks,
Stu

-----Original Message-----
From: "Ivan Chang" <ivan.chang@medigy.com>
Sent: Wednesday, July 1, 2009 3:00pm
To: cassandra-user@incubator.apache.org
Subject: How does Cassandra store data physically?

I am wondering how Cassandra stores its columns, super columns in the
database files?

A supercolumn logically groups a set of related columns together, when the
supercolumn is written to file, are the columns also stored in adjacent
blocks to each other so IO cost is minimized for related data?  What about
individual columns not associated with any supercolumn, but related only
through a given key?

Thanks,
Ivan


