cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "FAQ" by JonathanEllis
Date Mon, 08 Feb 2010 19:35:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by JonathanEllis.
The comment on this change is: link MemtableSSTable.
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=36&rev2=37

--------------------------------------------------

  
  <<Anchor(reads_slower_writes)>>
  == Why are reads slower than writes? ==
- Unlike all major relational databases and some NoSQL systems, Cassandra does not use b-trees
and in-place updates on disk.  Instead, it uses a sstable/memtable model like Bigtable's:
writes to each ColumnFamily are grouped together in an in-memory structure before being flushed
(sorted and written to disk).  Thus, writes are extremely fast, costing only a commitlog append
and an amortized sequential write for the flush.  This means that writes cost no random I/O,
compared to a b-tree system which not only has to seek to the data location to overwrite,
but also may have to seek to read different levels of the index if it outgrows disk cache!
 
+ Unlike all major relational databases and some NoSQL systems, Cassandra does not use b-trees
and in-place updates on disk.  Instead, it uses a sstable/memtable model like Bigtable's:
writes to each ColumnFamily are grouped together in an in-memory structure before being flushed
(sorted and written to disk).  This means that writes cost no random I/O, compared to a b-tree
system which not only has to seek to the data location to overwrite, but also may have to
seek to read different levels of the index if it outgrows disk cache!  
  
- The downside is that on a read, Cassandra has to (potentially) merge row fragments from
multiple sstables on disk.  We think this is a tradeoff worth making, first because scaling
writes has always been harder than scaling reads, and second because as your data corpus grows
Cassandra's read disadvantage narrows vs b-tree systems that have to do multiple seeks against
a large index.
+ The downside is that on a read, Cassandra has to (potentially) merge row fragments from
multiple sstables on disk.  We think this is a tradeoff worth making, first because scaling
writes has always been harder than scaling reads, and second because as your data corpus grows
Cassandra's read disadvantage narrows vs b-tree systems that have to do multiple seeks against
a large index.  See MemtableSSTable for more details.
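The write/read tradeoff described in this FAQ entry can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not Cassandra's implementation. The class `ToyColumnFamily`, its `flush_threshold` parameter (a key count standing in for real flush triggers), and the in-memory list used as an "sstable" are all hypothetical. A write is a commit-log append plus a memtable update, and a full memtable is flushed as a new sorted, immutable sstable. A read has to gather the row's fragments from the memtable and every sstable, then keep the newest cell per column by timestamp.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]             # (write timestamp, value)
Row = Dict[str, Cell]              # column name -> cell
SSTable = List[Tuple[str, Row]]    # rows sorted by key, never modified after flush


@dataclass
class ToyColumnFamily:
    """Hypothetical memtable/sstable store; names are illustrative only."""
    flush_threshold: int = 2       # assumption: flush once the memtable holds this many keys
    commitlog: List[Tuple[str, str, int, str]] = field(default_factory=list)
    memtable: Dict[str, Row] = field(default_factory=dict)
    sstables: List[SSTable] = field(default_factory=list)

    def write(self, key: str, column: str, ts: int, value: str) -> None:
        # Sequential append to the commit log: no seek to the data's old location.
        self.commitlog.append((key, column, ts, value))
        # Update the in-memory structure; nothing on disk is overwritten in place.
        self.memtable.setdefault(key, {})[column] = (ts, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Sort by row key and emit a new immutable sstable (one sequential write).
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    @staticmethod
    def _lookup(table: SSTable, key: str) -> Optional[Row]:
        # Binary search over the sorted keys; a real sstable uses an on-disk index.
        keys = [k for k, _ in table]
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return table[i][1]
        return None

    def read(self, key: str) -> Optional[Dict[str, str]]:
        # The row may be fragmented across the memtable and several sstables.
        fragments = [self.memtable.get(key)]
        fragments += [self._lookup(t, key) for t in self.sstables]
        merged: Row = {}
        for frag in fragments:
            if frag is None:
                continue
            for col, cell in frag.items():
                # Newest timestamp wins when the same column appears in several fragments.
                if col not in merged or cell[0] > merged[col][0]:
                    merged[col] = cell
        return {col: val for col, (_, val) in merged.items()} if merged else None


if __name__ == "__main__":
    cf = ToyColumnFamily(flush_threshold=2)
    cf.write("alice", "email", 1, "a@old")
    cf.write("bob", "email", 2, "b@x")       # memtable full: first flush
    cf.write("alice", "email", 3, "a@new")
    cf.write("alice", "city", 4, "Paris")
    cf.write("carol", "email", 5, "c@x")     # second flush
    cf.write("alice", "phone", 6, "555")     # stays in the memtable
    print(cf.read("alice"))  # {'phone': '555', 'email': 'a@new', 'city': 'Paris'}
```

In this toy, reading "alice" touches the memtable and both sstables, while each write only appended to a list and updated a dict. That is the asymmetry the FAQ describes: writes stay cheap and sequential, and the merge cost is paid at read time.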
  
