cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "FAQ" by EricEvans
Date Sat, 17 Oct 2009 18:18:54 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by EricEvans.
The comment on this change is: better describe disk needs; remove raid recommendations.
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=21&rev2=22

--------------------------------------------------

  <<Anchor(what_kind_of_hardware_should_i_use)>>
  == What kind of hardware should I run Cassandra on? ==
  
+ === Memory ===
+ The most recently written data resides in memory tables (aka [[MemtableThresholds|memtables]]),
but older data that has been flushed to disk can be kept in the OS's file-system cache. In
other words, ''the more memory, the better'' (with 1GB being the absolute minimum).
-  * Memory
-    * The more memory, the better: recently written data will be held in [[MemtableThresholds|memtables]]
and older data will remain on disk, but will be cached by the OS's filesystem cache.
-  * CPU
-    * FIXME
-  * Disk
-    * For optimal performance, at least 2 disks are required to run Cassandra. One disk should
be dedicated for use by the commit log (defined by the `CommitLogDirectory` config parameter),
and the remainder should be listed as places to store data files (the `DataFileDirectories`
config parameter).
-    * If you use raid, it is recommended to use raid1 pairs, and to list each pair separately
as `DataFileDirectories`, with one pair for the `CommitLogDirectory`.
  
+ === CPU ===
+ FIXME
+ 
+ === Disk ===
+ The short answer here is ''at least 2 disks'': one to keep your `CommitLogDirectory` on,
the other to use in `DataFileDirectories`. The exact answer, though, depends a lot on your usage,
so it's important to understand what is going on here.
+ 
+ Cassandra persists data to disk for two very different purposes. The first is when a new write
is made, so that it can be replayed after a crash or system shutdown (the commit log). The second
is when thresholds are exceeded and memtables are flushed to disk as SSTables.
+ 
+ Commit logs receive every write made to a Cassandra node and have the potential to block
client operations, but they are only ever read on node start-up. SSTable writes, on the other
hand, occur asynchronously, but SSTables are read to satisfy client look-ups. SSTables are also
periodically merged and rewritten in a process called ''compaction''. Another important distinction
is that commit logs are purged after the corresponding data has been flushed to disk as an SSTable,
so `CommitLogDirectory` only holds data that has not yet been flushed, while the directories in
`DataFileDirectories` store all of the data written to a node.
+ 
+ So to summarize, use a separate device for your `CommitLogDirectory`; it needn't be large,
but it should be fast enough to receive all of your writes. Then, use one or more devices
for `DataFileDirectories` and make sure they are large enough to house all of your data
and fast enough to satisfy your reads and to keep up with flushing and compaction.
+ 
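
As a rough illustration of the summary above, the following sketch checks whether the directory used for `CommitLogDirectory` sits on a different device from every directory listed in `DataFileDirectories`. The function names are hypothetical and not part of Cassandra. The check compares `os.stat().st_dev`, which identifies a mounted filesystem rather than a physical disk, so two partitions on the same spindle would still pass; treat it as a first sanity check only.

```python
import os


def device_of(path):
    """Return the device ID of the filesystem holding `path`."""
    return os.stat(path).st_dev


def commit_log_is_isolated(commit_log_dir, data_dirs):
    """Return True if no data directory shares a device with the commit log.

    Hypothetical helper: `commit_log_dir` stands in for the value of
    CommitLogDirectory, `data_dirs` for the entries of DataFileDirectories.
    Note: st_dev distinguishes filesystems, not physical disks.
    """
    commit_dev = device_of(commit_log_dir)
    return all(device_of(d) != commit_dev for d in data_dirs)
```

Calling `commit_log_is_isolated` with the paths from your own configuration returns `False` as soon as any data directory shares a filesystem with the commit log.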
