cassandra-commits mailing list archives

From Apache Wiki <>
Subject [Cassandra Wiki] Update of "FAQ" by EricEvans
Date Sat, 17 Oct 2009 18:18:54 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by EricEvans.
The comment on this change is: better describe disk needs; remove raid recommendations.


  == What kind of hardware should I run Cassandra on? ==
+ === Memory ===
+ The most recently written data resides in memory tables (aka [[MemtableThresholds|memtables]]),
but older data that has been flushed to disk can be kept in the OS's file-system cache. In
other words, ''the more memory, the better'', with 1GB being the absolute minimum.
-  * Memory
-    * The more memory, the better: recently written data will be held in [[MemtableThresholds|memtables]]
and older data will remain on disk, but will be cached by the OS's filesystem cache.
-  * CPU
-    * FIXME
-  * Disk
-    * For optimal performance, at least 2 disks are required to run Cassandra. One disk should
be dedicated for use by the commit log (defined by the `CommitLogDirectory` config parameter),
and the remainder should be listed as places to store data files (the `DataFileDirectories`
config parameter).
-    * If you use raid, it is recommended to use raid1 pairs, and to list each pair separately
as `DataFileDirectories`, with one pair for the `CommitLogDirectory`.
+ === CPU ===
+ === Disk ===
+ The short answer is ''at least 2 disks'': one to hold your `CommitLogDirectory`, the other
to use in `DataFileDirectories`. The exact answer, though, depends a lot on your usage, so it's
important to understand how Cassandra uses each of them.
+ Cassandra persists data to disk for two very different purposes. The first is when a new write
is made, so that it can be replayed after a crash or system shutdown. The second is when thresholds
are exceeded and memtables are flushed to disk as SSTables.
+ Commit logs receive every write made to a Cassandra node and have the potential to block
client operations, but they are only ever read on node start-up. SSTable writes, on the other
hand, occur asynchronously, but SSTables are read to satisfy client look-ups. SSTables are also
periodically merged and rewritten in a process called ''compaction''. Another important distinction
is that commit logs are purged after the corresponding data has been flushed to disk as an SSTable,
so `CommitLogDirectory` only holds data that has not yet been flushed, while the directories in
`DataFileDirectories` store all of the data written to a node.
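As a rough illustration of the two write paths described above, here is a toy Python model. It is a sketch, not Cassandra's actual implementation: the class `ToyNode`, its methods, and the key-count `flush_threshold` are hypothetical stand-ins, with `commit_log` playing the role of `CommitLogDirectory` and `sstables` the role of `DataFileDirectories`. It shows every write landing in the commit log, memtables being flushed as sorted SSTables, the commit log being purged after a flush, replay on start-up, and compaction merging SSTables.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of a node's write path; NOT Cassandra's implementation.

    commit_log stands in for CommitLogDirectory, sstables for
    DataFileDirectories. flush_threshold (a count of distinct keys) is a
    hypothetical simplification of the real memtable thresholds.
    """
    flush_threshold: int = 3
    commit_log: list = field(default_factory=list)  # every write, in order
    memtable: dict = field(default_factory=dict)    # recent writes, in memory
    sstables: list = field(default_factory=list)    # flushed tables, oldest first

    def write(self, key, value):
        # 1. Append to the commit log first, so the write can be replayed.
        self.commit_log.append((key, value))
        # 2. Apply it to the in-memory table.
        self.memtable[key] = value
        # 3. Flush once the toy threshold is reached.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted SSTable...
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        # ...then purge the commit log: its entries are now covered by the SSTable.
        self.commit_log.clear()

    def replay(self):
        # On start-up, rebuild the memtable from the commit log.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value

    def read(self, key):
        # Look-ups check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def compact(self):
        # Merge all SSTables into one; later tables overwrite earlier values.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]


node = ToyNode(flush_threshold=3)
for i in range(4):
    node.write(f"k{i}", i)
print(len(node.sstables), node.commit_log)  # 1 [('k3', 3)]

node.memtable = {}  # simulate a crash: in-memory data is lost
node.replay()       # ...but the commit log restores it
print(node.read("k3"), node.read("k0"))  # 3 0
```

After the four writes, only the unflushed `k3` remains in the commit log, while the SSTable holds everything that was flushed. That asymmetry is why the commit log device can be small while the data devices must be large.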
+ To summarize: use a separate device for your `CommitLogDirectory`; it needn't be large,
but it should be fast enough to receive all of your writes. Then use one or more devices
for `DataFileDirectories`, and make sure they are both large enough to house all of your data
and fast enough to satisfy your reads and keep up with flushing and compaction.
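To confirm that the commit log really sits on a different device from the data directories, a quick check like the following may help. It is a minimal sketch assuming a POSIX-like system: `os.stat().st_dev` identifies the filesystem a path lives on, so two partitions of the same physical disk will still look "different", and RAID or LVM volumes hide the underlying disks entirely. The function names `same_device` and `check_layout` are hypothetical.

```python
import os
from typing import List


def same_device(path_a: str, path_b: str) -> bool:
    """Return True if both paths are on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_layout(commit_log_dir: str, data_dirs: List[str]) -> List[str]:
    """Return the data directories that share a filesystem with the commit log.

    An empty result means every data directory is on a different filesystem
    from the commit log (which is still not proof of separate physical disks).
    """
    return [d for d in data_dirs if same_device(commit_log_dir, d)]
```

For example, with the hypothetical paths `check_layout("/var/lib/cassandra/commitlog", ["/var/lib/cassandra/data"])`, an empty list means the two directories are on different filesystems; any directory returned shares a filesystem with the commit log.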
