cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Trivial Update of "LargeDataSetConsiderations" by jeremyhanna
Date Wed, 04 Sep 2013 15:00:46 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "LargeDataSetConsiderations" page has been changed by jeremyhanna:
https://wiki.apache.org/cassandra/LargeDataSetConsiderations?action=diff&rev1=25&rev2=26

   * Consider the choice of file system. Removal of large files is notoriously slow and
seek-bound on e.g. ext2/ext3; consider xfs or ext4. This affects the background unlink()ing of
sstables that happens every now and then, and also affects start-up time: if sstables are pending
removal when a node starts up, they are removed as part of the start-up process, so it may be
detrimental if removing a terabyte of sstables takes an hour (these numbers are ballpark figures,
not accurately measured, and depend on circumstances).
   * Adding nodes is a slow process if each node is responsible for a large amount of data.
Plan for this; do not try to throw additional hardware at a cluster at the last minute.
   * The operating system's page cache is affected by compaction and repair operations. If
you are relying on the page cache to keep the active set in memory, you may see significant
degradation in performance as a result of compaction and repair operations.  See cassandra.yaml
for settings to reduce this impact.
-  * The partition (or sampled) index entries for each sstable can start to add up.  You can
reduce the memory usage by tuning the interval that it samples at.  The setting is index_interval
the cassandra.yaml.  See the comments there for more information.
+  * The partition (or sampled) index entries for each sstable can start to add up.  You can
reduce the memory usage by tuning the interval that it samples at.  The setting is index_interval
in cassandra.yaml.  See the comments there for more information.
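
As a rough illustration of the index_interval point above, the sketch below estimates how many
sampled index entries a set of sstables would hold at different intervals. It assumes roughly one
sample per index_interval partitions; the function names, the partition counts and the
bytes_per_entry figure are hypothetical and not taken from Cassandra itself.

```python
import math


def estimate_index_sample_count(partitions_in_sstable, index_interval):
    """Approximate number of sampled index entries for one sstable.

    Assumption: one entry is sampled for every `index_interval` partitions.
    """
    if index_interval < 1:
        raise ValueError("index_interval must be at least 1")
    return math.ceil(partitions_in_sstable / index_interval)


def estimate_index_sample_bytes(partition_counts, index_interval, bytes_per_entry):
    """Approximate memory used by index samples across several sstables.

    `bytes_per_entry` is a hypothetical per-sample cost (key size plus
    overhead); measure it for your own data rather than trusting a guess.
    """
    total_entries = sum(
        estimate_index_sample_count(count, index_interval)
        for count in partition_counts
    )
    return total_entries * bytes_per_entry


if __name__ == "__main__":
    # Hypothetical node with two sstables of 10M and 2.5M partitions.
    partition_counts = [10_000_000, 2_500_000]
    for interval in (128, 512):
        used = estimate_index_sample_bytes(partition_counts, interval, bytes_per_entry=64)
        print(f"index_interval={interval}: ~{used / 1024 / 1024:.1f} MiB of samples")
```

Under these assumptions, a larger interval shrinks the sample memory roughly in proportion, at the
cost of scanning more index entries per lookup.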
  
  Other references to improvements:
   * [[http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2|Performance
improvements in Cassandra 1.2]]
