cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "MemtableThresholds" by FlipKromer
Date Tue, 31 Aug 2010 03:37:33 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "MemtableThresholds" page has been changed by FlipKromer.
http://wiki.apache.org/cassandra/MemtableThresholds?action=diff&rev1=14&rev2=15

--------------------------------------------------

  Linux users should understand fully and then consider adjusting the system values for swappiness,
overcommit_memory and overcommit_ratio.
  
  == Memtable Thresholds ==
- When performing write operations, Cassandra stores values to column-family specific, in-memory
data structures called Memtables. These Memtables are flushed to disk whenever one of the
configurable thresholds is exceeded. Proper tuning of these thresholds is important in making
the most of  available system memory, without bringing the node down for lack of memory.
+ When performing write operations, Cassandra stores values in column-family-specific, in-memory
data structures called Memtables. These Memtables are flushed to disk whenever one of the
configurable thresholds is exceeded. The initial settings (64 MB / 0.3 million objects) are deliberately
conservative, and proper tuning of these thresholds is important for making the most of available system
memory without bringing the node down for lack of memory.
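The "flush when either threshold is exceeded" rule can be sketched as a small model. This is a minimal sketch, not Cassandra's code. The `Memtable` class, the `count_flushes` helper, and their parameters are hypothetical. The model counts only raw value bytes toward the size threshold, while the real server may measure size differently. The defaults mirror the initial settings quoted on this page (64 MB and 0.3 million objects).

```python
from dataclasses import dataclass


@dataclass
class Memtable:
    """Toy model of one column family's memtable (hypothetical, not Cassandra's class)."""
    throughput_mb: float = 64.0          # size threshold, in MB
    object_count_millions: float = 0.3   # column-count threshold, in millions
    data_bytes: int = 0                  # raw value bytes held so far
    columns: int = 0                     # columns written so far

    def write(self, value_bytes: int) -> bool:
        """Record one column write and report whether a flush is now due."""
        self.data_bytes += value_bytes
        self.columns += 1
        return self.should_flush()

    def should_flush(self) -> bool:
        # A flush is due as soon as EITHER threshold is reached.
        size_limit = self.throughput_mb * 1024 * 1024
        count_limit = self.object_count_millions * 1_000_000
        return self.data_bytes >= size_limit or self.columns >= count_limit

    def flush(self) -> None:
        # The real server would write an sstable here; the model just resets.
        self.data_bytes = 0
        self.columns = 0


def count_flushes(memtable: Memtable, n_writes: int, value_bytes: int) -> int:
    """Replay n_writes same-sized writes and count how many flushes occur."""
    flushes = 0
    for _ in range(n_writes):
        if memtable.write(value_bytes):
            memtable.flush()
            flushes += 1
    return flushes
```

With the defaults, 1,000,000 writes of 100-byte values cause 3 flushes because the object-count threshold trips first. The same number of 1,000-byte writes causes 14 flushes because the 64 MB size threshold trips first. Which threshold matters therefore depends on the shape of the data, not only on its volume.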
  
  == Configuring Thresholds ==
- Since Memtables are storing actual column values, they consume at least as much memory as
the size of data inserted. However, there is also overhead  associated with the structures
used to index this data. When the number of columns and rows is high compared to the size
of values, this overhead can become quite significant, (possibly greater than the data itself).
+ '''Larger Memtables take memory away from caches:''' Memory held by Memtables cannot be used
for caching. Since Memtables store actual column values, they consume at least as much memory as
the size of the data inserted. However, there is also overhead associated with the structures used
to index this data. When the number of columns and rows is high compared to the size of the values,
this overhead can become quite significant (possibly greater than the data itself). In other words,
which threshold(s) to use, and what to set them to, depends not just on how much memory you have, but
also on how many column families you have, how many columns each column family holds, and the size of
the values being stored.
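A back-of-the-envelope estimator shows why the number of columns matters as much as the volume of data. This is a sketch built on loud assumptions. `per_column_overhead_bytes` is a made-up illustrative constant, not a measured Cassandra figure, and both function names are hypothetical.

```python
def estimate_memtable_heap_bytes(
    columns: int,
    avg_value_bytes: float,
    per_column_overhead_bytes: float,
) -> float:
    """Rough heap estimate for ONE full memtable: raw values plus index overhead.

    per_column_overhead_bytes is an ASSUMED illustrative figure; measure your
    own workload instead of relying on any particular constant.
    """
    data_bytes = columns * avg_value_bytes
    overhead_bytes = columns * per_column_overhead_bytes
    return data_bytes + overhead_bytes


def worst_case_total_bytes(per_cf_estimates: list[float]) -> float:
    """Thresholds apply per column family, so every active column family can
    hold a full memtable at once; the worst case is the sum."""
    return sum(per_cf_estimates)


# Many small values: overhead dominates (30,000,000 bytes of assumed
# overhead vs 2,400,000 bytes of actual data).
one_cf = estimate_memtable_heap_bytes(300_000, 8, 100)
five_cfs = worst_case_total_bytes([one_cf] * 5)
```

Under these assumptions, one full memtable of 300,000 eight-byte values needs about 32,400,000 bytes, and five such column families need about 162,000,000 bytes. Only a small fraction of that is the data itself.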
  
- In other words, which threshold(s) to use, and what to set them to is not just a function
of how much memory you have, but of how many column families, how many columns per column-family,
and the size of values  being stored.
+ '''Larger Memtables don't improve write performance:''' Increasing the memtable capacity
causes less-frequent flushes but doesn't directly improve write performance: writes go
to memory regardless. (If your commitlog and sstables share a volume, they might contend
for I/O, so if at all possible, put them on separate volumes.)
+ 
+ '''Larger memtables do absorb more overwrites:''' If your write load sees some rows written
more often than others (e.g. upvotes of a front-page story), a larger memtable absorbs more overwrites.
Fewer versions of each hot row then end up scattered across sstables, which makes the sstables more
efficient and improves read performance. If your write load is batch-oriented or if you have a massive
row set, rows are not likely to be rewritten for a long time, so this benefit pays a smaller dividend.
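A toy replay can demonstrate the overwrite effect. The assumptions are as follows. Memtable capacity is measured in distinct rows, whereas Cassandra's real thresholds are in MB and millions of objects. Each flush writes every row it holds to an sstable. The `hot_row_workload` and `rows_written_to_sstables` helpers are hypothetical.

```python
def hot_row_workload(n_cold: int) -> list[str]:
    """One "hot" row (think: a front-page story's vote count) is rewritten
    before every write to a fresh "cold" row."""
    keys: list[str] = []
    for i in range(n_cold):
        keys.append("hot")
        keys.append(f"cold-{i}")
    return keys


def rows_written_to_sstables(keys: list[str], capacity_rows: int) -> int:
    """Replay writes through a memtable holding at most capacity_rows distinct
    rows and return the total number of rows written out by flushes."""
    memtable: dict[str, int] = {}
    rows_written = 0
    for version, key in enumerate(keys):
        # An overwrite replaces the in-memory row; it adds no extra row.
        memtable[key] = version
        if len(memtable) >= capacity_rows:
            rows_written += len(memtable)  # flush: every held row goes to an sstable
            memtable.clear()
    return rows_written + len(memtable)  # final flush of whatever remains
```

Consider the 2,000 writes produced by `hot_row_workload(1000)`. A capacity of 2 rows writes 2,000 rows to sstables. A capacity of 100 writes 1,011. A capacity that never flushes early writes 1,001. The larger the memtable, the more of the hot row's updates are absorbed in memory instead of landing in sstable after sstable.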
+ 
+ '''Larger memtables lead to more effective compaction:''' Since compaction is tiered, large
sstables are preferable: turning over many tiny memtables produces many small sstables, whose data
must then be rewritten through more compaction tiers. Again, this impacts read performance (by reducing
overall I/O contention), but not writes.
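An idealized simulation makes the cost of small sstables under tiered compaction concrete. The following assumptions are not taken from Cassandra's implementation. Every flush produces an sstable of exactly `flush_mb`. Sstables are merged `fanout` at a time (4 here) once that many of the same size accumulate. Merges neither shrink nor grow the data. The function name is hypothetical.

```python
def tiered_compaction_rewritten_mb(n_flushes: int, flush_mb: float, fanout: int = 4) -> float:
    """Idealized size-tiered compaction: return total MB rewritten by merges.

    Every flush adds one sstable of flush_mb to tier 0. Whenever a tier holds
    `fanout` sstables, they are merged into a single sstable one tier up.
    Assumes no overwrites or deletions, so a merge's output is the sum of its inputs.
    """
    waiting: list[int] = []  # waiting[t] = sstables currently sitting in tier t
    rewritten_mb = 0.0
    for _ in range(n_flushes):
        tier = 0
        while True:
            if tier == len(waiting):
                waiting.append(0)
            waiting[tier] += 1
            if waiting[tier] < fanout:
                break
            # Merge: each tier-t sstable is flush_mb * fanout**t in size.
            waiting[tier] = 0
            rewritten_mb += fanout * flush_mb * fanout ** tier
            tier += 1  # the merged sstable lands in the next tier
    return rewritten_mb
```

Flushing 1,024 MB as 16 sstables of 64 MB rewrites 2,048 MB during compaction. Flushing the same data as 256 sstables of 4 MB rewrites 4,096 MB, because each byte passes through twice as many tiers.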
  
  Listed below are the thresholds found in `storage-conf.xml`, along with a description.
  
@@ -35, +39 @@

  === MemtableObjectCountInMillions ===
  This directive sets a threshold on the number of columns stored.
  
- Left unconfigured (missing from the config), this defaults to 1  (or 1,000,000 objects).
+ Left unconfigured (missing from the config), this defaults to 0.1 (or 100,000 objects),
but the config file's initial setting of 0.3 (or 300,000 objects) is reasonable.
  
  ''Note: The value is applied on a per column-family basis.''
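A missing directive falls back to a different value (0.1) than the one the shipped config sets (0.3), so it can be worth checking what a given file actually yields. The sketch below applies Python's standard `xml.etree.ElementTree` to a minimal, made-up XML fragment, not a complete `storage-conf.xml`. It assumes the element is named exactly `MemtableObjectCountInMillions`, as on this page, although element names changed across Cassandra versions. `object_count_threshold` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Default quoted on this page when the directive is missing from the config.
DEFAULT_OBJECT_COUNT_MILLIONS = 0.1


def object_count_threshold(config_xml: str) -> int:
    """Return the per-column-family column-count threshold, in objects.

    Assumes the element is named exactly MemtableObjectCountInMillions, as on
    this page; element names differ between Cassandra versions.
    """
    root = ET.fromstring(config_xml)
    # Search the whole tree rather than assuming where the element is nested.
    elem = next(root.iter("MemtableObjectCountInMillions"), None)
    if elem is not None and elem.text and elem.text.strip():
        millions = float(elem.text)
    else:
        millions = DEFAULT_OBJECT_COUNT_MILLIONS
    return round(millions * 1_000_000)
```

A fragment containing `<MemtableObjectCountInMillions>0.3</MemtableObjectCountInMillions>` yields 300,000, and a fragment without the element yields the 100,000 default. Either way, the result is a per-column-family limit.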
  
