cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "MemtableThresholds" by FlipKromer
Date Tue, 31 Aug 2010 02:26:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "MemtableThresholds" page has been changed by FlipKromer.
http://wiki.apache.org/cassandra/MemtableThresholds?action=diff&rev1=10&rev2=11

--------------------------------------------------

+ == Don't Touch that Dial ==
+ The settings described here should only be changed in the face of a quantifiable performance
problem. They will affect the cluster quite differently for distinct use cases and workloads,
and the defaults were well-chosen.
- When performing write operations, Cassandra stores values to column-family
- specific, in-memory data structures called Memtables. These Memtables are
- flushed to disk whenever one of the configurable thresholds is exceeded.
- Proper tuning of these thresholds is important in making the most of 
- available system memory, without bringing the node down for lack of memory.
  
- (Note that by default bin/cassandra.in.sh specifies a maximum JVM heap of
- -Xmx1G, which is very low for production use.  Consider increasing this
- as well.)
+ == JVM Heap Size ==
+ By default, the Cassandra startup script specifies a maximum JVM heap of -Xmx1G, which
is very low for production use. Consider increasing this -- but gently! If Cassandra and other
processes take up too much of the available RAM, they'll force out the operating system's
file buffers and caches. These are as important as the internal data structures for ensuring
Cassandra performance.
+ 
+ It's much riskier to start tuning with the heap too high (a difficult-to-pinpoint malaise) than
too low (easy to diagnose using JMX) -- the OS is much smarter than you think. For a high-end
machine with, say, 48GB of RAM, a 12GB heap size is reasonable. As a rough rule of thumb,
Cassandra's internal data structures will require about {{{memtable_throughput_in_mb * 3 *
number of hot CFs + 1G + internal caches}}}.
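
As a rough illustration of the rule of thumb above, the following Python sketch simply evaluates the formula in megabytes. The function name, the parameter names, and the example figures (64 MB throughput, 4 hot column families, 512 MB of caches) are assumptions made for illustration; they are not Cassandra defaults or recommendations.

```python
def estimate_heap_mb(memtable_throughput_in_mb, hot_cf_count, internal_caches_mb=0):
    """Rule-of-thumb estimate, in MB: throughput * 3 * hot CFs + 1G + caches."""
    base_overhead_mb = 1024  # the "+ 1G" term of the formula, expressed in MB
    memtable_mb = memtable_throughput_in_mb * 3 * hot_cf_count
    return memtable_mb + base_overhead_mb + internal_caches_mb


if __name__ == "__main__":
    # Hypothetical node: 64 MB memtable throughput, 4 hot CFs, 512 MB of caches.
    print(estimate_heap_mb(64, 4, 512), "MB")
```

The result is only a starting point for the internal data structures; it says nothing about how much RAM to leave for the operating system's file buffers and caches.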
+ 
+ Also know that if you're running up against the heap limit under load, that's probably a
symptom of other problems -- diagnose those first.
+ 
+ == Virtual Memory and Swap ==
+ On a dedicated Cassandra machine, the best value for your swap settings is no swap at all
-- it's better to have the OS kill the Java process (taking the node down but leaving your
monitoring, etc. up) than to have the system go into swap death (and become entirely unreachable).
+ 
+ Linux users should fully understand, and only then consider adjusting, the system values for swappiness,
overcommit_memory and overcommit_ratio.
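
Before changing any of these, it helps to see what the node is currently using. The sketch below reads the three values from `/proc/sys/vm` on Linux; the function name and the `base` parameter are illustrative assumptions, and the script only reads the settings, it does not change them.

```python
from pathlib import Path

VM_SETTINGS = ("swappiness", "overcommit_memory", "overcommit_ratio")


def read_vm_settings(base="/proc/sys/vm"):
    """Return {setting: int or None}; None means the value could not be read."""
    values = {}
    for name in VM_SETTINGS:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Not Linux, no permission, or unexpected file content.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_settings().items():
        print(f"vm.{name} = {value}")
```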
+ 
+ == Memtable Thresholds ==
+ When performing write operations, Cassandra stores values to column-family-specific, in-memory
data structures called Memtables. These Memtables are flushed to disk whenever one of the
configurable thresholds is exceeded. Proper tuning of these thresholds is important in making
the most of available system memory, without bringing the node down for lack of memory.
+ 
+ (Note that by default bin/cassandra.in.sh specifies a maximum JVM heap of -Xmx1G, which
is very low for production use.  Consider increasing this as well.)
  
  == Configuring Thresholds ==
+ Since Memtables store actual column values, they consume at least as much memory as
the size of the data inserted. However, there is also overhead associated with the structures
used to index this data. When the number of columns and rows is high compared to the size
of values, this overhead can become quite significant (possibly greater than the data itself).
- Since Memtables are storing actual column values, they consume at least as
- much memory as the size of data inserted. However, there is also overhead 
- associated with the structures used to index this data. When the
- number of columns and rows is high compared to the size of values, this
- overhead can become quite significant, (possibly greater than the data
- itself).
  
+ In other words, which threshold(s) to use, and what to set them to, is not just a function
of how much memory you have, but also of how many column families you have, how many columns
per column family, and the size of the values being stored.
- In other words, which threshold(s) to use, and what to set them to is
- not just a function of how much memory you have, but of how many column
- families, how many columns per column-family, and the size of values 
- being stored.
  
- Listed below are the thresholds found in `storage-conf.xml`, along with a
+ Listed below are the thresholds found in `storage-conf.xml`, along with a description.
- description.
  
  === MemtableSizeInMB ===
+ As the name indicates, this sets the maximum size in megabytes that the Memtable will store
before triggering a threshold violation and causing it to be flushed to disk. It corresponds
to the size of the values inserted (plus the size of the containing column).
- As the name indicates, this sets the max size in megabytes that the 
- Memtable will store before triggering a threshold violation and causing
- it to be flushed to disk. It corresponds to the size of the values
- inserted, (plus the size of the containing column).
  
  If left unconfigured (missing from the config), this defaults to 128MB.
  
  ''Note: The value is applied on a per column-family basis.''
  
  === MemtableObjectCountInMillions ===
- This directive sets a threshold on the number of columns stored. 
+ This directive sets a threshold on the number of columns stored.
  
- Left unconfigured (missing from the config), this defaults to 1 
+ Left unconfigured (missing from the config), this defaults to 1 (or 1,000,000 objects).
- (or 1,000,000 objects).
  
  ''Note: The value is applied on a per column-family basis.''
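
To see how the two thresholds described above interact, the following Python sketch works out which one a Memtable would reach first for a given average column size, using the defaults stated above (128 MB and 1 million objects). It is a simplification under stated assumptions: it treats 1 MB as 1024 * 1024 bytes, ignores the indexing overhead discussed earlier, and considers only these two thresholds. The function and parameter names are hypothetical.

```python
def first_threshold_hit(avg_column_bytes, size_mb=128, count_millions=1.0):
    """Return (threshold name, column count) for the threshold reached first."""
    # Assumption: 1 MB = 1024 * 1024 bytes; index overhead is ignored.
    columns_to_size_limit = (size_mb * 1024 * 1024) / avg_column_bytes
    columns_to_count_limit = count_millions * 1_000_000
    if columns_to_size_limit <= columns_to_count_limit:
        return "MemtableSizeInMB", int(columns_to_size_limit)
    return "MemtableObjectCountInMillions", int(columns_to_count_limit)


if __name__ == "__main__":
    for size in (50, 1024):  # hypothetical average column sizes, in bytes
        print(size, "bytes ->", first_threshold_hit(size))
```

Under these assumptions the break-even point with the defaults is roughly 134 bytes per column: smaller columns reach the object-count threshold first, larger ones reach the size threshold first.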
  
  == Using Jconsole To Optimize Thresholds ==
+ Cassandra's column-family mbeans have a number of attributes that can prove invaluable in
determining optimal thresholds. One way to access this instrumentation is by using Jconsole,
a graphical monitoring and management application that ships with your JDK.
- Cassandra's column-family mbeans have a number of attributes that can
- prove invaluable in determining optimal thresholds. One way to access
- this instrumentation is by using Jconsole, a graphical monitoring and
- management application that ships with your JDK.
  
+ Launching Jconsole with no arguments will display the "New Connection" dialog box. If you
are running Jconsole on the same machine that Cassandra is running on, you can connect
using the PID; otherwise you will need to connect remotely. The default startup scripts for
Cassandra cause the VM to listen on port 8080 using the JVM option:
- Launching Jconsole with no arguments will display the "New Connection"
- dialog box. If you are running Jconsole on the same machine that 
- Cassandra is running on, then you can connect using the PID, otherwise
- you will need to connect remotely. The default startup scripts for
- Cassandra cause the VM to listen on port 8080 using the JVM option: 
  
-  -Dcom.sun.management.jmxremote.port=8080
+  . -Dcom.sun.management.jmxremote.port=8080
  
  The remote JMX url is then:
  
  service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi
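
For a node other than localhost, the same URL pattern applies with the host and port substituted. A minimal Python sketch of that substitution follows; the helper name and the example hostname are hypothetical, and it only builds the string, it does not open a JMX connection.

```python
def jmx_service_url(host="localhost", port=8080):
    """Build the remote JMX URL in the form shown above."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


if __name__ == "__main__":
    print(jmx_service_url())
    # Hypothetical remote node name, for illustration only.
    print(jmx_service_url("cass-node-1.example.com", 8080))
```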
  
- This is used internally by:
- bin/nodetool src/java/org/apache/cassandra/tools/nodetool.java
+ This is used internally by: bin/nodetool src/java/org/apache/cassandra/tools/nodetool.java
  
  {{attachment:jconsole_connect.png}}
  
+ Once connected, select the ''MBeans'' tab, expand the ''org.apache.cassandra.db'' section,
and finally one of your column families.
- 
- Once connected, select the ''MBeans'' tab, expand the 
- ''org.apache.cassandra.db'' section, and finally one of your column families.
  
  There are three interesting attributes here.
  
