cassandra-commits mailing list archives

From Apache Wiki <>
Subject [Cassandra Wiki] Update of "StorageConfiguration" by JonHermes
Date Tue, 24 Aug 2010 23:16:54 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "StorageConfiguration" page has been changed by JonHermes.


  Not going to cover every value, just the interesting ones. When in doubt, check out the
comments on the default cassandra.yaml as they're well documented there.
  == per-Cluster (Global) Settings ==
-  * authenticator
+  * '''authenticator'''
  Allows for pluggable authentication of users, which defines whether it is necessary to call
the Thrift 'login' method, and which parameters are required to login. The default '!AllowAllAuthenticator'
does not require users to call 'login': any user can perform any operation. The other built
in option is '!SimpleAuthenticator', which requires users and passwords to be defined in property
files, and for users to call login with a valid combo.
  Default is: 'org.apache.cassandra.auth.AllowAllAuthenticator', a no-op.
-  * auto_bootstrap
+  * '''auto_bootstrap'''
  Set to 'true' to make new [non-seed] nodes automatically migrate the right data to themselves.
 (If no InitialToken is specified, they will pick one  such that they will get half the range
of the most-loaded node.) If a node starts up without bootstrapping, it will mark itself bootstrapped
so that you can't subsequently accidentally bootstrap a node with data on it.  (You can reset
this by wiping your data and commitlog directories.)
  Default is: 'false', so that new clusters don't bootstrap immediately.  You should turn
this on when you start adding new nodes to a cluster that already has data on it.
-  * cluster_name
+  * '''cluster_name'''
  The name of this cluster.  This is mainly used to prevent machines in one logical cluster
from joining another.
-  * commitlog_directory and data_file_directories
+  * '''commitlog_directory and data_file_directories'''
- /var/lib/cassandra/commitlog
+ Be sure to separate your commitlog and data disks, as commitlog performance is reliant on
its append-only nature, and seeking to random data at the same time will damage write speed.
+ Defaults are: '/var/lib/cassandra/commitlog' and '/var/lib/cassandra/data'.
-  * concurrent_reads and concurrent_writes, commitlog_sync and commitlog_sync_period_in_ms
+  * '''concurrent_reads''' and '''concurrent_writes''', '''commitlog_sync''' and '''commitlog_sync_period_in_ms'''
  Unlike most systems, in Cassandra writes are faster than reads, so you can afford more of
those in parallel.  A good rule of thumb is 4 concurrent_reads per processor core.  Increase
{{{concurrent_writes}}} to the number of clients writing at once if you use commitlog_sync.
  {{{CommitLogSync}}} may be either "periodic" or "batch."  When in batch mode, Cassandra
won't ack writes until the commit log has been fsynced to disk.  It will wait up to {{{CommitLogSyncBatchWindowInMS}}}
milliseconds for other writes, before performing the sync.
@@ -51, +53 @@

  Defaults are: '8' concurrent reads, '32' concurrent writes, 'periodic' sync, '10000' ms between syncs.
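As a rough illustration of the rule of thumb above (4 concurrent_reads per processor core), the sketch below derives a starting value from a core count. The helper name is hypothetical and is not part of Cassandra; treat the result as a starting point for tuning only.

```python
import os

READS_PER_CORE = 4  # rule of thumb quoted in the text above


def suggested_concurrent_reads(cores):
    """Return a starting value for concurrent_reads (hypothetical helper)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return READS_PER_CORE * cores


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to a single core.
    cores = os.cpu_count() or 1
    print("concurrent_reads:", suggested_concurrent_reads(cores))
```

On a two-core machine this yields 8, which matches the default listed above.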
-  * disk_access_mode
+  * '''disk_access_mode'''
  The options are: 'auto', 'mmap', 'mmap_index_only', and 'standard'.
  mmapped i/o is substantially faster, but only practical on a 64bit machine (which notably
does not include EC2 "small" instances) or relatively small datasets.  "auto", the safe choice,
will enable mmapping on a 64bit JVM.  Other values are "mmap", "mmap_index_only" (which may
allow you to get part of the benefits of mmap on a 32bit machine by mmapping only index files)
and "standard". (The buffer size settings that follow only apply to standard, non-mmapped i/o.)
  Default is: 'auto'.
-  * dynamic_snitch and endpoint_snitch
+  * '''dynamic_snitch''' and '''endpoint_snitch'''
  !EndPointSnitch: Set this to a class that implements {{{IEndPointSnitch}}}, which determines
whether two endpoints are in the same data center or on the same rack. Out of the box, Cassandra
provides {{{org.apache.cassandra.locator.RackInferringSnitch}}}.
  Note: this class will work on hosts' IPs only. There is no configuration parameter to tell
Cassandra that a node is in rack ''R'' and in datacenter ''D''. The current rules are based
on the two methods:
@@ -70, +72 @@

  Defaults are: 'org.apache.cassandra.locator.SimpleSnitch' and 'false'.
-  * listen_address
+  * '''listen_address'''
  Commenting out this property leaves it up to {{{InetAddress.getLocalHost()}}}. This will
always do the Right Thing *if* the node is properly configured (hostname, name resolution,
etc), and the Right Thing is to use the address associated with the hostname (it might not
  Default is: 'localhost'. This must be changed for other nodes to contact this node.
-  * memtable_flush_after_mins, memtable_operations_in_millions, and memtable_throughput_in_mb
+  * '''memtable_flush_after_mins''', '''memtable_operations_in_millions''', and '''memtable_throughput_in_mb'''
  The maximum time to leave a dirty memtable unflushed. (While any affected columnfamilies
have unflushed data from a commit log segment, that segment cannot be deleted.) This needs
to be large enough that it won't cause a flush storm of all your memtables flushing at once
because none has hit the size or count thresholds yet.  For production, a larger value such
as 1440 is recommended.
  The maximum number of columns in millions to store in memory per ColumnFamily before flushing
to disk.  This is also a per-memtable setting.  Use with {{{MemtableSizeInMB}}} to tune memory
@@ -84, +86 @@

  Defaults are: '60' minutes, '0.3' millions, and '64' mb respectively.
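The three memtable settings act as independent limits: a memtable is flushed once it is too old, holds too many columns, or has grown too large. The sketch below models that decision with the defaults listed above. The class and function names are hypothetical, and the real flush logic inside Cassandra may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class MemtableLimits:
    """Defaults taken from the list above; the names mirror the config keys."""
    flush_after_mins: float = 60
    operations_in_millions: float = 0.3
    throughput_in_mb: float = 64


def should_flush(limits, age_mins, operations, size_mb):
    """Return True once any one of the three thresholds has been reached."""
    max_operations = round(limits.operations_in_millions * 1_000_000)
    return (
        age_mins >= limits.flush_after_mins
        or operations >= max_operations
        or size_mb >= limits.throughput_in_mb
    )


if __name__ == "__main__":
    limits = MemtableLimits()
    # A small, idle memtable is still flushed once it reaches the age limit.
    print(should_flush(limits, age_mins=60, operations=10, size_mb=1))
```

With the defaults, the count threshold works out to 300,000 columns per memtable.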
-  * partitioner
+  * '''partitioner'''
  Partitioner: any {{{IPartitioner}}} may be used, including your own as long as it is on
the classpath.  Out of the box, Cassandra provides {{{org.apache.cassandra.dht.RandomPartitioner}}},
{{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}.
(CollatingOPP collates according to EN,US rules, not naive byte ordering.  Use this as an example
if you need locale-aware collation.) Range queries require using an order-preserving partitioner.
  Achtung!  Changing this parameter requires wiping your data directories, since the partitioner
can modify the !sstable on-disk format.
@@ -99, +101 @@

  Default is: 'org.apache.cassandra.dht.RandomPartitioner'. Manually assigning tokens is highly
recommended to guarantee even load distribution.
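One common way to assign tokens by hand is to space them evenly around the ring. The sketch below assumes that RandomPartitioner tokens fall in the range 0 to 2**127, which is not stated on this page, and uses a hypothetical helper name. Each resulting value would be set as the initial token of one node.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space: [0, 2**127)


def balanced_tokens(node_count):
    """Return evenly spaced tokens, one per node (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(4)):
        print("node", index, "token", token)
```

Integer arithmetic is used throughout because the tokens are far too large to represent exactly as floating-point numbers.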
-  * seeds
+  * '''seeds'''
  Never use a node's own address as a seed if you are bootstrapping it by setting auto_bootstrap
to true!
-  * thrift_framed_transport_size_in_mb
+  * '''thrift_framed_transport_size_in_mb'''
  Setting this to '0' is how to denote using unframed (Buffered) transport.
  Default is: '15' mb.
  == per-Keyspace Settings ==
+  * '''name'''
+ Required field. Dashes are not allowed in the name.
-  * replica_placement_strategy and replication_factor ===
+  * '''replica_placement_strategy''' and '''replication_factor''' ===
  Strategy: Setting this to the class that implements {{{IReplicaPlacementStrategy}}} will
change the way the node picker works. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}}
and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a different
datacenter, and the others on different racks in the same one.)
  Note that the replication factor (RF) is the ''total'' number of nodes onto which the data
will be placed.  So, a replication factor of 1 means that only 1 node will have the data.
 It does '''not''' mean that one ''other'' node will have the data.
@@ -116, +120 @@

  Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. RF of at least
2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes
/ RF).
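The arithmetic behind the replication factor can be sketched as follows. Both helper names are hypothetical; the first counts the copies beyond the first one, and the second applies the (N total nodes / RF) estimate given above.

```python
def additional_copies(replication_factor):
    """RF is the total number of copies, so RF 1 means no extra copy."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor - 1


def effective_nodes(total_nodes, replication_factor):
    """Estimate capacity in node-equivalents as N total nodes / RF."""
    if not 1 <= replication_factor <= total_nodes:
        raise ValueError("replication_factor must be between 1 and total_nodes")
    return total_nodes / replication_factor


if __name__ == "__main__":
    print(additional_copies(1))   # RF 1: the data lives on a single node
    print(effective_nodes(6, 2))  # six nodes at RF 2
```

For example, six nodes at RF 2 store about as much distinct data as three nodes at RF 1.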
  == per-ColumnFamily Settings ==
-   * comment and name
+   * '''comment''' and '''name'''
  You can describe a ColumnFamily in plain text by setting these properties.
-   * compare_with
+   * '''compare_with'''
  The {{{CompareWith}}} attribute tells Cassandra how to sort the columns for slicing operations.
 The default is {{{BytesType}}}, which is a straightforward lexical comparison of the bytes
in each column. Other options are {{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}},
{{{TimeUUIDType}}}, and {{{LongType}}}.  You can also specify the fully-qualified class name
to a class of your choice extending {{{org.apache.cassandra.db.marshal.AbstractType}}}.
   a. {{{SuperColumns}}} have a similar {{{CompareSubcolumnsWith}}} attribute.
@@ -130, +134 @@

   a. {{{LexicalUUIDType}}}: A 128bit UUID, compared lexically (by byte value)
   a. {{{TimeUUIDType}}}: a 128bit version 1 UUID, compared by timestamp
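To see why the comparator matters, the sketch below sorts the same column names under different rules, using plain Python stand-ins rather than Cassandra classes. It assumes that LongType names are 8-byte big-endian longs, which this page does not state, and it follows the descriptions above for the two UUID types. All function names are hypothetical.

```python
import struct
import uuid


def bytes_type_key(name):
    """BytesType stand-in: plain lexical comparison of the raw bytes."""
    return name


def long_type_key(name):
    """LongType stand-in: decode an 8-byte big-endian signed long (assumed)."""
    return struct.unpack(">q", name)[0]


def lexical_uuid_key(value):
    """LexicalUUIDType stand-in: compare the 16 raw bytes."""
    return value.bytes


def time_uuid_key(value):
    """TimeUUIDType stand-in: compare the version 1 timestamp."""
    return value.time


# Two hand-built version 1 UUIDs. The raw bytes begin with the low 32 bits of
# the timestamp, so byte order and timestamp order can disagree.
EARLIER = uuid.UUID(fields=(0x00000002, 0x0000, 0x1000, 0x80, 0x00, 0x000000000001))
LATER = uuid.UUID(fields=(0x00000001, 0x0001, 0x1000, 0x80, 0x00, 0x000000000001))

if __name__ == "__main__":
    print(sorted([b"10", b"2", b"1"], key=bytes_type_key))
    packed = [struct.pack(">q", n) for n in (10, 2, 1)]
    print([long_type_key(p) for p in sorted(packed, key=long_type_key)])
    print(sorted([EARLIER, LATER], key=lexical_uuid_key) == [LATER, EARLIER])
    print(sorted([EARLIER, LATER], key=time_uuid_key) == [EARLIER, LATER])
```

Numbers stored as text sort as 1, 10, 2 under byte comparison but as 1, 2, 10 when compared as longs, and the two UUIDs swap places depending on which UUID comparator is used.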
-   * gc_grace_seconds
+   * '''gc_grace_seconds'''
  Time to wait before garbage-collecting deletion markers.  Set this to a large enough value
that you are confident that the deletion marker will be propagated to all replicas by the
time this many seconds has elapsed, even in the face of hardware failures.  The default value
is ten days.
  Default is: '864000' seconds, or 10 days.
-   * keys_cached and rows_cached
+   * '''keys_cached''' and '''rows_cached'''
+ Defaults are: '200000' keys cached, and '0', disabled row cache.
-   * preload_row_cache
+   * '''preload_row_cache'''
-   * read_repair_chance
+   * '''read_repair_chance'''
-   * default_validation_class
+   * '''default_validation_class'''
+ Used in conjunction with the validation_class property in the per-column settings to guarantee
+ Default is: 'BytesType', a no-op.
  == per-Column Settings ==
-   * validation_class
+   * '''validation_class'''
-   * index_type
+   * '''index_type'''
