cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "StorageConfiguration" by tuxracer69
Date Fri, 13 Nov 2009 15:41:22 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "StorageConfiguration" page has been changed by tuxracer69.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=1&rev2=2

--------------------------------------------------

  }}}
  
  == Keyspaces and ColumnFamilies ==
- Keyspaces and !ColumnFamilies: A !ColumnFamily is the Cassandra concept closest to a relational
table.  !Keyspaces are separate groups of !ColumnFamilies.  Except in very unusual circumstances
you will have one Keyspace per application.
+ Keyspaces and {{{ColumnFamilies}}}: A {{{ColumnFamily}}} is the Cassandra concept closest
to a relational table.  {{{Keyspaces}}} are separate groups of {{{ColumnFamilies}}}.  Except
in very unusual circumstances you will have one Keyspace per application.
  
  There is an implicit keyspace named 'system' for Cassandra internals.
  
@@ -21, +21 @@

   <Keyspace Name="Keyspace1">
  }}}
  
- The !CompareWith attribute tells Cassandra how to sort the columns for slicing operations.
 The default is !BytesType, which is a straightforward lexical comparison of the bytes in
each column. Other options are !AsciiType, !UTF8Type, !LexicalUUIDType, !TimeUUIDType, and
!LongType.  You can also specify the fully-qualified class name to a class of your choice
extending org.apache.cassandra.db.marshal.AbstractType.
+ The {{{CompareWith}}} attribute tells Cassandra how to sort the columns for slicing operations.
 The default is {{{BytesType}}}, which is a straightforward lexical comparison of the bytes
in each column. Other options are {{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}},
{{{TimeUUIDType}}}, and {{{LongType}}}.  You can also specify the fully-qualified class name
to a class of your choice extending {{{org.apache.cassandra.db.marshal.AbstractType}}}.
  
- !SuperColumns have a similar !CompareSubcolumnsWith attribute.
+  * {{{SuperColumns}}} have a similar {{{CompareSubcolumnsWith}}} attribute.
+  * {{{BytesType}}}: Simple sort by byte value.  No validation is performed. 
+  * {{{AsciiType}}}: Like {{{BytesType}}}, but validates that the input can be parsed as
US-ASCII.
+  * {{{UTF8Type}}}: A string encoded as UTF-8
+  * {{{LongType}}}: A 64-bit long
+  * {{{LexicalUUIDType}}}: A 128-bit UUID, compared lexically (by byte value)
+  * {{{TimeUUIDType}}}: A 128-bit version 1 UUID, compared by timestamp
  
- BytesType: Simple sort by byte value.  No validation is performed. !AsciiType: Like !BytesType,
but validates that the input can be parsed as US-ASCII.
- 
- UTF8Type: A string encoded as UTF8 !LongType: A 64bit long !LexicalUUIDType: A 128bit UUID,
compared lexically (by byte value) T!imeUUIDType: a 128bit version 1 !UUID, compared by !timestamp
- 
- (To get the closest approximation to 0.3-style !supercolumns, you would use !CompareWith=UTF8Type
!CompareSubcolumnsWith=!LongType.)
+ (To get the closest approximation to 0.3-style {{{supercolumns}}}, you would use {{{CompareWith=UTF8Type
CompareSubcolumnsWith=LongType}}}.)
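
The difference between these comparators is easiest to see with concrete values. The Python sketch below is not Cassandra code; it only mimics the orderings as described above, using made-up column names. The helper `time_uuid` and its fixed clock-sequence and node values are assumptions for illustration.

```python
import struct
import uuid

# Hypothetical column names: the 64-bit longs 256, 1 and -1, packed big-endian.
longs = [256, 1, -1]
packed = [struct.pack(">q", n) for n in longs]

# BytesType-style ordering: compare the raw bytes, no interpretation.
by_bytes = [struct.unpack(">q", b)[0] for b in sorted(packed)]

# LongType-style ordering: decode to a number first, then compare.
by_value = sorted(longs)

print(by_bytes)  # [1, 256, -1]
print(by_value)  # [-1, 1, 256]


def time_uuid(timestamp):
    """Build a version 1 UUID carrying the given 60-bit timestamp (illustrative helper)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    # Clock sequence and node are arbitrary fixed values for this example.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0x010203040506))


earlier = time_uuid(0x00000000FFFFFFFF)
later = time_uuid(0x0000000100000000)

# LexicalUUIDType-style ordering: by byte value. The low timestamp bits come
# first in the UUID layout, so this does not follow creation time.
lexical = sorted([earlier, later], key=lambda u: u.bytes)

# TimeUUIDType-style ordering: by the embedded timestamp.
by_time = sorted([earlier, later], key=lambda u: u.time)

print(lexical == [later, earlier])  # True
print(by_time == [earlier, later])  # True
```

The takeaway is that the comparator should match how the column names were encoded: byte order and logical order agree only for some encodings.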
  
- If !FlushPeriodInMinutes is configured and positive, it will be flushed to disk with that
period whether it is dirty or not.  This is intended for lightly-used !columnfamilies so that
they do not prevent !commitlog segments from being purged.
+ If {{{FlushPeriodInMinutes}}} is configured and positive, the column family will be flushed
to disk at that interval whether it is dirty or not.  This is intended for lightly-used {{{columnfamilies}}}
so that they do not prevent commitlog segments from being purged.
  
  {{{
  <ColumnFamily CompareWith="BytesType"
-  Name="Standard1"
+        Name="Standard1"
-   FlushPeriodInMinutes="60"/>
+        FlushPeriodInMinutes="60"/>
-  <ColumnFamily CompareWith="UTF8Type" Name="Standard2"/> <ColumnFamily CompareWith="TimeUUIDType"
Name="StandardByUUID1"/> <ColumnFamily ColumnType="Super"
- CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" Name="Super1"/>
- </Keyspace>
- </Keyspaces>
+ <ColumnFamily CompareWith="UTF8Type" 
+        Name="Standard2"/> 
+ <ColumnFamily CompareWith="TimeUUIDType" 
+        Name="StandardByUUID1"/> 
+ <ColumnFamily ColumnType="Super"
+        CompareWith="UTF8Type" 
+        CompareSubcolumnsWith="UTF8Type" 
+        Name="Super1"/>
  }}}
  
  == Partitioner ==
- Partitioner: any !IPartitioner may be used, including your own as long as it is on the !classpath.
 Out of the box, Cassandra provides org.apache.cassandra.dht.RandomPartitioner, org.apache.cassandra.dht.OrderPreservingPartitioner,
and org.apache.cassandra.dht.CollatingOrderPreservingPartitioner. (CollatingOPP collates according
to EN,US rules, not naive byte ordering.  Use this as an example if you need locale-aware
collation.) Range queries require using an order-preserving partitioner.
+ Partitioner: any {{{IPartitioner}}} may be used, including your own as long as it is on
the classpath.  Out of the box, Cassandra provides {{{org.apache.cassandra.dht.RandomPartitioner}}},
{{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}.
(CollatingOPP collates according to EN,US rules, not naive byte ordering.  Use this as an example
if you need locale-aware collation.) Range queries require using an order-preserving partitioner.
  
  Achtung!  Changing this parameter requires wiping your data directories, since the partitioner
can modify the !sstable on-disk format.
  
@@ -56, +62 @@

  
  If you are using an order-preserving partitioner and you know your key distribution, you
can specify the token for this node to use. (Keys are sent to the node with the "closest"
token, so distributing your tokens equally along the key distribution space will spread keys
evenly across your cluster.)  This setting is only checked the first time a node is started.
  
- This can also be useful with RandomPartitioner to force equal spacing of tokens around the
hash space, especially for clusters with a small number of nodes.
+ This can also be useful with {{{RandomPartitioner}}} to force equal spacing of tokens around
the hash space, especially for clusters with a small number of nodes.
  
  {{{
  <InitialToken></InitialToken>
  }}}
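
As a rough illustration of equal spacing, the sketch below computes one candidate {{{InitialToken}}} value per node. It assumes {{{RandomPartitioner}}} tokens are integers in the range 0 to 2^127, which this page does not state, so check it against your Cassandra version. `evenly_spaced_tokens` is a hypothetical helper, not part of Cassandra.

```python
def evenly_spaced_tokens(node_count, token_space=2**127):
    """Return node_count tokens spread evenly over [0, token_space).

    token_space=2**127 is an assumption about RandomPartitioner's hash space.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic only: tokens this large lose precision as floats.
    return [i * token_space // node_count for i in range(node_count)]


# One token per node for a hypothetical four-node cluster.
for node, token in enumerate(evenly_spaced_tokens(4)):
    print(f"node {node}: <InitialToken>{token}</InitialToken>")
```

Each node would get a different value from the list. Since the setting is only checked the first time a node is started, the values need to be chosen before first startup.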
  
  == EndPointSnitch ==
- !EndPointSnitch: Setting this to the class that implements !IEndPointSnitch which will see
if two endpoints are in the same data center or on the same rack. Out of the box, Cassandra
provides org.apache.cassandra.locator.EndPointSnitch
+ !EndPointSnitch: Set this to a class that implements {{{IEndPointSnitch}}}, which determines
whether two endpoints are in the same data center or on the same rack. Out of the box, Cassandra
provides {{{org.apache.cassandra.locator.EndPointSnitch}}}.
  
  {{{
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
  }}}
  
  == ReplicaPlacementStrategy ==
- Strategy: Setting this to the class that implements IReplicaPlacementStrategy will change
the way the node picker works. Out of the box, Cassandra provides org.apache.cassandra.locator.RackUnawareStrategy
and org.apache.cassandra.locator.RackAwareStrategy (place one replica in a different datacenter,
and the others on different racks in the same one.)
+ Strategy: Setting this to the class that implements {{{IReplicaPlacementStrategy}}} will
change the way the node picker works. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}}
and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a different
datacenter, and the others on different racks in the same one.)
  
  {{{
  <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
@@ -83, +89 @@

  }}}
  
  == Directories ==
- Directories: Specify where Cassandra should store different data on disk.  Keep the data
disks and the CommitLog disks separate for best performance
+ Directories: Specify where Cassandra should store different data on disk.  Keep the data
disks and the {{{CommitLog}}} disks separate for best performance. See also [[FAQ#what_kind_of_hardware_should_i_use|what
kind of hardware should I use?]]
  
  {{{
- <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory> <DataFileDirectories>
+ <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory> 
+ <DataFileDirectories>
- <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
+       <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories> 
  <CalloutLocation>/var/lib/cassandra/callouts</CalloutLocation> <BootstrapFileDirectory>/var/lib/cassandra/bootstrap</BootstrapFileDirectory>
<StagingFileDirectory>/var/lib/cassandra/staging</StagingFileDirectory>
  }}}
@@ -118, +125 @@

  
  Address to bind to and tell other nodes to connect to.  You _must_ change this if you want
multiple nodes to be able to communicate!
  
- Leaving it blank leaves it up to InetAddress.getLocalHost(). This will always do the Right
Thing *if* the node is properly configured (hostname, name resolution, etc), and the Right
Thing is to use the address associated with the hostname (it might not be).
+ Leaving it blank leaves it up to {{{InetAddress.getLocalHost()}}}. This will always do the
Right Thing *if* the node is properly configured (hostname, name resolution, etc), and the
Right Thing is to use the address associated with the hostname (it might not be).
  
  {{{
  <ListenAddress>localhost</ListenAddress> 
@@ -128, +135 @@

  <ControlPort>7001</ControlPort>
  }}}
  
- The address to bind the Thrift RPC service to. Unlike ListenAddress above, you *can* specify
0.0.0.0 here if you want Thrift to listen on all interfaces.
+ The address to bind the Thrift RPC service to. Unlike {{{ListenAddress}}} above, you *can*
specify {{{0.0.0.0}}} here if you want Thrift to listen on all interfaces.
  
- Leaving this blank has the same effect it does for ListenAddress, (i.e. it will be based
on the configured hostname of the node).
+ Leaving this blank has the same effect as it does for {{{ListenAddress}}} (i.e. it will be
based on the configured hostname of the node).
  
  {{{
  <ThriftAddress>localhost</ThriftAddress> 
@@ -164, +171 @@

  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  }}}
  
- The maximum amount of data to store in memory per !ColumnFamily before flushing to disk.
 Note: There is one memtable per column family, and  this threshold is based solely on the
amount of data stored, not actual heap memory usage (there is some overhead in indexing the
columns).
+ The maximum amount of data to store in memory per !ColumnFamily before flushing to disk.
 Note: There is one memtable per column family, and  this threshold is based solely on the
amount of data stored, not actual heap memory usage (there is some overhead in indexing the
columns). See also [[MemtableThresholds|MemtableThresholds]].
  
  {{{
  <MemtableSizeInMB>64</MemtableSizeInMB>
- }}
+ }}}
  
- The maximum number of columns in millions to store in memory per ColumnFamily before flushing
to disk.  This is also a per-memtable setting.  Use with MemtableSizeInMB to tune memory usage.
+ The maximum number of columns in millions to store in memory per ColumnFamily before flushing
to disk.  This is also a per-memtable setting.  Use with {{{MemtableSizeInMB}}} to tune memory
usage.
  
  {{{
  <MemtableObjectCountInMillions>0.1</MemtableObjectCountInMillions>
  }}}
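
To make the interaction of the two thresholds concrete, here is a minimal sketch that assumes a memtable is flushed as soon as either limit is reached. That "either" rule and the `>=` comparison are assumptions, not something this page states, and `should_flush` is a hypothetical function rather than Cassandra's API.

```python
def should_flush(bytes_stored, columns_stored,
                 memtable_size_in_mb=64,
                 memtable_object_count_in_millions=0.1):
    """Return True when a memtable has reached either configured threshold.

    Defaults mirror the sample values of MemtableSizeInMB and
    MemtableObjectCountInMillions shown above.
    """
    size_limit = memtable_size_in_mb * 1024 * 1024
    count_limit = round(memtable_object_count_in_millions * 1_000_000)
    return bytes_stored >= size_limit or columns_stored >= count_limit


# Many tiny columns: the object count trips first, long before 64 MB.
print(should_flush(bytes_stored=2_000_000, columns_stored=100_000))      # True

# A few large columns: the size limit trips first.
print(should_flush(bytes_stored=64 * 1024 * 1024, columns_stored=500))   # True

# Neither limit reached yet.
print(should_flush(bytes_stored=1_000_000, columns_stored=10_000))       # False
```

Under this assumption, workloads with many small columns are governed mostly by the object count, while workloads with large values are governed mostly by the size limit.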
+ 
  Unlike most systems, in Cassandra writes are faster than reads, so you can afford more of
those in parallel.  A good rule of thumb is 2 concurrent reads per processor core.  Increase
ConcurrentWrites to the number of clients writing at once if you enable CommitLogSync + CommitLogSyncDelay.
  
  {{{
@@ -183, +191 @@

  
  !CommitLogSync may be either "periodic" or "batch."  When in batch mode, Cassandra won't
ack writes until the commit log has been !fsynced to disk.  It will wait up to !CommitLogSyncBatchWindowInMS
milliseconds for other writes, before performing the sync.
  
- This is less necessary in Cassandra than in traditional databases since replication reduces
the odds of losing data from a failure after writing the log entry but before it actually
reaches the disk. So the other option is "periodic," where writes may be acked immediately and
the CommitLog is simply synced every CommitLogSyncPeriodInMS milliseconds.
+ This is less necessary in Cassandra than in traditional databases since replication reduces
the odds of losing data from a failure after writing the log entry but before it actually
reaches the disk. So the other option is "periodic," where writes may be acked immediately and
the CommitLog is simply synced every {{{CommitLogSyncPeriodInMS}}} milliseconds.
  
  {{{
  <CommitLogSync>periodic</CommitLogSync>
@@ -203, +211 @@

  {{{
  <GCGraceSeconds>864000</GCGraceSeconds>
  }}}
- Number of threads to run when flushing memtables to disk.  Set this to the number of disks
you physically have in your machine allocated for DataDirectory * 2.  If you are planning
to use the Binary Memtable, it is recommended to increase the max threads to maintain a higher
quality of service while under load when normal memtables are flushing to disk.
+ Number of threads to run when flushing memtables to disk.  Set this to twice the number of
physical disks in your machine that are allocated to {{{DataDirectory}}}.  If you are planning
to use the Binary Memtable, it is recommended to increase the max threads to maintain a higher
quality of service while under load when normal memtables are flushing to disk.
  
  {{{
  <FlushMinThreads>1</FlushMinThreads> <FlushMaxThreads>1</FlushMaxThreads>
