cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "StorageConfiguration" by JonHermes
Date Tue, 24 Aug 2010 23:09:11 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "StorageConfiguration" page has been changed by JonHermes.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=32&rev2=33

--------------------------------------------------

  Default is: 'localhost'. This must be changed for other nodes to contact this node.
  
   * memtable_flush_after_mins, memtable_operations_in_millions, and memtable_throughput_in_mb
+ The maximum time to leave a dirty memtable unflushed. (While any affected column families
have unflushed data from a commit log segment, that segment cannot be deleted.) Set this large
enough that it does not trigger a flush storm, with all your memtables flushing at once because
none has yet hit its size or count threshold. For production, a larger value such as 1440
(one day) is recommended.
+ 
+ The maximum number of columns (in millions) to store in memory per column family before flushing
to disk. This is also a per-memtable setting. Use it together with {{{memtable_throughput_in_mb}}}
to tune memory usage.
+ 
+ The maximum amount of data to store in memory per !ColumnFamily before flushing to disk.
 Note: there is one memtable per column family, and this threshold is based solely on the
amount of data stored, not actual heap memory usage (there is some overhead in indexing the
columns). See also MemtableThresholds.
  
  Defaults are: '60' minutes, '0.3' millions, and '64' mb respectively.
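+ As a sketch only (assuming these appear as per-!ColumnFamily settings in a 0.7-style cassandra.yaml;
check your version's file for the exact placement), the defaults above would read:
+ 
+ {{{
+ memtable_flush_after_mins: 60
+ memtable_operations_in_millions: 0.3
+ memtable_throughput_in_mb: 64
+ }}}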
  
@@ -108, +113 @@

  
  Note that the replication factor (RF) is the ''total'' number of nodes onto which the data
will be placed.  So, a replication factor of 1 means that only 1 node will have the data.
 It does '''not''' mean that one ''other'' node will have the data.
  
- Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. RF of at least
2 is highly recommended, keeping in mind that your effective number of nodes is N / RF.
+ Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. RF of at least
2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes
/ RF).
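+ For example, with 6 nodes and a replication factor of 3, your effective number of nodes is
6 / 3 = 2. As a sketch only (assuming the 0.7-style keyspace definition in cassandra.yaml; the
keyspace name {{{Keyspace1}}} is hypothetical and the exact layout may differ in your version):
+ 
+ {{{
+ keyspaces:
+     - name: Keyspace1
+       replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
+       replication_factor: 3
+ }}}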
  
  == per-ColumnFamily Settings ==
    * comment and name
@@ -126, +131 @@

   a. {{{TimeUUIDType}}}: a 128bit version 1 UUID, compared by timestamp
  
    * gc_grace_seconds
+ Time to wait before garbage-collecting deletion markers (tombstones). Set this to a value large
enough that you are confident the deletion marker will have propagated to all replicas by the
time this many seconds have elapsed, even in the face of hardware failures.
+ 
+ Default is: '864000' seconds, or 10 days.
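+ As an illustration only (assuming a 0.7-style per-!ColumnFamily entry in cassandra.yaml; the
surrounding layout is omitted), the default would be written as:
+ 
+ {{{
+ gc_grace_seconds: 864000    # 10 days
+ }}}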
  
    * keys_cached and rows_cached
  
@@ -141, +149 @@

    * index_type
  
  
- The ControlPort setting is deprecated in 0.6 and can be safely removed from configuration.
  
- {{{
- <ListenAddress>localhost</ListenAddress>
- <!-- TCP port, for commands and data -->
- <StoragePort>7000</StoragePort>
- <!-- UDP port, for membership communications (gossip) -->
- <ControlPort>7001</ControlPort>
- }}}
- The address to bind the Thrift RPC service to. Unlike {{{ListenAddress}}} above, you *can*
specify {{{0.0.0.0}}} here if you want Thrift to listen on all interfaces.
  
- Leaving this blank has the same effect it does for {{{ListenAddress}}}, (i.e. it will be
based on the configured hostname of the node).
- 
- {{{
- <ThriftAddress>localhost</ThriftAddress>
- <!-- Thrift RPC port (the port clients connect to). -->
- <ThriftPort>9160</ThriftPort>
- }}}
- Whether or not to use a framed transport for Thrift. If this option is set to true then
you must also use a framed transport on the  client-side, (framed and non-framed transports
are not compatible).
- 
- {{{
- <ThriftFramedTransport>false</ThriftFramedTransport>
- }}}
- == Memory, Disk, and Performance ==
- Access mode.  
- {{{
- <DiskAccessMode>auto</DiskAccessMode>
- }}}
- Buffer size to use when performing contiguous column slices. Increase this to the size of
the column slices you typically perform.  (Name-based queries are performed with a buffer
size of  !ColumnIndexSizeInKB.)
- 
- {{{
- <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
- }}}
- Buffer size to use when flushing !memtables to disk. (Only one  !memtable is ever flushed
at a time.) Increase (decrease) the index buffer size relative to the data buffer if you have
few (many)  columns per key.  Bigger is only better _if_ your !memtables get large enough
to use the space. (Check in your data directory after your app has been running long enough.)
- 
- {{{
- <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
- <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
- }}}
  Add column indexes to a row after its contents reach this size. Increase this if your column
values are large, or if you have a very large number of columns. The competing concerns:
Cassandra has to deserialize this much of the row to read a single column, so you want it
small, at least if you do many partial-row reads; but all of the index data is read on
each access, so you don't want to generate it wastefully either.
  
  {{{
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  }}}
- The maximum amount of data to store in memory per !ColumnFamily before flushing to disk.
 Note: There is one memtable per column family, and  this threshold is based solely on the
amount of data stored, not actual heap memory usage (there is some overhead in indexing the
columns). See also MemtableThresholds.
- 
- {{{
- <MemtableSizeInMB>64</MemtableSizeInMB>
- }}}
- The maximum number of columns in millions to store in memory per ColumnFamily before flushing
to disk.  This is also a per-memtable setting.  Use with {{{MemtableSizeInMB}}} to tune memory
usage.
- 
- {{{
- <MemtableObjectCountInMillions>0.1</MemtableObjectCountInMillions>
- }}}
- ''[New in 0.5''
- 
- The maximum time to leave a dirty memtable unflushed. (While any affected columnfamilies
have unflushed data from a commit log segment, that segment cannot be deleted.) This needs
to be large enough that it won't cause a flush storm of all your memtables flushing at once
because none has hit the size or count thresholds yet.  For production, a larger value such
as 1440 is recommended.
- 
- {{{
-   <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
- }}}
  '']''
  
- Time to wait before garbage-collection deletion markers.  Set this to a large enough value
that you are confident that the deletion marker will be propagated to all replicas by the
time this many seconds has elapsed, even in the face of hardware failures.  The default value
is ten days.
- 
- {{{
- <GCGraceSeconds>864000</GCGraceSeconds>
- }}}
- Number of threads to run when flushing memtables to disk.  Set this to the number of disks
you physically have in your machine allocated for {{{DataDirectory * 2}}}.  If you are planning
to use the Binary Memtable, its recommended to increase the max threads to maintain a higher
quality of service while under load when normal memtables are flushing to disk.
- 
- {{{
- <FlushMinThreads>1</FlushMinThreads>
- <FlushMaxThreads>1</FlushMaxThreads>
- }}}
- The threshold size in megabytes the binary memtable must grow to, before it's submitted
for flushing to disk.
- 
- {{{
- <BinaryMemtableSizeInMB>256</BinaryMemtableSizeInMB>
- }}}
- 
