Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
From: Apache Wiki <wikidiffs@apache.org>
To: Apache Wiki <wikidiffs@apache.org>
Date: Tue, 24 Aug 2010 23:16:54 -0000
Message-ID: <20100824231654.39625.54226@eosnew.apache.org>
Subject: 
 =?utf-8?q?=5BCassandra_Wiki=5D_Update_of_=22StorageConfiguration=22_by_Jo?=
 =?utf-8?q?nHermes?=

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for=
 change notification.

The "StorageConfiguration" page has been changed by JonHermes.
http://wiki.apache.org/cassandra/StorageConfiguration?action=3Ddiff&rev1=3D=
33&rev2=3D34

--------------------------------------------------

  Not going to cover every value, just the interesting ones. When in doubt,=
 check out the comments on the default cassandra.yaml as they're well docum=
ented there.
  =

  =3D=3D per-Cluster (Global) Settings =3D=3D
-  * authenticator
+  * '''authenticator'''
  Allows for pluggable authentication of users, which defines whether it is=
 necessary to call the Thrift 'login' method, and which parameters are requ=
ired to login. The default '!AllowAllAuthenticator' does not require users =
to call 'login': any user can perform any operation. The other built in opt=
ion is '!SimpleAuthenticator', which requires users and passwords to be def=
ined in property files, and for users to call login with a valid combo.
  =

  Default is: 'org.apache.cassandra.auth.AllowAllAuthenticator', a no-op.
  =

-  * auto_bootstrap
+  * '''auto_bootstrap'''
  Set to 'true' to make new [non-seed] nodes automatically migrate the righ=
t data to themselves.  (If no InitialToken is specified, they will pick one=
  such that they will get half the range of the most-loaded node.) If a nod=
e starts up without bootstrapping, it will mark itself bootstrapped so that=
 you can't subsequently accidentally bootstrap a node with data on it.  =
(You can reset this by wiping your data and commitlog directories.)
  =

  Default is: 'false', so that new clusters don't bootstrap immediately.  Y=
ou should turn this on when you start adding new nodes to a cluster that al=
ready has data on it.
  =

-  * cluster_name
+  * '''cluster_name'''
  The name of this cluster.  This is mainly used to prevent machines in one=
 logical cluster from joining another.
  =

-  * commitlog_directory and data_file_directories
+  * '''commitlog_directory and data_file_directories'''
- /var/lib/cassandra/commitlog
+ Be sure to separate your commitlog and data disks, as commitlog performan=
ce is reliant on its append-only nature, and seeking to random data at the =
same time will damage write speed.
  =

+ Defaults are: '/var/lib/cassandra/commitlog' and '/var/lib/cassandra/data=
'.
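A minimal sketch of the advice above, assuming two separate physical disks
mounted at the placeholder paths /mnt/commitlog and /mnt/data (neither path
comes from this page) and assuming data_file_directories takes a YAML list:

```yaml
# Hypothetical mount points, one per physical disk.
commitlog_directory: /mnt/commitlog/cassandra
data_file_directories:
    - /mnt/data/cassandra
```

Keeping the commit log on its own disk lets its appends stay sequential
while reads seek on the data disk.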
+ =

-  * concurrent_reads and concurrent_writes, commitlog_sync and commitlog_s=
ync_period_in_ms
+  * '''concurrent_reads''' and '''concurrent_writes''', '''commitlog_sync'=
'' and '''commitlog_sync_period_in_ms'''
  Unlike most systems, in Cassandra writes are faster than reads, so you ca=
n afford more of those in parallel.  A good rule of thumb is 4 concurrent_r=
eads per processor core.  Increase {{{concurrent_writes}}} to the number of=
 clients writing at once if you use commitlog_sync.
  =

  {{{CommitLogSync}}} may be either "periodic" or "batch."  When in batch m=
ode, Cassandra won't ack writes until the commit log has been fsynced to di=
sk.  It will wait up to {{{CommitLogSyncBatchWindowInMS}}} milliseconds for=
 other writes, before performing the sync.
@@ -51, +53 @@

  =

  Defaults are: '8' concurrent reads, '32' concurrent writes, 'periodic' =
sync, '10000' ms between syncs.
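For reference, the defaults listed above would look roughly like this in
cassandra.yaml. This is a sketch that uses only the key names from this
section; check the comments in the shipped file before copying it.

```yaml
# Rule of thumb from this section: 4 concurrent_reads per processor core,
# so 4 per core on a 2-core machine gives 8.
concurrent_reads: 8
concurrent_writes: 32
# Either periodic or batch.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
```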
  =

-  * disk_access_mode
+  * '''disk_access_mode'''
  The options are: 'auto', 'mmap', 'mmap_index_only', and 'standard'.
  mmapped i/o is substantially faster, but only practical on a 64bit machin=
e (which notably does not include EC2 "small" instances) or relatively smal=
l datasets.  "auto", the safe choice, will enable mmapping on a 64bit JVM. =
 Other values are "mmap", "mmap_index_only" (which may allow you to get par=
t of the benefits of mmap on a 32bit machine by mmapping only index files) =
and "standard". (The buffer size settings that follow only apply to standar=
d, non-mmapped i/o.)
  =

  Default is: 'auto'.
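As an illustration only, a 32bit machine that wants part of the mmap
benefit might override the default as sketched here, assuming the key sits
at the top level of cassandra.yaml:

```yaml
# One of: auto, mmap, mmap_index_only, standard.
disk_access_mode: mmap_index_only
```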
  =

-  * dynamic_snitch and endpoint_snitch
+  * '''dynamic_snitch''' and '''endpoint_snitch'''
  !EndPointSnitch: Setting this to the class that implements {{{IEndPointSn=
itch}}} which will see if two endpoints are in the same data center or on t=
he same rack. Out of the box, Cassandra provides {{{org.apache.cassandra.lo=
cator.RackInferringSnitch}}}
  =

  Note: this class will work on hosts' IPs only. There is no configuration =
parameter to tell Cassandra that a node is in rack ''R'' and in datacenter =
''D''. The current rules are based on the two methods:
@@ -70, +72 @@

  =

  Defaults are: 'org.apache.cassandra.locator.SimpleSnitch' and 'false'.
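To switch from the default to the rack-inferring snitch named above, the
fragment might read as follows. Treat it as a sketch: it assumes both keys
sit at the top level of cassandra.yaml.

```yaml
# Replaces the default org.apache.cassandra.locator.SimpleSnitch.
endpoint_snitch: org.apache.cassandra.locator.RackInferringSnitch
# Left at the default stated above.
dynamic_snitch: false
```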
  =

-  * listen_address
+  * '''listen_address'''
  Commenting out this property leaves it up to {{{InetAddress.getLocalHost(=
)}}}. This will always do the Right Thing *if* the node is properly configu=
red (hostname, name resolution, etc), and the Right Thing is to use the add=
ress associated with the hostname (it might not be).  =

  =

  Default is: 'localhost'. This must be changed for other nodes to contact =
this node.
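A sketch of the change described above. The address 10.0.0.5 is a
placeholder, not a value from this page; use an address that the other
nodes can actually reach.

```yaml
# Hypothetical address of this node on the cluster network.
listen_address: 10.0.0.5
```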
  =

-  * memtable_flush_after_mins, memtable_operations_in_millions, and memtab=
le_throughput_in_mb
+  * '''memtable_flush_after_mins''', '''memtable_operations_in_millions'''=
, and '''memtable_throughput_in_mb'''
  The maximum time to leave a dirty memtable unflushed. (While any affected=
 columnfamilies have unflushed data from a commit log segment, that segment=
 cannot be deleted.) This needs to be large enough that it won't cause a fl=
ush storm of all your memtables flushing at once because none has hit the s=
ize or count thresholds yet.  For production, a larger value such as 1440 i=
s recommended.
  =

  The maximum number of columns in millions to store in memory per ColumnFa=
mily before flushing to disk.  This is also a per-memtable setting.  Use wi=
th {{{MemtableSizeInMB}}} to tune memory usage.
@@ -84, +86 @@

  =

  Defaults are: '60' minutes, '0.3' millions, and '64' mb respectively.
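The production advice above could be sketched as follows. The keys are
shown at the top level only because this page lists them under the global
settings; where they belong in the file may differ between versions.

```yaml
# Longer flush interval, in minutes, as recommended above for production.
memtable_flush_after_mins: 1440
# Defaults from this section.
memtable_operations_in_millions: 0.3
memtable_throughput_in_mb: 64
```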
  =

-  * partitioner
+  * '''partitioner'''
  Partitioner: any {{{IPartitioner}}} may be used, including your own as lo=
ng as it is on the classpath.  Out of the box, Cassandra provides {{{org.ap=
ache.cassandra.dht.RandomPartitioner}}}, {{{org.apache.cassandra.dht.OrderP=
reservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPres=
ervingPartitioner}}}. (CollatingOPP collates according to EN,US rules, not =
naive byte ordering.  Use this as an example if you need locale-aware colla=
tion.) Range queries require using an order-preserving partitioner.
  =

  Achtung!  Changing this parameter requires wiping your data directories, =
since the partitioner can modify the !sstable on-disk format.
@@ -99, +101 @@

  =

  Default is: 'org.apache.cassandra.dht.RandomPartitioner'. Manually assign=
ing tokens is highly recommended to guarantee even load distribution.
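A cluster that needs range queries would pick an order-preserving
partitioner instead of the default. A sketch, assuming the key sits at the
top level of cassandra.yaml:

```yaml
# Must be chosen before any data is written; changing it later requires
# wiping the data directories, as noted above.
partitioner: org.apache.cassandra.dht.OrderPreservingPartitioner
```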
  =

-  * seeds
+  * '''seeds'''
  Never use a node's own address as a seed if you are bootstrapping it by s=
etting auto_bootstrap to true!
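The warning above can be illustrated with a sketch for a new node joining a
cluster that already holds data. The address 10.0.0.1 is a placeholder for
an existing node, and the list form of seeds is an assumption.

```yaml
# Hypothetical fragment for a new, non-seed node.
auto_bootstrap: true
seeds:
    # An existing node, never this node's own address.
    - 10.0.0.1
```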
  =

-  * thrift_framed_transport_size_in_mb
+  * '''thrift_framed_transport_size_in_mb'''
  Setting this to '0' is how to denote using unframed (Buffered) transport.
  =

  Default is: '15' mb.
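A one-line sketch of the unframed case described above, assuming the key
sits at the top level of cassandra.yaml:

```yaml
# 0 selects unframed (buffered) transport; the default is 15.
thrift_framed_transport_size_in_mb: 0
```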
  =

  =3D=3D per-Keyspace Settings =3D=3D
+  * '''name'''
+ Required. The keyspace name may not contain dashes.
-  * replica_placement_strategy and replication_factor =3D=3D=3D
+  * '''replica_placement_strategy''' and '''replication_factor''' =3D=3D=
=3D
  Strategy: Setting this to the class that implements {{{IReplicaPlacementS=
trategy}}} will change the way the node picker works. Out of the box, Cassa=
ndra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}} and {{=
{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a =
different datacenter, and the others on different racks in the same one.)
  =

  Note that the replication factor (RF) is the ''total'' number of nodes on=
to which the data will be placed.  So, a replication factor of 1 means that=
 only 1 node will have the data.  It does '''not''' mean that one ''other''=
 node will have the data.
@@ -116, +120 @@

  Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'.=
 RF of at least 2 is highly recommended, keeping in mind that your effectiv=
e number of nodes is (N total nodes / RF).
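A sketch of a keyspace definition using the settings above. The keyspaces
list layout and the name Keyspace1 are assumptions for illustration, not
taken from this page.

```yaml
keyspaces:
    - name: Keyspace1          # hypothetical name; no dashes allowed
      replica_placement_strategy:
          org.apache.cassandra.locator.RackUnawareStrategy
      # Total copies of each row. With 6 nodes and RF 3, the effective
      # number of nodes is 6 / 3, i.e. 2.
      replication_factor: 3
```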
  =

  =3D=3D per-ColumnFamily Settings =3D=3D
-   * comment and name
+   * '''comment''' and '''name'''
  You can describe a ColumnFamily in plain text by setting these properties.
  =

-   * compare_with
+   * '''compare_with'''
  The {{{CompareWith}}} attribute tells Cassandra how to sort the columns f=
or slicing operations.  The default is {{{BytesType}}}, which is a straight=
forward lexical comparison of the bytes in each column. Other options are {=
{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}}, {{{TimeUUIDType}}}, =
and {{{LongType}}}.  You can also specify the fully-qualified class name to=
 a class of your choice extending {{{org.apache.cassandra.db.marshal.Abstra=
ctType}}}.
  =

   a. {{{SuperColumns}}} have a similar {{{CompareSubcolumnsWith}}} attribu=
te.
@@ -130, +134 @@

   a. {{{LexicalUUIDType}}}: A 128bit UUID, compared lexically (by byte val=
ue)
   a. {{{TimeUUIDType}}}: a 128bit version 1 UUID, compared by timestamp
  =

+ =

-   * gc_grace_seconds
+   * '''gc_grace_seconds'''
  Time to wait before garbage-collecting deletion markers.  Set this to a l=
arge enough value that you are confident that the deletion marker will be p=
ropagated to all replicas by the time this many seconds have elapsed, even =
in the face of hardware failures.  The default value is ten days.
  =

  Default is: '864000' seconds, or 10 days.
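A sketch of a column family entry carrying this setting. The
column_families key, its nesting inside a keyspace entry, and the name
Standard1 are assumptions for illustration.

```yaml
column_families:
    - name: Standard1             # hypothetical name
      compare_with: BytesType     # the default comparator
      gc_grace_seconds: 864000    # 10 days * 24 h * 3600 s
```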
  =

-   * keys_cached and rows_cached
+   * '''keys_cached''' and '''rows_cached'''
+ Defaults are: '200000' keys cached, and '0', disabled row cache.
  =

-   * preload_row_cache
+   * '''preload_row_cache'''
  =

-   * read_repair_chance
+   * '''read_repair_chance'''
  =

-   * default_validation_class
+   * '''default_validation_class'''
+ Used in conjunction with the validation_class property in the per-column =
settings to guarantee the =

+ =

+ Default is: 'BytesType', a no-op.
  =

  =3D=3D per-Column Settings =3D=3D
-   * validation_class
+   * '''validation_class'''
  =

-   * index_type
+   * '''index_type'''
  =

  =

 =20