Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
From: Apache Wiki <wikidiffs@apache.org>
To: Apache Wiki <wikidiffs@apache.org>
Date: Tue, 24 Aug 2010 22:08:58 -0000
Message-ID: <20100824220858.64081.12963@eosnew.apache.org>
Subject: [Cassandra Wiki] Update of "StorageConfiguration" by JonHermes

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "StorageConfiguration" page has been changed by JonHermes.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=28&rev2=29

--------------------------------------------------

  {{{
  bin/schematool HOST PORT import
  }}}
+ 
+ 
+ 
  = Config Overview =
  Not going to cover every value, just the interesting ones. When in doubt, check out the comments on the default cassandra.yaml as they're well documented there.
  
  == per-Cluster (Global) Settings ==
- === authenticator ===
+  * authenticator
  Allows for pluggable authentication of users, which defines whether it is necessary to call the Thrift 'login' method, and which parameters are required to log in. The default '!AllowAllAuthenticator' does not require users to call 'login': any user can perform any operation. The other built-in option is '!SimpleAuthenticator', which requires users and passwords to be defined in property files, and for users to call 'login' with a valid combination.
  
  Default is: 'org.apache.cassandra.auth.AllowAllAuthenticator', a no-op.
  
- === auto_bootstrap ===
+  * auto_bootstrap
  Set to 'true' to make new [non-seed] nodes automatically migrate the right data to themselves.  (If no InitialToken is specified, they will pick one such that they will get half the range of the most-loaded node.) If a node starts up without bootstrapping, it will mark itself bootstrapped so that you can't subsequently accidentally bootstrap a node with data on it.  (You can reset this by wiping your data and commitlog directories.)
  
  Off by default so that new clusters don't bootstrap immediately.  You should turn this on when you start adding new nodes to a cluster that already has data on it.
  
- === cluster_name ===
+  * cluster_name
  The name of this cluster.  This is mainly used to prevent machines in one logical cluster from joining another.
  
- === commitlog_directory ===
+  * commitlog_directory and data_file_directories
  /var/lib/cassandra/commitlog
  
+  * concurrent_reads and concurrent_writes
- === concurrent_reads ===
- === concurrent_writes ===
  8
- 
  32
  
- === disk_access_mode ===
+  * disk_access_mode
  auto, mmap, mmap_index_only, standard
  
- === dynamic_snitch ===
+  * dynamic_snitch and endpoint_snitch
- false
+ false. 
- 
- === endpoint_snitch ===
  !EndPointSnitch: Set this to the class that implements {{{IEndPointSnitch}}}, which determines whether two endpoints are in the same data center or on the same rack. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackInferringSnitch}}}.
  
  Note: this class will work on hosts' IPs only. There is no configuration parameter to tell Cassandra that a node is in rack ''R'' and in datacenter ''D''. The current rules are based on the two methods:
@@ -60, +59 @@

  
   * isInSameDataCenter: Look at the IP addresses of the two hosts and compare the 2nd octet. If they are the same, the hosts are in the same datacenter; otherwise they are in different datacenters.
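
The {{{isInSameDataCenter}}} rule above can be sketched in a few lines of Python. This is an illustration only, not the {{{RackInferringSnitch}}} source: the function name {{{same_datacenter}}} is hypothetical, and the sketch assumes dotted-quad IPv4 addresses.

```python
import ipaddress


def same_datacenter(ip_a: str, ip_b: str) -> bool:
    """Return True when two IPv4 addresses share the same 2nd octet.

    Hypothetical helper illustrating the rule described on this page;
    it is not part of Cassandra.
    """
    # .packed yields the four address bytes; index 1 is the 2nd octet.
    octet_a = ipaddress.IPv4Address(ip_a).packed[1]
    octet_b = ipaddress.IPv4Address(ip_b).packed[1]
    return octet_a == octet_b
```

Under this rule, 10.20.1.5 and 10.20.9.7 would be treated as the same datacenter, while 10.20.1.5 and 10.21.1.5 would not.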
  
+  * memtable_flush_after_mins, memtable_operations_in_millions, and memtable_throughput_in_mb
- === memtable_flush_after_mins ===
- === memtable_operations_in_millions ===
- === memtable_throughput_in_mb ===
  60 0.3 64
  
- === partitioner ===
- org.apache.cassandra.dht.RandomPartitioner
+  * partitioner
+ Partitioner: any {{{IPartitioner}}} may be used, including your own as long as it is on the classpath.  Out of the box, Cassandra provides {{{org.apache.cassandra.dht.RandomPartitioner}}}, {{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}. (CollatingOPP collates according to EN,US rules, not naive byte ordering.  Use this as an example if you need locale-aware collation.) Range queries require using an order-preserving partitioner.
  
+ Achtung!  Changing this parameter requires wiping your data directories, since the partitioner can modify the !sstable on-disk format.
- === rpc_timeout_in_ms ===
- 10000
  
- === seeds ===
+ If you are using an order-preserving partitioner and you know your key distribution, you can specify the token for this node to use. (Keys are sent to the node with the "closest" token, so distributing your tokens equally along the key distribution space will spread keys evenly across your cluster.)  This setting is only checked the first time a node is started.
+ 
+ This can also be useful with {{{RandomPartitioner}}} to force equal spacing of tokens around the hash space, especially for clusters with a small number of nodes.
+ 
+ Cassandra uses an MD5 hash internally to place keys on the ring with {{{RandomPartitioner}}}. So it makes sense to divide the hash space equally by the number of machines available using {{{InitialToken}}} (i.e., if there are 10 machines, each will handle 1/10th of the maximum hash value) and expect that the machines will get a reasonably equal load.
+ 
+ With {{{OrderPreservingPartitioner}}} the keys themselves are used to place rows on the ring. One of the potential drawbacks of this approach is that if rows are inserted with sequential keys, all the write load will go to the same node.
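
As a sketch of the equal-spacing idea for {{{RandomPartitioner}}}, the following Python computes evenly spaced {{{InitialToken}}} values. It assumes the token space runs from 0 up to 2**127, which is not stated on this page; the function name {{{initial_tokens}}} is hypothetical and not part of Cassandra.

```python
from typing import List


def initial_tokens(node_count: int, token_space: int = 2**127) -> List[int]:
    """Return node_count tokens spaced evenly across the token space.

    token_space is an assumption (0 .. 2**127 for RandomPartitioner);
    adjust it if your partitioner uses a different range.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Node i takes the token at i/node_count of the way around the ring.
    return [i * token_space // node_count for i in range(node_count)]
```

Each value in the returned list would be assigned to one node as its {{{InitialToken}}}, so that every node owns roughly the same share of the hash space.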
+ 
+  * seeds
  Never use a node's own address as a seed if you are bootstrapping it by setting auto_bootstrap to true!
  
- === thrift_framed_transport_size_in_mb ===
+  * thrift_framed_transport_size_in_mb
  15 by default. Set this to 0 to use unframed transport.
  
  == per-Keyspace Settings ==
+  * replica_placement_strategy and replication_factor
+ Strategy: Setting this to the class that implements {{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}} and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a different datacenter, and the others on different racks in the same one.)
+ 
+ Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed.  So, a replication factor of 1 means that only 1 node will have the data.  It does '''not''' mean that one ''other'' node will have the data.
+ 
  == per-ColumnFamily Settings ==
- == per-Column Settings ==
+   * comment and name
+ You can describe a ColumnFamily in plain text by setting this property.
  
+   * compare_with
- 
- 
- 
- == Keyspaces and ColumnFamilies ==
- Keyspaces and {{{ColumnFamilies}}}: A {{{ColumnFamily}}} is the Cassandra concept closest to a relational table.  {{{Keyspaces}}} are separate groups of {{{ColumnFamilies}}}.  Except in very unusual circumstances you will have one Keyspace per application.
- 
- There is an implicit keyspace named 'system' for Cassandra internals.
- 
- {{{
- <Keyspaces>
-  <Keyspace Name="Keyspace1">
- }}}
- ''[New in 0.5:''
- 
- The fraction of keys per sstable whose locations we keep in memory in "mostly LRU" order.  (JUST the key locations, NOT any column values.) The amount of memory used by the default setting of 0.01 is comparable to the amount used by the internal per-sstable key index. Consider increasing this if you have fewer, wider rows. Set to 0 to disable entirely.
- 
- {{{
-       <KeysCachedFraction>0.01</KeysCachedFraction>
- }}}
- '']''
- 
- ''[New in 0.6: !EndPointSnitch, !ReplicaPlacementStrategy and !ReplicationFactor became configurable per keyspace.  Prior to that they were global settings.]''
- 
- === ReplicaPlacementStrategy and ReplicationFactor ===
- Strategy: Setting this to the class that implements {{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}} and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a different datacenter, and the others on different racks in the same one.)
- 
- {{{
- <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
- }}}
- Number of replicas of the data
- 
- {{{
- <ReplicationFactor>1</ReplicationFactor>
- }}}
- Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed.  So, a replication factor of 1 means that only 1 node will have the data.  It does '''not''' mean that one ''other'' node will have the data.
- 
- === ColumnFamilies ===
  The {{{CompareWith}}} attribute tells Cassandra how to sort the columns for slicing operations.  The default is {{{BytesType}}}, which is a straightforward lexical comparison of the bytes in each column. Other options are {{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}}, {{{TimeUUIDType}}}, and {{{LongType}}}.  You can also specify the fully-qualified class name of a class of your choice extending {{{org.apache.cassandra.db.marshal.AbstractType}}}.
  
   * {{{SuperColumns}}} have a similar {{{CompareSubcolumnsWith}}} attribute.
@@ -130, +104 @@

  
  (To get the closest approximation to 0.3-style {{{supercolumns}}}, you would use {{{CompareWith=UTF8Type CompareSubcolumnsWith=LongType}}}.)
  
- If {{{FlushPeriodInMinutes}}} is configured and positive, it will be flushed to disk with that period whether it is dirty or not.  This is intended for lightly-used {{{columnfamilies}}} so that they do not prevent commitlog segments from being purged.
+   * gc_grace_seconds
+   * keys_cached and rows_cached
+   * preload_row_cache
+   * read_repair_chance
+   * default_validation_class
  
- ''[New in 0.5:'' An optional `Comment` attribute may be used to attach additional human-readable information about the column family to its definition. '']''
+ == per-Column Settings ==
+   * validation_class
+   * index_type
  
+ 
- {{{
- <ColumnFamily CompareWith="BytesType"
-        Name="Standard1"
-        FlushPeriodInMinutes="60"/>
- <ColumnFamily CompareWith="UTF8Type"
-        Name="Standard2"/>
- <ColumnFamily CompareWith="TimeUUIDType"
-        Name="StandardByUUID1"/>
- <ColumnFamily ColumnType="Super"
-        CompareWith="UTF8Type"
-        CompareSubcolumnsWith="UTF8Type"
-        Name="Super1"
-        Comment="A column family with supercolumns, whose column and subcolumn names are UTF8 strings"/>
- }}}
  == Partitioner ==
- Partitioner: any {{{IPartitioner}}} may be used, including your own as long as it is on the classpath.  Out of the box, Cassandra provides {{{org.apache.cassandra.dht.RandomPartitioner}}}, {{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}. (CollatingOPP collates according to EN,US rules, not naive byte ordering.  Use this as an example if you need locale-aware collation.) Range queries require using an order-preserving partitioner.
- 
- Achtung!  Changing this parameter requires wiping your data directories, since the partitioner can modify the !sstable on-disk format.
- 
- Example:
- 
- {{{
- <Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>
- }}}
- If you are using an order-preserving partitioner and you know your key distribution, you can specify the token for this node to use. (Keys are sent to the node with the "closest" token, so distributing your tokens equally along the key distribution space will spread keys evenly across your cluster.)  This setting is only checked the first time a node is started.
- 
- This can also be useful with {{{RandomPartitioner}}} to force equal spacing of tokens around the hash space, especially for clusters with a small number of nodes.
- 
- {{{
- <InitialToken></InitialToken>
- }}}
- Cassandra uses an MD5 hash internally to place keys on the ring with {{{RandomPartitioner}}}. So it makes sense to divide the hash space equally by the number of machines available using {{{InitialToken}}} (i.e., if there are 10 machines, each will handle 1/10th of the maximum hash value) and expect that the machines will get a reasonably equal load.
- 
- With {{{OrderPreservingPartitioner}}} the keys themselves are used to place rows on the ring. One of the potential drawbacks of this approach is that if rows are inserted with sequential keys, all the write load will go to the same node.
- 
- 
  == Miscellaneous ==
  Time to wait for a reply from other nodes before failing the command
  