From: Apache Wiki
To: Apache Wiki
Date: Fri, 13 Nov 2009 15:41:22 -0000
Message-ID: <20091113154122.3748.83618@eos.apache.org>
Subject: [Cassandra Wiki] Update of "StorageConfiguration" by tuxracer69

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "StorageConfiguration" page has been changed by tuxracer69.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=1&rev2=2

--------------------------------------------------

  }}}

  == Keyspaces and ColumnFamilies ==
- Keyspaces and !ColumnFamilies: A !ColumnFamily is the Cassandra concept closest to a relational table. !Keyspaces are separate groups of !ColumnFamilies. Except in very unusual circumstances you will have one Keyspace per application.
+ Keyspaces and {{{ColumnFamilies}}}: A {{{ColumnFamily}}} is the Cassandra concept closest to a relational table. {{{Keyspaces}}} are separate groups of {{{ColumnFamilies}}}. Except in very unusual circumstances you will have one Keyspace per application.

  There is an implicit keyspace named 'system' for Cassandra internals.

@@ -21, +21 @@
  }}}

- The !CompareWith attribute tells Cassandra how to sort the columns for slicing operations. The default is !BytesType, which is a straightforward lexical comparison of the bytes in each column. Other options are !AsciiType, !UTF8Type, !LexicalUUIDType, !TimeUUIDType, and !LongType. You can also specify the fully-qualified class name to a class of your choice extending org.apache.cassandra.db.marshal.AbstractType.
+ The {{{CompareWith}}} attribute tells Cassandra how to sort the columns for slicing operations. The default is {{{BytesType}}}, which is a straightforward lexical comparison of the bytes in each column. Other options are {{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}}, {{{TimeUUIDType}}}, and {{{LongType}}}. You can also specify the fully-qualified class name to a class of your choice extending {{{org.apache.cassandra.db.marshal.AbstractType}}}.

- !SuperColumns have a similar !CompareSubcolumnsWith attribute.
+  * {{{SuperColumns}}} have a similar {{{CompareSubcolumnsWith}}} attribute.
+  * {{{BytesType}}}: Simple sort by byte value. No validation is performed.
+  * {{{AsciiType}}}: Like {{{BytesType}}}, but validates that the input can be parsed as US-ASCII.
+  * {{{UTF8Type}}}: A string encoded as UTF8
+  * {{{LongType}}}: A 64bit long
+  * {{{LexicalUUIDType}}}: A 128bit UUID, compared lexically (by byte value)
+  * {{{TimeUUIDType}}}: a 128bit version 1 UUID, compared by timestamp
- BytesType: Simple sort by byte value. No validation is performed. !AsciiType: Like !BytesType, but validates that the input can be parsed as US-ASCII.
-
- UTF8Type: A string encoded as UTF8 !LongType: A 64bit long !LexicalUUIDType: A 128bit UUID, compared lexically (by byte value) T!imeUUIDType: a 128bit version 1 !UUID, compared by !timestamp
-
- (To get the closest approximation to 0.3-style !supercolumns, you would use !CompareWith=UTF8Type !CompareSubcolumnsWith=!LongType.)
+ (To get the closest approximation to 0.3-style {{{supercolumns}}}, you would use {{{CompareWith=UTF8Type CompareSubcolumnsWith=LongType}}}.)

- If !FlushPeriodInMinutes is configured and positive, it will be flushed to disk with that period whether it is dirty or not. This is intended for lightly-used !columnfamilies so that they do not prevent !commitlog segments from being purged.
+ If {{{FlushPeriodInMinutes}}} is configured and positive, it will be flushed to disk with that period whether it is dirty or not. This is intended for lightly-used {{{columnfamilies}}} so that they do not prevent commitlog segments from being purged.

  {{{
+ FlushPeriodInMinutes="60"/>
-
-
-
+
+
+
  }}}

  == Partitioner ==
- Partitioner: any !IPartitioner may be used, including your own as long as it is on the !classpath. Out of the box, Cassandra provides org.apache.cassandra.dht.RandomPartitioner, org.apache.cassandra.dht.OrderPreservingPartitioner, and org.apache.cassandra.dht.CollatingOrderPreservingPartitioner. (CollatingOPP collates according to EN,US rules, not naive byte ordering. Use this as an example if you need locale-aware collation.) Range queries require using an order-preserving partitioner.
+ Partitioner: any {{{IPartitioner}}} may be used, including your own as long as it is on the classpath. Out of the box, Cassandra provides {{{org.apache.cassandra.dht.RandomPartitioner}}}, {{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}. (CollatingOPP collates according to EN,US rules, not naive byte ordering. Use this as an example if you need locale-aware collation.) Range queries require using an order-preserving partitioner.

  Achtung! Changing this parameter requires wiping your data directories, since the partitioner can modify the !sstable on-disk format.

@@ -56, +62 @@

  If you are using an order-preserving partitioner and you know your key distribution, you can specify the token for this node to use. (Keys are sent to the node with the "closest" token, so distributing your tokens equally along the key distribution space will spread keys evenly across your cluster.) This setting is only checked the first time a node is started.

- This can also be useful with RandomPartitioner to force equal spacing of tokens around the hash space, especially for clusters with a small number of nodes.
+ This can also be useful with {{{RandomPartitioner}}} to force equal spacing of tokens around the hash space, especially for clusters with a small number of nodes.

  {{{
  }}}

  == EndPointSnitch ==
- !EndPointSnitch: Setting this to the class that implements !IEndPointSnitch which will see if two endpoints are in the same data center or on the same rack. Out of the box, Cassandra provides org.apache.cassandra.locator.EndPointSnitch
+ !EndPointSnitch: Setting this to the class that implements {{{IEndPointSnitch}}} which will see if two endpoints are in the same data center or on the same rack. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.EndPointSnitch}}}

  {{{
  org.apache.cassandra.locator.EndPointSnitch
  }}}

  == ReplicaPlacementStrategy ==
- Strategy: Setting this to the class that implements IReplicaPlacementStrategy will change the way the node picker works. Out of the box, Cassandra provides org.apache.cassandra.locator.RackUnawareStrategy and org.apache.cassandra.locator.RackAwareStrategy (place one replica in a different datacenter, and the others on different racks in the same one.)
+ Strategy: Setting this to the class that implements {{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}} and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a different datacenter, and the others on different racks in the same one.)

  {{{
  org.apache.cassandra.locator.RackUnawareStrategy
@@ -83, +89 @@
  }}}

  == Directories ==
- Directories: Specify where Cassandra should store different data on disk. Keep the data disks and the CommitLog disks separate for best performance
+ Directories: Specify where Cassandra should store different data on disk. Keep the data disks and the {{{CommitLog}}} disks separate for best performance. See also [[FAQ#what_kind_of_hardware_should_i_use|what kind of hardware should I use?]]

  {{{
- /var/lib/cassandra/commitlog
+ /var/lib/cassandra/commitlog
+
- /var/lib/cassandra/data
+ /var/lib/cassandra/data
  /var/lib/cassandra/callouts
  /var/lib/cassandra/bootstrap
  /var/lib/cassandra/staging
  }}}
@@ -118, +125 @@

  Address to bind to and tell other nodes to connect to. You _must_ change this if you want multiple nodes to be able to communicate!

- Leaving it blank leaves it up to InetAddress.getLocalHost(). This will always do the Right Thing *if* the node is properly configured (hostname, name resolution, etc), and the Right Thing is to use the address associated with the hostname (it might not be).
+ Leaving it blank leaves it up to {{{InetAddress.getLocalHost()}}}. This will always do the Right Thing *if* the node is properly configured (hostname, name resolution, etc), and the Right Thing is to use the address associated with the hostname (it might not be).

  {{{
  localhost
@@ -128, +135 @@
  7001
  }}}

- The address to bind the Thrift RPC service to. Unlike ListenAddress above, you *can* specify 0.0.0.0 here if you want Thrift to listen on all interfaces.
+ The address to bind the Thrift RPC service to. Unlike {{{ListenAddress}}} above, you *can* specify {{{0.0.0.0}}} here if you want Thrift to listen on all interfaces.

- Leaving this blank has the same effect it does for ListenAddress, (i.e. it will be based on the configured hostname of the node).
+ Leaving this blank has the same effect it does for {{{ListenAddress}}}, (i.e. it will be based on the configured hostname of the node).

  {{{
  localhost
@@ -164, +171 @@
  64
  }}}

- The maximum amount of data to store in memory per !ColumnFamily before flushing to disk. Note: There is one memtable per column family, and this threshold is based solely on the amount of data stored, not actual heap memory usage (there is some overhead in indexing the columns).
+ The maximum amount of data to store in memory per !ColumnFamily before flushing to disk. Note: There is one memtable per column family, and this threshold is based solely on the amount of data stored, not actual heap memory usage (there is some overhead in indexing the columns). See also [[MemtableThresholds|MemtableThresholds]].

  {{{
  64
- }}
+ }}}

- The maximum number of columns in millions to store in memory per ColumnFamily before flushing to disk. This is also a per-memtable setting. Use with MemtableSizeInMB to tune memory usage.
+ The maximum number of columns in millions to store in memory per ColumnFamily before flushing to disk. This is also a per-memtable setting. Use with {{{MemtableSizeInMB}}} to tune memory usage.

  {{{
  0.1
  }}}
+
  Unlike most systems, in Cassandra writes are faster than reads, so you can afford more of those in parallel. A good rule of thumb is 2 concurrent reads per processor core. Increase ConcurrentWrites to the number of clients writing at once if you enable CommitLogSync + CommitLogSyncDelay.

  {{{
@@ -183, +191 @@

  !CommitLogSync may be either "periodic" or "batch." When in batch mode, Cassandra won't ack writes until the commit log has been !fsynced to disk. It will wait up to !CommitLogSyncBatchWindowInMS milliseconds for other writes, before performing the sync.

- This is less necessary in Cassandra than in traditional databases since replication reduces the odds of losing data from a failure after writing the log entry but before it actually reaches the disk. So the other option is "periodic," where writes may be acked immediately and the CommitLog is simply synced every CommitLogSyncPeriodInMS milliseconds.
+ This is less necessary in Cassandra than in traditional databases since replication reduces the odds of losing data from a failure after writing the log entry but before it actually reaches the disk. So the other option is "periodic," where writes may be acked immediately and the CommitLog is simply synced every {{{CommitLogSyncPeriodInMS}}} milliseconds.

  {{{
  periodic
@@ -203, +211 @@
  {{{
  864000
  }}}
- Number of threads to run when flushing memtables to disk. Set this to the number of disks you physically have in your machine allocated for DataDirectory * 2. If you are planning to use the Binary Memtable, it's recommended to increase the max threads to maintain a higher quality of service while under load when normal memtables are flushing to disk.
+ Number of threads to run when flushing memtables to disk. Set this to the number of disks you physically have in your machine allocated for {{{DataDirectory * 2}}}. If you are planning to use the Binary Memtable, it's recommended to increase the max threads to maintain a higher quality of service while under load when normal memtables are flushing to disk.

  {{{
  1
  1