From: Apache Wiki
To: Apache Wiki
Date: Tue, 24 Aug 2010 23:16:54 -0000
Subject: [Cassandra Wiki] Update of "StorageConfiguration" by JonHermes

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "StorageConfiguration" page has been changed by JonHermes.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=33&rev2=34

--------------------------------------------------

  Not every value is covered here, just the interesting ones. When in doubt, check the comments in the default cassandra.yaml; the settings are well documented there.

  == per-Cluster (Global) Settings ==
- * authenticator
+ * '''authenticator'''
  Allows for pluggable authentication of users. The authenticator determines whether clients must call the Thrift 'login' method, and which parameters are required to log in. The default '!AllowAllAuthenticator' does not require users to call 'login': any user can perform any operation. The other built-in option is '!SimpleAuthenticator', which requires users and passwords to be defined in property files, and requires users to call 'login' with a valid username and password.

  Default is: 'org.apache.cassandra.auth.AllowAllAuthenticator', a no-op.

- * auto_bootstrap
+ * '''auto_bootstrap'''
  Set to 'true' to make new [non-seed] nodes automatically migrate the right data to themselves. (If no InitialToken is specified, they will pick one such that they get half the range of the most-loaded node.) If a node starts up without bootstrapping, it marks itself bootstrapped so that you can't subsequently bootstrap a node that already has data on it by accident. (You can reset this by wiping your data and commitlog directories.)

  Default is: 'false', so that new clusters don't bootstrap immediately. You should turn this on when you start adding new nodes to a cluster that already has data on it.

- * cluster_name
+ * '''cluster_name'''
  The name of this cluster. This is mainly used to prevent machines in one logical cluster from joining another.
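The auto_bootstrap note says a new node with no InitialToken picks a token that gives it half the range of the most-loaded node. The Python sketch below illustrates that idea under stated assumptions. It assumes RandomPartitioner-style integer tokens in [0, 2**127) and splits the busiest node's range at its numeric midpoint. The function names are hypothetical, and Cassandra's own bootstrap code may choose the split point differently.

```python
# Hypothetical sketch, not Cassandra's bootstrap code: pick a token that
# takes over half of the most-loaded node's range. Assumes tokens are
# integers in [0, 2**127) and splits the range at its numeric midpoint.

RING_SIZE = 2 ** 127  # assumed token space (RandomPartitioner-style)


def owned_range(tokens, token):
    """Return (left, right): the node at `token` owns the range (left, right]."""
    ordered = sorted(tokens)
    i = ordered.index(token)
    left = ordered[i - 1]  # for i == 0 this wraps around to the last token
    return left, token


def midpoint(left, right):
    """Midpoint of the ring range (left, right], handling wraparound."""
    span = (right - left) % RING_SIZE or RING_SIZE  # single node owns the whole ring
    return (left + span // 2) % RING_SIZE


def pick_bootstrap_token(load_by_token):
    """load_by_token maps each existing node's token to its load (e.g. bytes on disk)."""
    busiest = max(load_by_token, key=load_by_token.get)
    left, right = owned_range(load_by_token.keys(), busiest)
    return midpoint(left, right)


# The node at 2**126 is busiest and owns (0, 2**126], so the new node lands halfway.
print(pick_bootstrap_token({0: 10, 2 ** 126: 50}) == 2 ** 125)  # True
```

Splitting at the numeric midpoint only balances data if keys are spread uniformly over the token space, which is why this sketch fits a random partitioner better than an order-preserving one.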

- * commitlog_directory and data_file_directories
+ * '''commitlog_directory and data_file_directories'''
- /var/lib/cassandra/commitlog
+ Be sure to separate your commitlog and data disks: commitlog performance relies on its append-only nature, and seeking to random data on the same disk will hurt write speed.

+ Defaults are: '/var/lib/cassandra/commitlog' and '/var/lib/cassandra/data'.

- * concurrent_reads and concurrent_writes, commitlog_sync and commitlog_sync_period_in_ms
+ * '''concurrent_reads''' and '''concurrent_writes''', '''commitlog_sync''' and '''commitlog_sync_period_in_ms'''
  Unlike most systems, Cassandra handles writes faster than reads, so you can afford to run more writes in parallel. A good rule of thumb is 4 concurrent_reads per processor core. Increase {{{concurrent_writes}}} to the number of clients writing at once if you use commitlog_sync.

  {{{CommitLogSync}}} may be either "periodic" or "batch". In batch mode, Cassandra won't acknowledge writes until the commit log has been fsynced to disk. It will wait up to {{{CommitLogSyncBatchWindowInMS}}} milliseconds for other writes before performing the sync.
@@ -51, +53 @@

  Defaults are: '8' concurrent reads, '32' concurrent writes, 'periodic' sync, and '10000' ms between syncs.

- * disk_access_mode
+ * '''disk_access_mode'''
  The options are: 'auto', 'mmap', 'mmap_index_only', and 'standard'.
  Mmapped I/O is substantially faster, but is only practical on a 64-bit machine (which notably excludes EC2 "small" instances) or with relatively small datasets. "auto", the safe choice, enables mmapping on a 64-bit JVM. "mmap_index_only" may let you get part of the benefit of mmap on a 32-bit machine by mmapping only index files. (The buffer size settings that follow only apply to standard, non-mmapped I/O.)

  Default is: 'auto'.
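The concurrent_reads rule of thumb and the disk_access_mode choices above can be sketched as two small helpers. This is a hypothetical illustration, not Cassandra code: the helper names are made up, and the fallback of 'auto' to 'standard' on a 32-bit JVM is an assumption (the page only says 'auto' enables mmapping on a 64-bit JVM).

```python
import os

VALID_MODES = {"auto", "mmap", "mmap_index_only", "standard"}


def suggested_concurrent_reads(cores=None):
    """Rule of thumb from this page: 4 concurrent_reads per processor core."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return 4 * cores


def resolve_disk_access_mode(mode, jvm_is_64bit):
    """Return (data_file_mode, index_file_mode) for a disk_access_mode value.

    Assumption: 'auto' falls back to 'standard' on a 32-bit JVM.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown disk_access_mode: {mode!r}")
    if mode == "auto":
        mode = "mmap" if jvm_is_64bit else "standard"
    if mode == "mmap_index_only":
        return ("standard", "mmap")  # only index files are mmapped
    return (mode, mode)


print(suggested_concurrent_reads(4))             # 16
print(resolve_disk_access_mode("auto", False))   # ('standard', 'standard')
```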

- * dynamic_snitch and endpoint_snitch
+ * '''dynamic_snitch''' and '''endpoint_snitch'''
  !EndPointSnitch: Set this to the class that implements {{{IEndPointSnitch}}}, which determines whether two endpoints are in the same data center or on the same rack. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackInferringSnitch}}}.

  Note: this class works on hosts' IPs only. There is no configuration parameter to tell Cassandra that a node is in rack ''R'' and in datacenter ''D''. The current rules are based on the two methods:
@@ -70, +72 @@

  Defaults are: 'org.apache.cassandra.locator.SimpleSnitch' and 'false'.

- * listen_address
+ * '''listen_address'''
  Commenting out this property leaves it up to {{{InetAddress.getLocalHost()}}}. This will always do the Right Thing *if* the node is properly configured (hostname, name resolution, etc.), and the Right Thing is to use the address associated with the hostname (it might not be).

  Default is: 'localhost'. This must be changed for other nodes to contact this node.

- * memtable_flush_after_mins, memtable_operations_in_millions, and memtable_throughput_in_mb
+ * '''memtable_flush_after_mins''', '''memtable_operations_in_millions''', and '''memtable_throughput_in_mb'''
  The maximum time to leave a dirty memtable unflushed. (While any affected ColumnFamilies have unflushed data from a commit log segment, that segment cannot be deleted.) This needs to be large enough that it won't cause a flush storm, with all your memtables flushing at once because none has hit the size or count thresholds yet. For production, a larger value such as 1440 is recommended.

  The maximum number of columns, in millions, to store in memory per ColumnFamily before flushing to disk. This is also a per-memtable setting. Use with {{{MemtableSizeInMB}}} to tune memory usage.
@@ -84, +86 @@

  Defaults are: '60' minutes, '0.3' millions, and '64' MB respectively.
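The three memtable settings act as independent limits. A minimal Python sketch of that check follows, assuming a flush is triggered as soon as any one threshold is reached and that 1 MB means 1024 * 1024 bytes. The class and function names are hypothetical and only mirror the defaults listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemtableThresholds:
    flush_after_mins: float = 60         # memtable_flush_after_mins
    operations_in_millions: float = 0.3  # memtable_operations_in_millions
    throughput_in_mb: float = 64         # memtable_throughput_in_mb


def flush_reason(age_mins, operations, bytes_written,
                 t: Optional[MemtableThresholds] = None):
    """Return which threshold a memtable has reached, or None if it may stay in memory.

    Sketch only: assumes any single threshold triggers a flush.
    """
    t = t or MemtableThresholds()
    if bytes_written >= t.throughput_in_mb * 1024 * 1024:
        return "throughput"
    if operations >= t.operations_in_millions * 1_000_000:  # columns written
        return "operations"
    if age_mins >= t.flush_after_mins:
        return "age"
    return None


print(flush_reason(10, 300_000, 0))  # operations
```

Raising flush_after_mins (for example to the recommended 1440) makes the age limit a rarely used backstop, so most memtables flush on size or count instead of all at once on a timer.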

- * partitioner
+ * '''partitioner'''
  Partitioner: any {{{IPartitioner}}} may be used, including your own as long as it is on the classpath. Out of the box, Cassandra provides {{{org.apache.cassandra.dht.RandomPartitioner}}}, {{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}. (CollatingOPP collates according to EN,US rules, not naive byte ordering. Use this as an example if you need locale-aware collation.) Range queries require an order-preserving partitioner.

  Achtung! Changing this parameter requires wiping your data directories, since the partitioner can modify the !sstable on-disk format.
@@ -99, +101 @@

  Default is: 'org.apache.cassandra.dht.RandomPartitioner'. Manually assigning tokens is highly recommended to guarantee even load distribution.

- * seeds
+ * '''seeds'''
  Never use a node's own address as a seed if you are bootstrapping it by setting auto_bootstrap to true!

- * thrift_framed_transport_size_in_mb
+ * '''thrift_framed_transport_size_in_mb'''
  Set this to '0' to use unframed (buffered) transport.

  Default is: '15' MB.

  == per-Keyspace Settings ==
+ * '''name'''
+ Required field. Dashes are not allowed.
- * replica_placement_strategy and replication_factor
+ * '''replica_placement_strategy''' and '''replication_factor'''
  Strategy: setting this to the class that implements {{{IReplicaPlacementStrategy}}} changes how the nodes that hold replicas are chosen. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}} and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (which places one replica in a different datacenter, and the others on different racks in the same datacenter).

  Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed. So, a replication factor of 1 means that only 1 node will have the data.
It does '''not''' mean that one ''other'' node will have the data.
@@ -116, +120 @@
  Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. An RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes / RF).

  == per-ColumnFamily Settings ==
- * comment and name
+ * '''comment''' and '''name'''
  You can describe a ColumnFamily in plain text by setting these properties.

- * compare_with
+ * '''compare_with'''
  The {{{CompareWith}}} attribute tells Cassandra how to sort the columns for slicing operations. The default is {{{BytesType}}}, a straightforward lexical comparison of the bytes in each column name. Other options are {{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}}, {{{TimeUUIDType}}}, and {{{LongType}}}. You can also specify the fully-qualified name of a class of your choice extending {{{org.apache.cassandra.db.marshal.AbstractType}}}.

  a. {{{SuperColumns}}} have a similar {{{CompareSubcolumnsWith}}} attribute.
@@ -130, +134 @@
  a. {{{LexicalUUIDType}}}: a 128-bit UUID, compared lexically (by byte value)
  a. {{{TimeUUIDType}}}: a 128-bit version 1 UUID, compared by timestamp

- * gc_grace_seconds
+ * '''gc_grace_seconds'''
  Time to wait before garbage-collecting deletion markers. Set this to a value large enough that you are confident each deletion marker will have propagated to all replicas by the time this many seconds have elapsed, even in the face of hardware failures.

  Default is: '864000' seconds, or 10 days.

- * keys_cached and rows_cached
+ * '''keys_cached''' and '''rows_cached'''
+ Defaults are: '200000' keys cached, and '0' rows cached (row cache disabled).
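The compare_with choice changes slice order, not just presentation. The Python sketch below contrasts {{{BytesType}}} (raw byte comparison) with {{{LongType}}} (numeric comparison) for the same column names. It assumes LongType column names are encoded as 8-byte big-endian signed longs; the helper names are hypothetical and this is not Cassandra's comparator code.

```python
import struct


def long_name(n):
    """Encode n as an 8-byte big-endian signed long (assumed LongType encoding)."""
    return struct.pack(">q", n)


def decode(name):
    return struct.unpack(">q", name)[0]


def sort_bytes_type(names):
    """BytesType-style: compare raw bytes lexically (unsigned, byte by byte)."""
    return sorted(names)


def sort_long_type(names):
    """LongType-style: compare column names as signed 64-bit integers."""
    return sorted(names, key=decode)


names = [long_name(n) for n in (10, -1, 2)]
as_bytes = [decode(b) for b in sort_bytes_type(names)]
as_longs = [decode(b) for b in sort_long_type(names)]
print(as_bytes)  # [2, 10, -1]
print(as_longs)  # [-1, 2, 10]
```

Negative numbers start with 0xff bytes, so a byte-wise comparison puts them after every non-negative value; choosing the comparator that matches how column names are encoded keeps slices in the expected order.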

- * preload_row_cache
+ * '''preload_row_cache'''

- * read_repair_chance
+ * '''read_repair_chance'''

- * default_validation_class
+ * '''default_validation_class'''
+ Used in conjunction with the validation_class property in the per-column settings to guarantee the

+ Default is: 'BytesType', a no-op.

  == per-Column Settings ==
- * validation_class
+ * '''validation_class'''

- * index_type
+ * '''index_type'''