From: Apache Wiki
To: Apache Wiki
Date: Tue, 24 Aug 2010 22:08:58 -0000
Message-ID: <20100824220858.64081.12963@eosnew.apache.org>
Subject: [Cassandra Wiki] Update of "StorageConfiguration" by JonHermes

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "StorageConfiguration" page has been changed by JonHermes.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=28&rev2=29

--------------------------------------------------

  {{{
  bin/schematool HOST PORT import
  }}}
+ 
+ 
+ 
  = Config Overview =
  Not going to cover every value, just the interesting ones. When in doubt, check out the comments on the default cassandra.yaml, as they're well documented there.
  
  == per-Cluster (Global) Settings ==
- === authenticator ===
+  * authenticator
  Allows for pluggable authentication of users, which defines whether it is necessary to call the Thrift 'login' method, and which parameters are required to log in. The default '!AllowAllAuthenticator' does not require users to call 'login': any user can perform any operation. The other built-in option is '!SimpleAuthenticator', which requires users and passwords to be defined in property files, and requires users to call login with a valid username/password combination.
  
  Default is: 'org.apache.cassandra.auth.AllowAllAuthenticator', a no-op.
  
- === auto_bootstrap ===
+  * auto_bootstrap
  Set to 'true' to make new [non-seed] nodes automatically migrate the right data to themselves. (If no InitialToken is specified, they will pick one such that they will get half the range of the most-loaded node.) If a node starts up without bootstrapping, it will mark itself bootstrapped so that you can't subsequently accidentally bootstrap a node with data on it. (You can reset this by wiping your data and commitlog directories.)
  
  Off by default so that new clusters don't bootstrap immediately. You should turn this on when you start adding new nodes to a cluster that already has data on it.
  
- === cluster_name ===
+  * cluster_name
  The name of this cluster. This is mainly used to prevent machines in one logical cluster from joining another.
  
- === commitlog_directory ===
+  * commitlog_directory and data_file_directories
  /var/lib/cassandra/commitlog
  
+  * concurrent_reads and concurrent_writes
- === concurrent_reads ===
- === concurrent_writes ===
  8
- 
  32
  
- === disk_access_mode ===
+  * disk_access_mode
  auto, mmap, mmap_index_only, standard
  
- === dynamic_snitch ===
+  * dynamic_snitch and endpoint_snitch
- false
+ false.
  
- 
- === endpoint_snitch ===
  !EndPointSnitch: set this to the class that implements {{{IEndPointSnitch}}}, which determines whether two endpoints are in the same data center or on the same rack. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackInferringSnitch}}}.
  
  Note: this class works on hosts' IP addresses only. There is no configuration parameter to tell Cassandra that a node is in rack ''R'' and in datacenter ''D''. The current rules are based on the two methods:
@@ -60, +59 @@
  
   * isInSameDataCenter: Look at the IP addresses of the two hosts and compare the 2nd octet. If they are the same, the hosts are in the same datacenter; otherwise they are in different datacenters.
  
+  * memtable_flush_after_mins, memtable_operations_in_millions, and memtable_throughput_in_mb
- === memtable_flush_after_mins ===
- === memtable_operations_in_millions ===
- === memtable_throughput_in_mb ===
  60
  0.3
  64
  
- === partitioner ===
- org.apache.cassandra.dht.RandomPartitioner
+  * partitioner
+ Partitioner: any {{{IPartitioner}}} may be used, including your own as long as it is on the classpath. Out of the box, Cassandra provides {{{org.apache.cassandra.dht.RandomPartitioner}}}, {{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}. (CollatingOPP collates according to EN,US rules, not naive byte ordering. Use this as an example if you need locale-aware collation.) Range queries require using an order-preserving partitioner.
  
+ Achtung! Changing this parameter requires wiping your data directories, since the partitioner can modify the !sstable on-disk format.
- === rpc_timeout_in_ms ===
- 10000
  
- === seeds ===
+ If you are using an order-preserving partitioner and you know your key distribution, you can specify the token for this node to use. (Keys are sent to the node with the "closest" token, so distributing your tokens equally along the key distribution space will spread keys evenly across your cluster.) This setting is only checked the first time a node is started.
+ 
+ This can also be useful with {{{RandomPartitioner}}} to force equal spacing of tokens around the hash space, especially for clusters with a small number of nodes.
+ 
+ With {{{RandomPartitioner}}}, Cassandra internally uses an MD5 hash of each key to place it on the ring. So it makes sense to use {{{InitialToken}}} to divide the hash space equally among the available machines (i.e., if there are 10 machines, each will handle 1/10th of the maximum hash value) and expect that the machines will get a reasonably equal load.
+ 
+ With {{{OrderPreservingPartitioner}}} the keys themselves determine placement on the ring. One potential drawback of this approach is that if rows are inserted with sequential keys, all the write load will go to the same node.
+ 
+  * seeds
  Never use a node's own address as a seed if you are bootstrapping it by setting auto_bootstrap to true!
  
- === thrift_framed_transport_size_in_mb ===
+  * thrift_framed_transport_size_in_mb
  15 by default. Setting this to 0 selects unframed transport.
  
  == per-Keyspace Settings ==
+  * replica_placement_strategy and replication_factor
+ Strategy: setting this to a class that implements {{{IReplicaPlacementStrategy}}} changes how nodes are picked to hold replicas. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}} and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (which places one replica in a different datacenter and the others on different racks in the same one).
+ 
+ Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed. So, a replication factor of 1 means that only 1 node will have the data. It does '''not''' mean that one ''other'' node will have the data.
+ 
  == per-ColumnFamily Settings ==
- == per-Column Settings ==
+  * comment and name
+ You can describe a ColumnFamily in plain text by setting this property.
  
+  * compare_with
- 
- 
- 
- == Keyspaces and ColumnFamilies ==
- Keyspaces and {{{ColumnFamilies}}}: A {{{ColumnFamily}}} is the Cassandra concept closest to a relational table. {{{Keyspaces}}} are separate groups of {{{ColumnFamilies}}}. Except in very unusual circumstances you will have one Keyspace per application.
- 
- There is an implicit keyspace named 'system' for Cassandra internals.
- 
- {{{
- 
- 
- }}}
- ''[New in 0.5:''
- 
- The fraction of keys per sstable whose locations we keep in memory in "mostly LRU" order. (JUST the key locations, NOT any column values.) The amount of memory used by the default setting of 0.01 is comparable to the amount used by the internal per-sstable key index. Consider increasing this if you have fewer, wider rows. Set to 0 to disable entirely.
- 
- {{{
- 0.01
- }}}
- '']''
- 
- ''[New in 0.6: !EndPointSnitch, !ReplicaPlacementStrategy and !ReplicationFactor became configurable per keyspace. Prior to that they were global settings.]''
- 
- === ReplicaPlacementStrategy and ReplicationFactor ===
- Strategy: setting this to a class that implements {{{IReplicaPlacementStrategy}}} changes how nodes are picked to hold replicas. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}} and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (which places one replica in a different datacenter and the others on different racks in the same one).
- 
- {{{
- org.apache.cassandra.locator.RackUnawareStrategy
- }}}
- Number of replicas of the data
- 
- {{{
- 1
- }}}
- Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed. So, a replication factor of 1 means that only 1 node will have the data. It does '''not''' mean that one ''other'' node will have the data.
- 
- === ColumnFamilies ===
  The {{{CompareWith}}} attribute tells Cassandra how to sort the columns for slicing operations. The default is {{{BytesType}}}, which is a straightforward lexical comparison of the bytes in each column. Other options are {{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}}, {{{TimeUUIDType}}}, and {{{LongType}}}. You can also specify the fully-qualified class name of a class of your choice extending {{{org.apache.cassandra.db.marshal.AbstractType}}}.
  
   * {{{SuperColumns}}} have a similar {{{CompareSubcolumnsWith}}} attribute.
@@ -130, +104 @@
  
  (To get the closest approximation to 0.3-style {{{supercolumns}}}, you would use {{{CompareWith=UTF8Type CompareSubcolumnsWith=LongType}}}.)
  
- If {{{FlushPeriodInMinutes}}} is configured and positive, it will be flushed to disk with that period whether it is dirty or not. This is intended for lightly-used {{{columnfamilies}}} so that they do not prevent commitlog segments from being purged.
+  * gc_grace_seconds
+  * keys_cached and rows_cached
+  * preload_row_cache
+  * read_repair_chance
+  * default_validation_class
  
- ''[New in 0.5:'' An optional `Comment` attribute may be used to attach additional human-readable information about the column family to its definition. '']''
+ == per-Column Settings ==
+  * validation_class
+  * index_type
  
+ 
- {{{
- 
- 
- 
- 
- }}}
  == Partitioner ==
- Partitioner: any {{{IPartitioner}}} may be used, including your own as long as it is on the classpath. Out of the box, Cassandra provides {{{org.apache.cassandra.dht.RandomPartitioner}}}, {{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}. (CollatingOPP collates according to EN,US rules, not naive byte ordering. Use this as an example if you need locale-aware collation.) Range queries require using an order-preserving partitioner.
- 
- Achtung! Changing this parameter requires wiping your data directories, since the partitioner can modify the !sstable on-disk format.
- 
- Example:
- 
- {{{
- org.apache.cassandra.dht.RandomPartitioner
- }}}
- If you are using an order-preserving partitioner and you know your key distribution, you can specify the token for this node to use. (Keys are sent to the node with the "closest" token, so distributing your tokens equally along the key distribution space will spread keys evenly across your cluster.) This setting is only checked the first time a node is started.
- 
- This can also be useful with {{{RandomPartitioner}}} to force equal spacing of tokens around the hash space, especially for clusters with a small number of nodes.
- 
- {{{
- 
- }}}
- With {{{RandomPartitioner}}}, Cassandra internally uses an MD5 hash of each key to place it on the ring. So it makes sense to use {{{InitialToken}}} to divide the hash space equally among the available machines (i.e., if there are 10 machines, each will handle 1/10th of the maximum hash value) and expect that the machines will get a reasonably equal load.
- 
- With {{{OrderPreservingPartitioner}}} the keys themselves determine placement on the ring. One potential drawback of this approach is that if rows are inserted with sequential keys, all the write load will go to the same node.
- 
- 
  == Miscellaneous ==
  Time to wait for a reply from other nodes before failing the command
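The isInSameDataCenter rule described for RackInferringSnitch (compare the 2nd octet of the two hosts' IP addresses) is small enough to show directly. This Python version only illustrates that stated rule and is not Cassandra's implementation; it assumes IPv4 addresses in dotted-quad form, and the function name same_datacenter is hypothetical.

```python
from ipaddress import IPv4Address


def same_datacenter(host_a: str, host_b: str) -> bool:
    """Apply the 2nd-octet rule: equal 2nd octets mean the same datacenter."""
    # .packed gives the 4 address bytes in network order; index 1 is the 2nd octet.
    return IPv4Address(host_a).packed[1] == IPv4Address(host_b).packed[1]


if __name__ == "__main__":
    print(same_datacenter("10.20.1.5", "10.20.7.9"))  # True: both have 2nd octet 20
    print(same_datacenter("10.20.1.5", "10.21.1.5"))  # False: 20 vs 21
```

Because there is no parameter for declaring a node's rack or datacenter, a rule like this only works if the network's addressing scheme actually encodes the datacenter in the 2nd octet.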
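The token advice on this page (use InitialToken to divide the RandomPartitioner hash space equally among the machines) and the auto_bootstrap rule (a new node without an InitialToken takes half the range of the most-loaded node) both come down to simple ring arithmetic. The Python sketch below illustrates it and is not Cassandra's code. It assumes a token space of 0 to 2**127, that a key's token is the absolute value of its MD5 digest read as a signed big-endian integer, and that a key belongs to the first node whose token is greater than or equal to the key's token, wrapping around the ring. The helpers key_token, balanced_tokens, owner and bootstrap_token are hypothetical names.

```python
import bisect
import hashlib

# Assumed RandomPartitioner token space: integers from 0 to 2**127.
RING_SIZE = 2 ** 127


def key_token(key: bytes) -> int:
    """Assumed key-to-token mapping: |MD5 digest read as a signed big-endian int|."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def balanced_tokens(num_nodes: int) -> list[int]:
    """Evenly spaced InitialToken values: node i gets i * RING_SIZE // num_nodes."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def owner(sorted_tokens: list[int], token: int) -> int:
    """Index of the node owning `token`: first node token >= it, wrapping to 0."""
    return bisect.bisect_left(sorted_tokens, token) % len(sorted_tokens)


def bootstrap_token(predecessor: int, heaviest: int) -> int:
    """Token that splits the range (predecessor, heaviest] in half.

    `heaviest` is the token of the most-loaded node and `predecessor` is the
    token just before it on the ring; the modulo handles wraparound, and equal
    tokens mean a single node owns the whole ring.
    """
    span = (heaviest - predecessor) % RING_SIZE or RING_SIZE
    return (predecessor + span // 2) % RING_SIZE


if __name__ == "__main__":
    tokens = balanced_tokens(4)
    counts = [0] * len(tokens)
    for n in range(10_000):
        counts[owner(tokens, key_token(f"key{n}".encode()))] += 1
    print(tokens)
    print(counts)  # each count should be close to 2,500
    # New node taking half of node 1's range (0, 2**125]:
    print(bootstrap_token(tokens[0], tokens[1]))  # 2**124
```

Run as a script, it prints four evenly spaced tokens and a per-node key count; since MD5 spreads keys roughly uniformly, the counts should come out close to equal, which is the "reasonably equal load" the page describes.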
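The replication-factor note says RF counts every copy, including the first one. A minimal Python sketch of rack-unaware placement makes that concrete. It assumes, beyond what this page states, that RackUnawareStrategy puts the first replica on the node that owns the key's token and the remaining RF - 1 replicas on the next nodes clockwise around the ring; the function name replicas is hypothetical.

```python
import bisect


def replicas(sorted_tokens: list[int], key_token: int, replication_factor: int) -> list[int]:
    """Return the indices of the nodes that store a key.

    Assumed rack-unaware placement: the first node whose token is >= key_token
    (wrapping to index 0) holds the first copy, and the next
    replication_factor - 1 nodes clockwise hold the rest. RF is the TOTAL
    number of copies, so replication_factor=1 yields exactly one node.
    """
    n = len(sorted_tokens)
    if not 1 <= replication_factor <= n:
        raise ValueError("replication_factor must be between 1 and the number of nodes")
    first = bisect.bisect_left(sorted_tokens, key_token) % n
    return [(first + i) % n for i in range(replication_factor)]


if __name__ == "__main__":
    ring = [0, 100, 200, 300]  # toy tokens, not real MD5-sized values
    print(replicas(ring, 150, 1))  # [2]: one copy only, on the owner
    print(replicas(ring, 150, 3))  # [2, 3, 0]: the owner plus two more nodes
```

With RF = 1 the list has a single entry, matching the page's point that an RF of 1 does not mean one ''other'' node also has the data.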