Subject: Re: Cassandra memory needs
From: Peter Schuller
To: user@cassandra.apache.org
Date: Thu, 3 Feb 2011 00:00:23 +0100

> I am trying to understand the relationship between data set/SSTable(s) size
> and Cassandra heap.

http://wiki.apache.org/cassandra/LargeDataSetConsiderations

> For a rough rule of thumb, Cassandra's internal data structures will require
> about memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches.
>
> This formula does not depend on the data set size. Does this mean that,
> provided Cassandra has sufficient disk space to accommodate a growing data
> set, it can run in fixed memory for bulk load?

No, for reasons that I hope are covered at the above URL. The calculation you
refer to has more to do with how you tweak your memtables for performance,
which is only loosely coupled to data size. The cost of index sampling and
bloom filters, however, is very directly related to database size (see the
wiki URL).

It is essentially a trade-off: where a typical B-tree database would simply
start demanding additional seeks as the index grows larger, Cassandra limits
the seeks but instead has stricter memory requirements. If you're only looking
to smack huge amounts of data into the database without ever reading it, or
reading it very rarely, that trade-off is sub-optimal from a memory
perspective.

Note though that these are memory requirements "per row key", rather than
"per byte of data".
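To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch
of how the per-row-key costs (bloom filter bits and sampled index entries) add
up on top of the memtable rule of thumb quoted above. The constants for
bloom-filter bits per key and per-sample overhead are illustrative assumptions,
not values taken from this thread or the wiki; check them against your
Cassandra version before relying on the numbers.

    def estimate_heap_mb(row_keys,
                         avg_key_len_bytes,
                         memtable_throughput_mb,
                         hot_cfs,
                         index_interval=128,        # default sampling interval
                         bloom_bits_per_key=15,     # assumed; version dependent
                         index_sample_overhead=32): # assumed per-sample JVM overhead
        """Very rough heap estimate; all constants are illustrative only."""
        # Rule of thumb quoted above: memtables plus ~1G of general headroom.
        memtables_mb = memtable_throughput_mb * 3 * hot_cfs + 1024

        # Bloom filters grow linearly with the number of row keys.
        bloom_mb = row_keys * bloom_bits_per_key / 8.0 / (1024 * 1024)

        # Index sampling keeps roughly one in every index_interval keys in memory.
        samples = row_keys / float(index_interval)
        index_mb = samples * (avg_key_len_bytes + index_sample_overhead) / (1024.0 * 1024)

        return memtables_mb + bloom_mb + index_mb

    # Example: 500 million row keys of ~32 bytes, 128 MB memtables, 3 hot CFs.
    print("%.0f MB" % estimate_heap_mb(500 * 1000 * 1000, 32, 128, 3))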
> Am I right that the memory impact of compacting increasingly large SSTable
> sizes is capped by the parameter in_memory_compaction_limit_in_mb?

That limits the amount of memory allocated to individual row compactions, yes,
and it will put a cap on the GC pressure generated, in addition to allowing
huge rows to be compacted independently of heap size.

> Q2. What would I need to monitor to predict ahead of time the need to double
> the number of nodes, assuming sufficient storage per node? Is there a simple
> rule of thumb saying that for a heap of size X a node can handle an SSTable
> of size Y? I do realize that the I/O and CPU play a role here, but could that
> be reduced to a factor: Y = f(X) * z, where z is 1 for a specified server
> config? I am assuming the random partitioner and a fixed number of write
> clients.

Disregarding memtable tweaking, which has more to do with throughput, the most
important factors for how memory requirements scale with data size are the
number of row keys and the average row length.

I recommend just empirically inserting, say, 10 million rows with realistic row
keys and observing the size of the resulting index and bloom filter files. Take
into account to what extent compaction will cause memory usage to temporarily
spike. Also take into account that if you plan on having very large rows, the
indexes will begin having more than one entry per row (see
column_index_size_in_kb in the configuration).

If your use-case is truly extreme in the sense of huge data sets with little to
no requirement on query efficiency, the "per row key" costs can be cut down by
adjusting index_interval in the configuration to reduce the cost of index
sampling, and the target false positive rate of the bloom filters could be
adjusted (in source, not in the configuration) to cut down on that as well. But
that would be an unusual thing to do, and I wouldn't recommend touching it
without careful consideration and a deep understanding of your expected
use-case.

> Q3. Does the formula account for deserialization during reads? What does 1G
> represent?

I don't know the background of that particular wiki statement, but my guess is
that the 1G is a general gut-feel "good to have" base memory size rather than
something very specifically calculated.

--
/ Peter Schuller
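As a concrete companion to the "insert ~10 million rows and measure" suggestion
above, here is a minimal sketch of what that measurement could look like. The
default data directory path and the *-Index.db / *-Filter.db component suffixes
are assumptions based on the 0.7-era on-disk layout; adjust them for your
installation.

    import os
    import sys
    from collections import defaultdict

    # Walk a Cassandra data directory and sum the on-disk size of the index and
    # bloom filter components per column family, as a rough proxy for the
    # "per row key" costs discussed above.  The default path and the component
    # suffixes (*-Index.db, *-Filter.db) are assumptions; adjust as needed.

    def component_sizes(data_dir):
        totals = defaultdict(int)
        for root, _dirs, files in os.walk(data_dir):
            for name in files:
                if name.endswith('-Index.db') or name.endswith('-Filter.db'):
                    cf = name.split('-')[0]  # column family name prefix
                    totals[cf] += os.path.getsize(os.path.join(root, name))
        return totals

    if __name__ == '__main__':
        data_dir = sys.argv[1] if len(sys.argv) > 1 else '/var/lib/cassandra/data'
        for cf, size in sorted(component_sizes(data_dir).items()):
            print('%-30s %10.1f MB' % (cf, size / (1024.0 * 1024.0)))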