To: user@cassandra.apache.org
From: Oleg Anastasjev
Subject: Re: Cassandra Scaling Questions
Date: Thu, 5 Aug 2010 08:42:14 +0000 (UTC)

> > 1.) What have you found to be the best ratio of Cassandra row cache to memory free on the system for filesystem cache? Are you tuning it like an RDBMS so Cassandra has the vast majority of the RAM in the system or are you letting the filesystem cache do some of the work?

This depends on your exact case: how many rows are in the hot set. Giving too much memory to the JVM cache results in slower garbage collection without improving performance. There are cases (for example, large rows that are mostly read partially using get_slice) where the row cache makes things worse. I took a try-and-watch approach: I changed the size of the row cache and watched the row cache hit ratio and op/s. A hit ratio of 0.9 was enough for my case.

> > 2.) Is the Cassandra cache write-through (ie are new records held in the row cache as they're written to disk)?

Not exactly. Cassandra keeps recent writes (not rows) in memory, but after flushing the memtable it will reread the whole row from disk (and reconstruct it) into the row cache on the first read of the data.

> > 3.) When using the random partitioner how much difference should be expected (or has been observed) between nodes? 2%? 10%?

This depends on the data. It will distribute keys almost equally between nodes, but the size of the row data can differ from key to key. In my case the difference was about 0.2%.
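
To illustrate the answer to question 3, here is a rough Python sketch of why hashing keys with MD5 spreads the number of keys almost evenly between nodes. It is a simplified model, not Cassandra's actual code: the ring size, the evenly spaced node tokens, the key names and all function names are assumptions made for the example. It only counts keys, so it says nothing about differences in row size.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(key):
    """Map a key to a ring position via MD5 (simplified model)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def node_tokens(node_count):
    """Evenly spaced tokens, one per node (hypothetical ring layout)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def owner(token, tokens):
    """Index of the first node whose token is >= the key token, wrapping around."""
    return bisect_left(tokens, token) % len(tokens)


def key_counts(keys, node_count):
    """Count how many keys land on each node."""
    tokens = node_tokens(node_count)
    counts = [0] * node_count
    for key in keys:
        counts[owner(token_for(key), tokens)] += 1
    return counts


def max_imbalance(counts):
    """Largest relative deviation of a node's key count from the mean."""
    mean = sum(counts) / len(counts)
    return max(abs(c - mean) for c in counts) / mean


if __name__ == "__main__":
    keys = ["user-%d" % i for i in range(100000)]  # made-up keys
    counts = key_counts(keys, 4)
    print(counts, "max imbalance: %.2f%%" % (100 * max_imbalance(counts)))
```

With many keys the per-node key counts should come out close to each other; any larger difference in on-disk load would then come from row sizes, as noted above.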