Subject: Re: Index interval tuning
From: Héctor Izquierdo Seliva
To: user@cassandra.apache.org
Date: Mon, 09 May 2011 18:20:30 +0200

On Mon, 09-05-2011 at 17:58 +0200, Peter Schuller wrote:
> > I have a few sstables with around 500 million keys, and memory usage has
> > grown a lot, I suppose because of the indexes. These sstables are
> > comprised of skinny rows, but a lot of them. Would tuning index interval
> > make the memory usage go down? And what would the performance hit be?
>
> Assuming no row caching, and assuming you're talking about heap usage
> and not the virtual size of the process in top, the primary two things
> that will grow with row count are (1) bloom filters for sstables and
> (2) the sampled index keys. Bloom filters are of a certain size to
> achieve a sufficiently small false positive rate. That target rate
> could be increased to allow smaller bloom filters, but that is not
> exposed as a configuration option and would require code changes.

No row cache and no key cache. I've tried with both, but the keys being
read are constantly changing, and I didn't see hit ratios beyond 0.8%.
That reminds me, my false positive ratio is stuck at 1.0, so I guess
bloom filters aren't doing a lot for me.

> For key sampling, the primary performance penalty should be CPU and
> maybe some disk. On average, when looking up a key in an sstable index
> file, you'll read sample interval/2 entries and deserialize them
> before finding the one you're after.
> Increasing the sampling interval will
> thus increase the amount of deserialization taking place, as well as
> make the average range of data span additional pages on disk. The
> impact on disk is difficult to judge and likely depends a lot on i/o
> scheduling and other details.

So the only thing I can do is test it and see how it goes. To make the
change effective, should I do anything beyond changing the value in
cassandra.yaml and restarting the node? I'll try first with 256 and see
what happens.
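
For a rough feel of the trade-off described above, here is a back-of-the-envelope sketch in Python. It is not Cassandra code, only arithmetic: it assumes one index entry in every `index_interval` is kept in memory, and it uses the "interval/2 entries read per lookup" estimate from the quoted reply. The function names and the bytes-per-sample figure are made up for illustration; the real per-sample heap cost depends on key size and JVM overhead. The baseline of 128 is assumed to be the current setting.

```python
# Rough model of index sampling; an illustration, not Cassandra's implementation.

def sampled_entries(total_keys: int, index_interval: int) -> int:
    """Samples held in memory if one key in every `index_interval` is kept (rounded up)."""
    return -(-total_keys // index_interval)


def avg_entries_scanned(index_interval: int) -> float:
    """Average index entries read per lookup, using the interval/2 estimate."""
    return index_interval / 2


TOTAL_KEYS = 500_000_000   # figure mentioned in the thread
BYTES_PER_SAMPLE = 64      # ASSUMPTION: illustrative only, not a measured value

for interval in (128, 256):
    samples = sampled_entries(TOTAL_KEYS, interval)
    mib = samples * BYTES_PER_SAMPLE / 2**20
    print(
        f"index_interval={interval}: {samples:,} samples, "
        f"~{mib:.0f} MiB at {BYTES_PER_SAMPLE} B/sample, "
        f"~{avg_entries_scanned(interval):.0f} entries scanned per lookup"
    )
```

Under these assumptions, going from 128 to 256 halves the number of samples held on the heap and doubles the average number of index entries deserialized per lookup. Whether the extra CPU and disk work is noticeable is something only a test on the real workload will show.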