Date: Wed, 23 Feb 2011 16:21:43 -0800
Subject: Re: How come key cache increases speed by x4?
From: Robert Coli
To: user@cassandra.apache.org
Cc: buddhasystem, cassandra-user@incubator.apache.org

On Wed, Feb 23, 2011 at 4:04 PM, buddhasystem wrote:
> Well I know the cache is there for a reason, I just can't explain the factor
> of 4 when I run my queries on a hot vs cold cache. My queries are actually a
> chain of one on an inverted index, which produces a tuple of keys to be used
> in the "main" query. The inverted index query should be downright trivial.
>
> I see the turnaround time per row go down to 1 ms from 4 ms. Am I missing
> something? Why such a large factor?

(simplified for discussion purposes, not necessarily exhaustive description of.. )

Path in the cold key cache case:

a) check all bloom filters, 1 per sstable in the CF, which is in memory
b) read the index file (not in memory) and traverse index for every
   sstable which returns positive in a)
c) read the actual data file once for every sstable

Path in the hot key cache case:

a) read list of filenames and offsets from key cache
b) read the actual data file

You will notice that the former involves a lot more seeking than the
latter, especially if you have "many" sstables. This seeking almost
certainly is the cause of your observed difference. If you graph I/O
throughput in the two different cases, you will almost certainly see
yourself doing more (slow) I/O in the cold cache case.

Memory spent on key cache is usually relatively well spent, for this reason.

=Rob
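
The two read paths described above can be sketched as a toy model that counts simulated disk reads. This is only an illustration under stated assumptions, not Cassandra's actual code: the names `SSTable`, `make_sstable`, `ReadStats`, and `read` are hypothetical, the bloom filter is replaced by an exact membership check (a real one can return false positives, which would add index reads), and every index or data access is counted as one disk read.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    """Toy sstable (hypothetical model, not Cassandra's on-disk format)."""
    name: str
    index: dict  # key -> offset; stands in for the on-disk index file
    data: list   # offset -> value; stands in for the on-disk data file

    def might_contain(self, key):
        # Stand-in for the in-memory bloom filter; costs no disk read.
        # Unlike a real bloom filter, this never gives a false positive.
        return key in self.index


def make_sstable(name, rows):
    """Build a toy sstable from a {key: value} dict."""
    keys = sorted(rows)
    index = {k: offset for offset, k in enumerate(keys)}
    return SSTable(name, index, [rows[k] for k in keys])


@dataclass
class ReadStats:
    index_reads: int = 0
    data_reads: int = 0

    @property
    def disk_reads(self):
        return self.index_reads + self.data_reads


def read(key, sstables, key_cache, stats):
    """Return every stored version of `key`, counting simulated disk reads."""
    if key in key_cache:
        # Hot path: the cache already lists (sstable, offset) pairs.
        locations = key_cache[key]
    else:
        # Cold path: check each filter in memory, then read the index
        # of every sstable whose filter answered "maybe".
        locations = []
        for table in sstables:
            if table.might_contain(key):
                stats.index_reads += 1
                offset = table.index.get(key)
                if offset is not None:
                    locations.append((table, offset))
        key_cache[key] = locations

    values = []
    for table, offset in locations:
        stats.data_reads += 1  # one data file read per sstable holding the key
        values.append(table.data[offset])
    return values


# Three sstables hold the key, a fourth does not.
tables = [make_sstable(f"sstable-{i}", {"k": i}) for i in range(3)]
tables.append(make_sstable("sstable-3", {"other": 99}))

cache = {}
cold, hot = ReadStats(), ReadStats()
read("k", tables, cache, cold)  # cold key cache
read("k", tables, cache, hot)   # hot key cache
print(cold.disk_reads, hot.disk_reads)  # prints "6 3"
```

In this toy setup the hot path skips all index reads, so it does half the disk reads of the cold path. The ratio on a real cluster depends on the number of sstables, the bloom filter false positive rate, and how much of each file is already in the OS page cache, so the numbers here say nothing about the 4x observed in the original question.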