From: Mohamed Ibrahim
To: dev@cassandra.apache.org
Date: Mon, 20 Sep 2010 15:09:09 -0400
Subject: Re: improving read performance

Just in
case someone uses the equations on that page: there is a small mathematical
mistake. The exponent is missing a negative sign, so the error rate is
(1 - exp(-kn/m))^k.

Mohamed

On Mon, Sep 20, 2010 at 3:04 PM, Peter Schuller wrote:

> > Actually, the points you make are things I have overlooked and actually
> > make me feel more comfortable about how cassandra will perform for my
> > use cases. I'm interested, in my case, to find out what the bloom
> > filter false-positive rate is. Hopefully, a stat is kept on this.
>
> Assuming lack of implementation bugs and a good enough hash algorithm,
> the false-positive rate of a bloom filter is mathematically determined.
> See:
>
> http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
>
> And in cassandra:
>
> java/org/apache/cassandra/utils/BloomCalculations.java
> java/org/apache/cassandra/utils/BloomFilter.java
>
> (I don't know without checking (no time right now) whether the false
> positive rate is actually tracked or not.)
>
> > As long as ALL of the bloom filters are in memory, the hit should be
> > minimal for a
>
> Bloom filters are by design in memory at all times (they are the worst
> possible case you can imagine in terms of random access, so it would
> never make sense to keep them on disk even partially).
>
> (This assumes the JVM isn't being otherwise swapped out, which is
> another issue.)
>
> > Good point on the row cache. I had actually misread the comments in
> > the yaml, mistaking "do not use on ColumnFamilies with LARGE ROWS" as
> > "do not use on ColumnFamilies with a LARGE NUMBER OF ROWS". I don't
> > know if this will improve performance much, since I don't understand
> > yet whether this eliminates the need to check for the data in the
> > SSTables. If it doesn't, then what is the point of the row cache,
> > since the data is also in an in-memory memtable?
>
> It does eliminate the need to go down to sstables.
> It also survives compactions (so it doesn't go cold when sstables are
> replaced).
>
> Reasons not to use the row cache with large rows include:
>
> * In general it's a waste of memory better given to the OS page cache,
>   unless possibly you're continually reading entire rows rather than
>   subsets of rows.
>
> * For truly large rows you may have immediate issues with the size of
>   the data being cached; e.g. attempting to cache a 2 GB row is not the
>   best idea in terms of heap space consumption; you'll likely OOM or
>   trigger fallbacks to full GC, etc.
>
> * Having a larger key cache may often be more productive.
>
> > That aside, splitting the memtable in two could make checking the
> > bloom filters unnecessary in most cases for me, but I'm not sure it's
> > worth the effort.
>
> Write-through row caching seems like a more direct approach to me
> personally, offhand. Also, to the extent that you're worried about
> false positive rates, larger bloom filters may still be an option (not
> currently configurable; would require source changes).
>
> --
> / Peter Schuller
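[The corrected formula from the thread can be checked numerically. The
sketch below is not Cassandra's actual BloomCalculations code, just a
minimal illustration assuming m filter bits, n inserted keys, and k hash
functions; the optimal-k rule (m/n)·ln 2 follows from minimizing the
same expression.]

```java
// Sketch of the bloom-filter math discussed in the thread (summary-cache
// paper formula). Hypothetical class, not Cassandra's implementation.
public class BloomMath {
    // False-positive probability: (1 - e^(-kn/m))^k
    // m = bits in the filter, n = elements inserted, k = hash functions.
    static double falsePositiveRate(long m, long n, int k) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    // The k that minimizes the rate for a given m/n ratio is (m/n) * ln 2.
    static int optimalK(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;     // keys in one SSTable
        long m = 10L * n;        // 10 bits per key
        int k = optimalK(m, n);  // ~7 hash functions at 10 bits/key
        System.out.printf("k=%d fp=%.5f%n", k, falsePositiveRate(m, n, k));
    }
}
```

At 10 bits per key the minimum is around a 0.8% false-positive rate,
which is why "larger bloom filters" (more bits per key) is the lever
Peter mentions for driving the rate down further.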