From: Mohamed Ibrahim
To: dev@cassandra.apache.org
Date: Mon, 20 Sep 2010 15:09:09 -0400
Subject: Re: improving read performance

Just in
case someone uses the equations on that page: there is a small mathematical
mistake. The exponent is missing a negative sign, so the error rate is
(1 - exp(-kn/m))^k.

Mohamed

On Mon, Sep 20, 2010 at 3:04 PM, Peter Schuller wrote:

> > Actually, the points you make are things I have overlooked and actually
> > make me feel more comfortable about how cassandra will perform for my
> > use cases. I'm interested, in my case, to find out what the bloom
> > filter false-positive rate is. Hopefully, a stat is kept on this.
>
> Assuming lack of implementation bugs and a good enough hash algorithm,
> the false-positive rate of a bloom filter is mathematically determined.
> See:
>
> http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
>
> And in cassandra:
>
> java/org/apache/cassandra/utils/BloomCalculations.java
> java/org/apache/cassandra/utils/BloomFilter.java
>
> (I don't know without checking (no time right now) whether the false
> positive rate is actually tracked or not.)
>
> > As long as ALL of the bloom filters are in memory, the hit should be
> > minimal for a
>
> Bloom filters are by design in memory at all times (they are the worst
> possible case you can imagine in terms of random access, so it would
> never make sense to keep them on disk even partially).
>
> (This assumes the JVM isn't being otherwise swapped out, which is
> another issue.)
>
> > Good point on the row cache. I had actually misread the comments in
> > the yaml, mistaking "do not use on ColumnFamilies with LARGE ROWS" as
> > "do not use on ColumnFamilies with a LARGE NUMBER OF ROWS". I don't
> > know if this will improve performance much, since I don't understand
> > yet whether this eliminates the need to check for the data in the
> > SSTables. If it doesn't, then what is the point of the row cache,
> > since the data is also in an in-memory memtable?
>
> It does eliminate the need to go down to sstables.
> It also survives compactions (so it doesn't go cold when sstables are
> replaced).
>
> Reasons not to use the row cache with large rows include:
>
> * In general it's a waste of memory better given to the OS page cache,
>   unless possibly you're continually reading entire rows rather than
>   subsets of rows.
>
> * For truly large rows you may have immediate issues with the size of
>   the data being cached; e.g. attempting to cache a 2 GB row is not the
>   best idea in terms of heap space consumption; you'll likely OOM or
>   trigger fallbacks to full GC, etc.
>
> * Having a larger key cache may often be more productive.
>
> > That aside, splitting the memtable in two could make checking the
> > bloom filters unnecessary in most cases for me, but I'm not sure it's
> > worth the effort.
>
> Write-through row caching seems like a more direct approach to me
> personally, offhand. Also, to the extent that you're worried about
> false positive rates, larger bloom filters may still be an option (not
> currently configurable; would require source changes).
>
> --
> / Peter Schuller
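[The corrected formula from the thread can be checked numerically. The
sketch below is not Cassandra's actual BloomCalculations code, just a
minimal illustration assuming m filter bits, n inserted keys, and k hash
functions; the optimal-k rule (m/n)·ln 2 follows from minimizing the
same expression.]

```java
// Sketch of the bloom-filter math discussed in the thread (summary-cache
// paper formula). Hypothetical class, not Cassandra's implementation.
public class BloomMath {
    // False-positive probability: (1 - e^(-kn/m))^k
    // m = bits in the filter, n = elements inserted, k = hash functions.
    static double falsePositiveRate(long m, long n, int k) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    // The k that minimizes the rate for a given m/n ratio is (m/n) * ln 2.
    static int optimalK(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;     // keys in one SSTable
        long m = 10L * n;        // 10 bits per key
        int k = optimalK(m, n);  // ~7 hash functions at 10 bits/key
        System.out.printf("k=%d fp=%.5f%n", k, falsePositiveRate(m, n, k));
    }
}
```

At 10 bits per key the minimum is around a 0.8% false-positive rate,
which is why "larger bloom filters" (more bits per key) is the lever
Peter mentions for driving the rate down further.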