incubator-cassandra-user mailing list archives

From AJ ...@dude.podzone.net>
Subject Re: Anyone using Facebook's flashcache?
Date Wed, 13 Jul 2011 04:50:54 GMT
On 7/12/2011 9:02 PM, Peter Schuller wrote:
>> Thanks Peter, but... hmmm, are you saying that even after a cache miss which
>> results in a disk read and blocks being moved to the ssd, that by the next
>> cache miss for the same data and subsequent same file blocks, that the ssd
>> is unlikely to have those same blocks present anymore?
> I am saying that regardless of whether the cache is memory, ssd, a
> combination of both, or anything else, most workloads tend to be
> subject to diminishing returns. Doubling cache from 5 GB to 10 GB
> might get you from 10% to 50% cache hit ratio, but doubling again to
> 20 GB might get you to 60% and doubling to 40 GB to 65% (to use some
> completely arbitrary numbers for demonstration purposes).
>
> The reason a cache can be more effective than the ratio of its size
> vs. the total data set, is that there is a hotspot/working set that is
> smaller than the total data set. If you have completely random access
> this won't be the case, and a cache of size n% of total size will
> give you a n% cache hit ratio.
>
> But for most workloads, you have a hotter working set so you get more
> bang for the buck when caching. For example, if 99% of all accesses
> are accessing 10% of the data, then a cache that is the size of 10% of
> the data gets you 99% cache hit ratio. But clearly no matter how much
> more cache you ever add, you will never ever cache more than 100% of
> reads so in this (artificial arbitrary) scenario, once you're caching
> 10% of your data the cost of caching the final small percent of
> accesses might be 10 times that of the original cache.
>
> I did a quick Google but didn't find a good piece describing it more
> properly, but hopefully the above is helpful. Some related reading
> might be http://en.wikipedia.org/wiki/Long_Tail
>
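To put numbers on the two-tier example quoted above, here is a minimal sketch in Python. It is an idealized model, not a measurement of flashcache or Cassandra: it assumes the cache always holds the hottest data first, that access is uniform within the hot set and within the cold set, and that the hot set is at least as dense in accesses as the cold set. The function name `hit_ratio` and its parameters are made up for illustration.

```python
def hit_ratio(cache_frac, hot_frac=0.10, hot_access=0.99):
    """Expected hit ratio under an idealized two-tier access model.

    cache_frac: cache size as a fraction of the total data set (0..1)
    hot_frac:   fraction of the data that forms the hot working set
    hot_access: fraction of all accesses that go to the hot set

    Assumption: the cache fills with hot data first, then cold data,
    and access is uniform within each tier.
    """
    if not 0.0 <= cache_frac <= 1.0:
        raise ValueError("cache_frac must be between 0 and 1")
    if not 0.0 < hot_frac < 1.0:
        raise ValueError("hot_frac must be strictly between 0 and 1")
    if cache_frac <= hot_frac:
        # Only part of the hot set fits in the cache.
        return hot_access * cache_frac / hot_frac
    # The whole hot set fits; the remainder caches part of the cold set.
    cold_cached = (cache_frac - hot_frac) / (1.0 - hot_frac)
    return hot_access + (1.0 - hot_access) * cold_cached


if __name__ == "__main__":
    for c in (0.05, 0.10, 0.20, 0.40, 1.00):
        print(f"cache = {c:4.0%} of data -> hit ratio {hit_ratio(c):.3f}")
```

With the default parameters the curve climbs steeply until the cache is the size of the hot set and is nearly flat after that. Setting `hot_access` equal to `hot_frac` models completely random access, where a cache of n% of the data gives an n% hit ratio.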

Of course.  Thanks for the clarification.  On the positive side,
flashcache and other solutions like it could be beneficial for all disk
i/o on the system.  Writes will always benefit.  Reads benefit only if
the same blocks are read again before being pushed out by other reads.
I wonder if it would help to "prime" the ssd by reading in (and
discarding) the top 25% (250/1000GB) of the usual hot data.

aj
