On 7/12/2011 9:02 PM, Peter Schuller wrote:
>> Thanks Peter, but... hmmm, are you saying that even after a cache miss which
>> results in a disk read and blocks being moved to the ssd, that by the next
>> cache miss for the same data and subsequent same file blocks, that the ssd
>> is unlikely to have those same blocks present anymore?
> I am saying that regardless of whether the cache is memory, ssd, a
> combination of both, or anything else, most workloads tend to be
> subject to diminishing returns. Doubling cache from 5 gb to 10 gb
> might get you from 10% to 50% cache hit ratio, but doubling again to
> 20 gb might get you to 60% and doubling to 40 gb to 65% (to use some
> completely arbitrary random numbers for demonstration purposes).
>
> The reason a cache can be more effective than the ratio of its size
> to the total data set is that there is a hotspot/working set that is
> smaller than the total data set. If you have completely random access
> this won't be the case, and a cache of size n% of total size will
> give you an n% cache hit ratio.
>
> But for most workloads, you have a hotter working set so you get more
> bang for the buck when caching. For example, if 99% of all accesses
> are accessing 10% of the data, then a cache that is the size of 10% of
> the data gets you 99% cache hit ratio. But clearly no matter how much
> more cache you ever add, you will never ever cache more than 100% of
> reads, so in this (artificial, arbitrary) scenario, once you're caching
> 10% of your data, the cost of caching the final small percent of
> accesses might be 10 times that of the original cache.
>
> I did a quick Google but didn't find a good piece describing it more
> properly, but hopefully the above is helpful. Some related reading
> might be http://en.wikipedia.org/wiki/Long_Tail
>
Of course. Thanks for the clarification. On the positive side, flashcache
and other solutions like it could benefit all disk I/O on the system.
Writes will always benefit. Reads benefit only if the same data is read
again before being pushed out by other reads. I wonder if it would help
to "prime" the ssd by reading in (and discarding) the top 25% (250 of
1000 GB) of the usual hot data.

aj
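Here is a rough way to see Peter's diminishing-returns point, and to try the priming idea: replay synthetic block reads through a cache and count hits. This is only a sketch. It uses a plain LRU cache as a stand-in and does not model flashcache's own replacement policy or writes. The helper names (lru_hit_ratio, skewed_accesses) are hypothetical. The workload figures (10,000 blocks, 99% of reads going to 10% of the blocks) come from Peter's admittedly arbitrary example, not from measurements.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(accesses, capacity, prime=()):
    """Replay block IDs through an LRU cache holding `capacity` blocks.

    Blocks in `prime` are loaded first (like reading them in ahead of time)
    and are not counted as hits or misses. Returns hits / len(accesses).
    """
    cache = OrderedDict()  # keys ordered from least to most recently used

    def insert(block):
        cache[block] = None
        cache.move_to_end(block)
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block

    for block in prime:
        insert(block)

    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # refresh recency on a hit
        else:
            insert(block)  # miss: "read from disk", then cache the block
    return hits / len(accesses)


def skewed_accesses(n_blocks, n_accesses, hot_fraction, hot_share, rng):
    """Hypothetical workload: `hot_share` of accesses go to the first
    `hot_fraction` of blocks; the rest are spread over the cold blocks."""
    n_hot = int(n_blocks * hot_fraction)
    return [
        rng.randrange(n_hot) if rng.random() < hot_share
        else rng.randrange(n_hot, n_blocks)
        for _ in range(n_accesses)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    n_blocks, n_accesses = 10_000, 100_000
    uniform = [rng.randrange(n_blocks) for _ in range(n_accesses)]
    skewed = skewed_accesses(n_blocks, n_accesses, 0.10, 0.99, rng)

    # Hit ratio vs. cache size: roughly proportional to size for uniform
    # access, flattening out once the hot 10% fits for the skewed workload.
    for pct in (5, 10, 20, 40):
        cap = n_blocks * pct // 100
        print(f"cache {pct:>2}%: uniform {lru_hit_ratio(uniform, cap):.3f}, "
              f"skewed {lru_hit_ratio(skewed, cap):.3f}")

    # Priming: read the hot blocks in before the workload starts.
    hot = range(n_blocks // 10)
    print("skewed, 10% cache, primed: "
          f"{lru_hit_ratio(skewed, n_blocks // 10, prime=hot):.3f}")
```

With a plain LRU, priming only saves the warm-up misses. Whether it pays off on a real box depends on whether the blocks you pre-read really are the hot ones, and whether they survive until the workload touches them again.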