Wouldn't the once-used rows from your batch process quickly be evicted from the cache and replaced by frequently used rows? This would hold even if your batch process runs for a long time, since caching is done on a row-by-row basis. In effect, part of your cache would be taken up by the batch process while it runs, much as if you had dedicated a permanent cache to the batch, except that it isn't permanent, so it's better!


On Mon, May 2, 2011 at 7:50 AM, Tyler Hobbs <tyler@datastax.com> wrote:
If you had one big cache, wouldn't it be the case that it's mostly populated with frequently accessed rows, and less populated with rarely accessed rows?

Yes.

In fact, wouldn't one big cache dynamically and automatically give you exactly what you want? If you try to partition the same amount of memory manually, by guesswork, among many tables, aren't you always going to do a worse job?

Suppose you have one CF that's used constantly through interaction by users. Suppose you have another CF that's only used periodically by a batch process, where you tend to access most or all of the rows during the run, and which is too large to cache in full. Normally, you would dedicate cache space to the first CF, as anything with human interaction tends to have good temporal locality and you want to keep latencies there low. On the other hand, caching the second CF provides little to no real benefit. When you combine these two CFs, every time your batch process runs, rows from the second CF will populate the cache and cause eviction of rows from the first CF, even though having those rows in the cache provides little benefit to you.
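
To make the eviction effect concrete, here is a rough toy simulation in Python. It assumes a plain LRU cache with a fixed number of rows and a batch scan interleaved one-to-one with user reads; it is not Cassandra's cache implementation, and the names (LRUCache, user_hit_rate) and sizes are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fixed-capacity LRU row cache (illustrative only, not Cassandra's)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()

    def get(self, key):
        """Return True on a hit; on a miss, load the row and evict the LRU one."""
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            return True
        self.rows[key] = True
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # drop the least recently used row
        return False


def user_hit_rate(shared, capacity=100, hot_rows=80, batch_rows=1000):
    """Hit rate of user reads while a batch scan runs alongside them.

    shared=True:  batch rows go through the same cache as user rows.
    shared=False: the batch CF is simply not cached (assumed setup).
    """
    cache = LRUCache(capacity)
    # Warm the cache with the user-facing working set.
    for r in range(hot_rows):
        cache.get(("users", r))
    hits = 0
    for b in range(batch_rows):
        if shared:
            cache.get(("batch", b))  # each batch row is touched exactly once
        if cache.get(("users", b % hot_rows)):
            hits += 1
    return hits / batch_rows


if __name__ == "__main__":
    print("dedicated cache:", user_hit_rate(shared=False))
    print("shared, hot set 80:", user_hit_rate(shared=True))
    print("shared, hot set 30:", user_hit_rate(shared=True, hot_rows=30))
```

In this toy model the dedicated cache keeps a user hit rate of 1.0, while the shared cache drops close to zero when the hot set nearly fills the cache. With a hot set of only 30 rows the shared cache also stays at 1.0, since hot rows are re-read before the scan can push them out, so the outcome depends heavily on the sizes and access rates you assume.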

As another example, if you mix a CF with wide rows and a CF with small rows, you no longer have the option of using a row cache, even if it makes great sense for the small-row CF data.

Knowledge of data and access patterns gives you a very good advantage when it comes to caching your data effectively.


--
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library