incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Effective cache size
Date Fri, 04 Jun 2010 02:15:49 GMT
On Thu, Jun 3, 2010 at 10:17 AM, David King <dking@ketralnis.com> wrote:
>>> So with the row cache, that first node (the primary replica) is the one that
>>> has that row cached, yes?
>> No, it's the closest node as determined by snitch.sortByProximity.
>
> And with the default snitch, rack-unaware placement, random partitioner, and all nodes
> up, that's the primary replica, right?

No.  When all replicas have equal weight it's basically random.
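
To make "basically random" concrete, here is a minimal sketch, not Cassandra's actual code. The function pick_replica is a hypothetical stand-in for what happens when proximity sorting sees replicas that all tie: it shuffles them and reads from the first. The node names and the read count are made up for illustration.

```python
import random
from collections import Counter

def pick_replica(replicas, rng):
    """Hypothetical stand-in for proximity sorting when every replica ties:
    shuffle the candidates and read from whichever lands first."""
    ordered = list(replicas)
    rng.shuffle(ordered)
    return ordered[0]

def simulate_reads(replicas, n_reads, seed=0):
    """Count how many CL==ONE reads of a single key each replica serves."""
    rng = random.Random(seed)
    return Counter(pick_replica(replicas, rng) for _ in range(n_reads))

# Three replicas of one hot key, 30,000 reads at CL==ONE.
counts = simulate_reads(["A", "B", "C"], 30000)
for node in sorted(counts):
    print(node, counts[node])
```

Under that assumption each replica serves roughly a third of the reads, so over time a hot row ends up in the row cache of every replica rather than only the primary.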

>> any given node X will never know whether another node Y has a row cached or not.
>> the overhead for communicating that level of detail would be totally prohibitive. all caching
>> does is speed the read, once a request is received for data local to a given node.  no more,
>> no less.
>
> Yes, that's my concern, but the details significantly affect the effective size of the
> cache (in the aforementioned case, the details place the effective size at either 6 million
> or 18 million, a 3x difference).
>
> So given CL==ONE reads, only the actually read node (which will be the primary replica
> given the default placement strategy and snitch) will cache the item, right? The checksum-checking
> doesn't cause the row to be cached on the non-read nodes?

You have to read the data, before you can checksum it.  So on the
contrary, digest (checksum) vs data read has no effect on cache
behavior.

> If I read with CL==QUORUM in an RF==3 environment, do both read nodes then cache the
> item, or only the primary replica?

Both.  Which is what you want, otherwise your digest reads will cause
substantial unnecessary i/o on hot keys.
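
For the effective-size arithmetic discussed in this thread, a rough back-of-the-envelope sketch follows. It assumes 18 million row-cache slots in total across the cluster (a hypothetical figure chosen to match the 6 million vs 18 million numbers quoted above) and that every hot row ends up cached on the same number of replicas. The function name and parameters are illustrative only.

```python
def effective_cache_rows(total_cache_slots, copies_per_row):
    """Distinct rows the cluster can hold in row cache when each cached
    row occupies a slot on `copies_per_row` different replicas."""
    if copies_per_row < 1:
        raise ValueError("copies_per_row must be at least 1")
    return total_cache_slots // copies_per_row

TOTAL_SLOTS = 18_000_000  # assumed cluster-wide row cache capacity

# Each row cached on exactly one replica (what the question hoped for).
only_one_copy = effective_cache_rows(TOTAL_SLOTS, 1)

# Each hot row eventually cached on all RF==3 replicas, which is where
# random replica selection (or QUORUM reads) tends to push things.
all_three_copies = effective_cache_rows(TOTAL_SLOTS, 3)

print(only_one_copy, all_three_copies)
```

With these assumed numbers the two cases come out to 18 million and 6 million distinct rows, the 3x gap mentioned in the question.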

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
