That is basically our setup. We'll be holding all data on all
nodes.
My question was more about how the cache would behave. I thought
it might work like this:
1. No cache hit
Read from 3 nodes to verify the result is correct, then return it.
Write the result into the RowCache.
2. Cache hit
Read from the RowCache directly and return.
If the value is then updated, the entry would be found in the
RowCache and either invalidated (hence case 1 on the next read) or
updated (hence case 2 on the next read). However, I couldn't find
any information on this.
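To make the guess above concrete, here is a toy sketch of that read path in Python. It only models my assumption, not Cassandra's actual implementation: the class name, the dict-based replicas and the invalidate-on-write choice are all hypothetical.

```python
class HypotheticalNode:
    """Toy model of the guessed read path; NOT Cassandra's real code."""

    def __init__(self, replicas):
        self.replicas = replicas   # dicts standing in for replica nodes
        self.row_cache = {}        # stand-in for the RowCache
        self.replica_reads = 0     # number of replica reads issued so far

    def read(self, key):
        if key in self.row_cache:              # case 2: cache hit
            return self.row_cache[key]
        values = []
        for replica in self.replicas[:3]:      # case 1: read from 3 nodes
            values.append(replica.get(key))
            self.replica_reads += 1
        # A real read would reconcile differing replies; in this toy
        # model all replicas always agree, so the first reply is used.
        value = values[0]
        self.row_cache[key] = value            # fill the cache on a miss
        return value

    def write(self, key, value):
        for replica in self.replicas:
            replica[key] = value
        self.row_cache.pop(key, None)          # "invalidate" variant


node = HypotheticalNode([{} for _ in range(5)])
node.write("row1", "v1")
first = node.read("row1")    # miss: 3 replica reads, then cached
second = node.read("row1")   # hit: no further replica reads
```

Swapping the last line of write() for an assignment into row_cache would give the "updated" variant instead.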
If this were the case, each node would only have to hold 1/5 of
my data in cache (you're right about the DC clone, so 1/5 of the
data instead of 1/10). If, however, 3 nodes have to be read each
time and all 3 fill up their row cache with the same data, my
cache requirements would be bigger.
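The difference between the two cases can be put into numbers with a short Python sketch. It assumes 5 nodes per DC and that rows are spread evenly; the function name is made up for illustration.

```python
from fractions import Fraction


def per_node_cache_share(nodes_per_dc, replicas_caching_each_row):
    """Share of the full data set that each node's row cache must hold
    if every row is cached on the given number of replicas."""
    return Fraction(replicas_caching_each_row, nodes_per_dc)


NODES_PER_DC = 5

# Case A: only one node ends up caching a given row.
single_copy = per_node_cache_share(NODES_PER_DC, 1)

# Case B: all 3 nodes involved in the read cache the same row.
triple_copy = per_node_cache_share(NODES_PER_DC, 3)

print(single_copy, triple_copy)  # prints "1/5 3/5"
```

So under these assumptions, case B would need three times the cache per node that case A does.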
On 10/10/13 14:06, Ken Hancock wrote:
If you're hitting 3/5 nodes, it sounds like you've set
your replication factor to 5. Is that what you're doing so
you can tolerate a 2-node outage?
For a 5-node cluster with RF=5, each node will have 100% of your
data (a second DC is just a clone), so with 3GB off-heap it
means that a fraction of 3GB / <total data size in GB> would be
cacheable in the row cache.
On the other hand, if you're doing RF=3, each node will have
60% of your data instead of 100%, so the effective percentage
of rows that are cached goes up by 66%.
Great quick & dirty calculator: http://www.ecyrd.com/cassandracalculator/
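The RF=5 versus RF=3 arithmetic above can be checked with a few lines of Python. This is a rough sketch: the 30 GB total data size is a made-up placeholder, the function name is hypothetical, and it assumes data is spread evenly across nodes.

```python
def cacheable_fraction(cache_gb, total_data_gb, replication_factor, nodes):
    """Fraction of a node's local data that fits in its row cache."""
    # Each node holds RF/nodes of the data set (at most all of it).
    per_node_data_gb = total_data_gb * min(replication_factor, nodes) / nodes
    return min(1.0, cache_gb / per_node_data_gb)


TOTAL_DATA_GB = 30.0  # hypothetical; substitute the real data size
CACHE_GB = 3.0        # the 3GB off-heap figure from the message

rf5 = cacheable_fraction(CACHE_GB, TOTAL_DATA_GB, replication_factor=5, nodes=5)
rf3 = cacheable_fraction(CACHE_GB, TOTAL_DATA_GB, replication_factor=3, nodes=5)

print(f"RF=5: {rf5:.1%} cacheable")   # RF=5: 10.0% cacheable
print(f"RF=3: {rf3:.1%} cacheable")   # RF=3: 16.7% cacheable
print(f"gain: {rf3 / rf5 - 1:.1%}")   # gain: 66.7%
```

The gain is independent of the placeholder data size as long as the cache is smaller than the data held per node.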