> > If I have 10B rows in my CF, and I can fit 10k rows per
> > SStable, and the SStables are spread across 5 nodes, and I have 1 bloom
> > filter false positive and 1 tombstone and ask the wrong node for the key,
> > then:
> >
> > Mv = (((2B/10k)+1+1)*3)+1 == ((200,000)+2)*3+1 == 300,007 iops to read a
> > key.
>
> This is a nonsensical arrangement. Assuming each SSTable is the size
> of the default Memtable threshold (128MB), then each row is (128MB /
> 10k) == 12.8k and 10B rows == 128TB of raw data. A typical RF of 3
> takes us to 384TB. The need for enough space for compactions takes us
> to 768TB. That's not 5 nodes, it's more like 100+, and almost 2
> orders of magnitude off your estimate,
You started off so well. You laid out a couple of useful points:
(1) For a 10B-row dataset with 12.8KB rows and RF=3, a Cassandra cluster
requires 768TB of storage. With less, you'll run into severe administration
problems. This is not obvious, but it is critical and extremely useful.
(2) A 12.8KB row size wants a >128MB memtable threshold. Is there a rule of
thumb for this? memTableThreshold = rowsize * 100,000?
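
For anyone who wants to redo the sizing in point (1) with their own numbers,
here is a rough Python sketch. The function name and the 2x compaction
headroom parameter are just my labels for the figures quoted above (decimal
units, RF=3, double the space for compactions); nothing here comes from
Cassandra itself.

```python
def required_storage_bytes(num_rows, row_bytes, replication_factor=3,
                           compaction_headroom=2):
    """Raw data size, multiplied by RF and by headroom for compactions."""
    raw_bytes = num_rows * row_bytes
    return raw_bytes * replication_factor * compaction_headroom

# A 128MB SSTable holding 10k rows implies 12.8KB per row (decimal units,
# matching the arithmetic in the quoted message).
row_bytes = 128_000_000 // 10_000
total = required_storage_bytes(10_000_000_000, row_bytes)
print(total / 1e12, "TB")
```

Changing row_bytes or compaction_headroom shows how quickly the node count
moves with either assumption.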
> without addressing shortcomings
> in the rest of it (which I leave to more capable folks on this list).
>
Then this.
Obviously no one on this list is more capable, or there'd already be a good
answer. So stop your whinging and help lay the foundation for a good answer.
We forgot to include replication factor in the calculation. If we do assume
RF=3, that triples the number of SStables that might have the key, right?
That means the new formula is:
    readIops = ((numRowsOnNode * replicationFactor / rowsPerSStable)
                + bloomFalsePositives
                + tombstones) * 3
It seems bloomFalsePositives and tombstones get lost in the noise. Is that a
reasonable assumption? On a volatile dataset, do those grow to significant
sizes? I'm going to assume both=0 for now.
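
To make the formula easier to poke at, here is a minimal Python sketch of it.
The parameter names are mine, and I'm assuming the trailing factor of 3 means
"iops per SSTable touched"; if it means something else, rename it. This is
the back-of-envelope model from this thread, not how Cassandra actually
schedules reads.

```python
def read_iops(num_rows_on_node, replication_factor, rows_per_sstable,
              bloom_false_positives=0, tombstones=0, iops_per_sstable=3):
    """Worst-case iops per read under the thread's simple model."""
    # Number of SSTables on the node that might hold the key.
    sstables = num_rows_on_node * replication_factor / rows_per_sstable
    return (sstables + bloom_false_positives + tombstones) * iops_per_sstable

# With thousands of SSTables, one false positive and one tombstone barely
# register, which is why both default to 0 here.
baseline = read_iops(1_000_000, 3, 1_000)
with_extras = read_iops(1_000_000, 3, 1_000,
                        bloom_false_positives=1, tombstones=1)
print(baseline, with_extras)
```

If bloomFalsePositives or tombstones turn out to be large on a volatile
dataset, they can be passed in without changing the rest of the model.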
Let's change a couple of assumptions to make a more realistic scenario: 1KB
row size, 10B rows, RF=3. That's roughly a 9TB dataset, which becomes 27TB
with RF=3 and 27TB*2=54TB of storage once compaction headroom is included.
Our intrepid admin chooses to have 20 nodes of 3TB each.
numRowsOnNode = 10B / 20 = 500M.
replicationFactor = 3.
rowsPerSStable = 128MB / 1KB = 131k.
Therefore worst-case iops per read on this cluster are:
(500M * 3 / 131k) * 3 = (1,500M / 131k) * 3 = ~11,450 * 3 = ~34,350.
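
Here is the same scenario worked through in Python as a sanity check on the
arithmetic. Variable names are mine, and I'm using binary units
(128MB = 128 * 1024 * 1024 bytes, 1KB = 1024 bytes), so the figures differ
slightly from the rounded ones above: about 11,444 candidate SSTables per
node, and about 34,332 once the trailing factor of 3 is applied.

```python
total_rows = 10_000_000_000
num_nodes = 20
replication_factor = 3
sstable_bytes = 128 * 1024 * 1024   # default memtable threshold, per the thread
row_bytes = 1024                    # 1KB rows

num_rows_on_node = total_rows // num_nodes      # 500M
rows_per_sstable = sstable_bytes // row_bytes   # 131,072

# SSTables on a node that might hold the key, then the trailing factor of 3
# from the formula (bloom false positives and tombstones assumed to be 0).
sstables = num_rows_on_node * replication_factor / rows_per_sstable
iops = sstables * 3

print(round(sstables), round(iops))
```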
Feel free to point out problems with the methodology, but I humbly suggest
that if you do, you also propose a more precise formula.

