> > What if we have 10B rows in the column family? What sort of index do you
> use
> > that would only require one iop to find the row index block?
>
> basically what is described in sections 5.3 and 5.4 here:
> http://labs.google.com/papers/bigtable.html
>
Incorrect. Section 4 of the paper describes the SStable indexing strategy.
I have to admit, I'm surprised "3 iops [per mumbleVariable]" is a
satisfactory answer to this question. What should interest any realworld
user of the system would be mumbleVariable.
The 3 part is nearly meaningless. It's the constant you throw away in a
complexity analysis.
So let's come up with a definition for worstcase reads (aka "number of
SStables that might be relevent for a particular key, and what sort of slow
operations might need to be executed to get to the final and proper
SStable") shall we? (This is not a complexity analysis, but a similar
methodology)
mumbleVariable = (((numRows / rowsPerSStable) + bloomFalsePositives +
tombstones ) * 3) disk seeks plus one network operation.
We might shorten those variables for readability:
Mv = (((N/R)+B+T)*3)+Nw
With this formula, we can already begin to formulate more useful answers to
the question. If I have 10B rows in my CF, and I can fit 10k rows per
SStable, and the SStables are spread across 5 nodes, and I have 1 bloom
filter false positive and 1 tombstone and ask the wrong node for the key,
then:
Mv = (((2B/10k)+1+1)*3)+1 == ((200,000)+2)*3+1 == 300,007 iops to read a
key.
More updates and deletes will cause more bloomFalsePositives and tombstones,
right?
Now that I've done this analysis, I think network operations can be thrown
away, since there will only ever be 1 (or maybe 2?) per request, not
depending on numRows. What might also be interesting is an averagecase
analysis, where we give weightings to bloomFalsePositives (which should be
very rare events) and then try to include them, since they should be
proportional to numRows.
Is there some indexing method I'm missing in there? It seems doing 300k iops
to find a key in a large CF like that would be excessive.

timeless(ness)
