It seems it was linked to "data volume".
The symptoms were:
- a read query is executed on a server (#3 in our case). This query looks like select from CF where A= and B= and C= and D=, where A..D are columns with secondary indexes.
- CPU on #3 increased a lot (load average around 20 for 8 cores).
During that time, the cluster was still under load.
- then, a couple of seconds later, servers #2 and #4 (the next and previous tokens) were impacted as well (load average around 20 for 8 cores).
- from the client side, we got HTTP timeouts.
- ReadStage pending was high.
- no IO (disk, network), just CPU bursts
- we stopped the load and after a while (around 10 minutes), the cluster calmed down.
Indeed the query we executed was reading a lot of data (too much ;) )
What we did (this morning, in fact):
- we precalculated the indexes we need, so that every query has the form select from CF where idx= (and uses only one index)
- so our data model is composed of columns and "concatenated" indexes. These "concatenated indexes" are stored as secondary indexes in Cassandra.
In our case and based on our data access, we have 4 precalculated indexes.
It works extremely well (thanks to my colleague Franck for having this idea).
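To make the idea concrete, here is a minimal Python sketch of how such a concatenated index value could be computed before writing a row. It is only an illustration: the index names, the column combinations, the ":" separator and the sample values are assumptions made for this example, not our actual schema, and the write to Cassandra itself is left out.

```python
# Assumption: ":" never appears inside a column value. If it can,
# the values must be escaped or another separator chosen.
SEPARATOR = ":"

# Hypothetical index definitions: index column name -> source columns.
INDEX_DEFINITIONS = {
    "idx_A_B": ("A", "B"),
    "idx_A_B_C_D": ("A", "B", "C", "D"),
}


def concat_index_value(row, columns, sep=SEPARATOR):
    """Join the values of the given columns into one index value."""
    return sep.join(str(row[c]) for c in columns)


def with_precalculated_indexes(row, definitions=INDEX_DEFINITIONS):
    """Return a copy of the row with one extra column per index definition."""
    enriched = dict(row)
    for index_name, columns in definitions.items():
        enriched[index_name] = concat_index_value(row, columns)
    return enriched


# Sample row with made-up values.
row = {"A": "fr", "B": "2011", "C": "web", "D": "paid"}
stored = with_precalculated_indexes(row)
```

With a row stored this way, the four-predicate query from the symptoms becomes a single equality lookup on one indexed column, for example where idx_A_B_C_D = 'fr:2011:web:paid'.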
Still, I would be very interested in the internals of secondary indexes. What I understood is:
- a secondary index is local to the node and indexes local data only.
- concerning fault tolerance, if you have an RF of 3 and lose 2 consecutive nodes / tokens, you are dead for all the reads you are doing (even if the down nodes do not hold the data, because the query still needs to read the index on the down nodes).
Am I right?