I should have done more research before asking the question. I mean real research, too :)

I captured cfstats before the repair, after the repair, and after the scrub. On a hunch I also did a before/after repair run with no scrub - instead I left the cluster alone for about as long as a scrub normally takes (which can be hours on our dataset). It turns out that in all probability it's just a waiting game. The bloom filter stats were nearly identical at the end of the equivalent time period, as was the query performance. I guess I just needed to wait longer for the streamed files to "settle" or some such.
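
For anyone who wants to repeat the comparison, here is a rough sketch of how the before/after snapshots could be diffed. It is not the script I actually used. It assumes each snapshot is the saved text of `nodetool cfstats`, that every column family section starts with a "Column Family: <name>" line, and that metrics appear as "label: value" lines; the function names `parse_cfstats` and `diff_stats` are made up for illustration, and the exact metric labels vary between Cassandra versions.

```python
import re


def parse_cfstats(text):
    """Collect numeric 'label: value' pairs per column family.

    Assumes cfstats-style text in which each column family section
    begins with a 'Column Family: <name>' line (an assumption about
    the output layout, not a documented contract).
    """
    stats = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name == "Column Family":
            current = value
            stats[current] = {}
        elif current is not None:
            # Keep only values that start with a number, e.g. "1.5 ms."
            match = re.match(r"-?\d+(\.\d+)?", value)
            if match:
                stats[current][name] = float(match.group(0))
    return stats


def diff_stats(before, after, cf):
    """Return {metric: (before, after)} for metrics that changed."""
    b, a = before.get(cf, {}), after.get(cf, {})
    return {k: (b[k], a[k]) for k in b if k in a and b[k] != a[k]}
```

Save one snapshot before the repair and one afterwards (for example `nodetool cfstats > before.txt`), read both files, and call `diff_stats(parse_cfstats(before_text), parse_cfstats(after_text), "your_cf")` to see which counters moved. Taking a third snapshot after simply waiting, with no scrub, is what separates "scrub fixed it" from "time fixed it".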


On Fri, Sep 28, 2012 at 7:20 AM, Charles Brophy <cbrophy@zulily.com> wrote:
Odd indeed.

1) It is observable after the compactions are through and the system has "settled"
2) We're using SizeTiered strategy
3) CentOS 6 & Oracle JVM 1.6.31

I'll do a repair and get some before/after stats to answer your remaining questions.

Thanks Aaron

On Wed, Sep 26, 2012 at 2:51 PM, aaron morton <aaron@thelastpickle.com> wrote:
Sounds very odd. 

Is read performance still degraded _after_ the repair and the compactions it normally triggers have completed?
What Compaction Strategy ?
What OS and JVM ? 

What are the bloom filter false positive stats from cfstats?

Do you have some read latency numbers from cfstats ?
Also, could you take a look at cfhistograms?


Aaron Morton
Freelance Developer

On 26/09/2012, at 3:05 AM, Charles Brophy <cbrophy@zulily.com> wrote:

Hey guys,

I've begun to notice that read operations take a performance nose-dive after a standard (full) repair of a fairly large column family: ~11 million records. Interestingly, read performance returns to normal after a full scrub of the column family. Is it possible that the repair operation is not correctly establishing the bloom filter afterwards? What leads me to this conclusion is a note on the scrub operation saying that it will "rebuild sstables with correct bloom filters". Does this make sense?

I'm using 1.1.3 and Oracle JDK 1.6.31
The column family is a standard type and I've noticed this exact behavior regardless of the key/column/value serializers in use.