I had this happen when I had poorly generated tokens for the ring. Cassandra seems to accept token numbers that are too big. You get hot spots when you think you should be balanced, and repair never ends (I think there is a 48-hour timeout).
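For reference, balanced initial tokens for the RandomPartitioner can be computed with a short script along these lines (a sketch; it assumes the RandomPartitioner's token range of 0 to 2**127 - 1, and `balanced_tokens` is just an illustrative name):

```python
# Sketch: evenly spaced initial tokens for Cassandra's RandomPartitioner.
# Valid tokens lie in 0 .. 2**127 - 1; anything larger is "too big" and
# can leave the ring unbalanced even though Cassandra accepts it.

def balanced_tokens(num_nodes):
    """Return one evenly spaced token per node."""
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]

for node, token in enumerate(balanced_tokens(4)):
    print("node %d: initial_token=%d" % (node, token))
```

Comparing each node's assigned token against the output of a script like this is a quick way to spot an unbalanced ring.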
On Tuesday, April 10, 2012, Frank Ng wrote:
I am not using size-tiered compaction.

On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone <firstname.lastname@example.org> wrote:
Data size, number of nodes, RF?
Are you using size-tiered compaction on any of the column families that hold a lot of your data?
Do your cassandra logs say you are streaming a lot of ranges?
zgrep -E "(Performing streaming repair|out of sync)"
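The log check suggested above counts lines matching either phrase; a self-contained sketch with sample lines standing in for a real system.log (the log text here is illustrative, not actual Cassandra output):

```shell
# Count how many repair-related lines appear in the log.
# grep -c prints the number of matching lines; -E enables the
# alternation pattern. Sample lines simulate a gzipped system.log.
printf '%s\n' \
  'INFO [AntiEntropy] Performing streaming repair of 120 ranges' \
  'INFO [AntiEntropy] Endpoints /10.0.0.1 and /10.0.0.2 are out of sync' \
  | grep -cE '(Performing streaming repair|out of sync)'
# -> 2
```

A high count here suggests repair time is dominated by streaming many out-of-sync ranges rather than by validation.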
--
On Tue, Apr 10, 2012 at 9:45 AM, Igor <email@example.com> wrote:
On 04/10/2012 07:16 PM, Frank Ng wrote:
Short answer - yes.
But you are asking the wrong question.
I think both processes are taking a while. When repair starts up, netstats and compactionstats show nothing. Is anyone out there successfully using ext3 with repair processes faster than this?
On Tue, Apr 10, 2012 at 10:42 AM, Igor <firstname.lastname@example.org> wrote:
You can check with nodetool which part of the repair process is slow - network streams or validation compactions. Use nodetool netstats or nodetool compactionstats.
On 04/10/2012 05:16 PM, Frank Ng wrote:
I am on Cassandra 1.0.7. My repair processes are taking over 30 hours to complete. Is it normal for the repair process to take this long? I wonder if it's because I am using the ext3 file system.