incubator-cassandra-user mailing list archives

From: Drew from Zhrodague <drewzhroda...@zhrodague.net>
Subject: Compaction, Slow Ring, and bad behavior
Date: Mon, 29 Apr 2013 17:33:17 GMT
	Hi, we have a 9-node ring on m1.xlarge AWS hosts. We started having 
some trouble a while ago, and it's making me pull out all of my hair.

	The host in position #3 has been replaced 4 times. Each time the new 
host joins the ring, I run nodetool repair -pr, and she seems fine for 
about a day. Then she gets really slow: sometimes she OOMs, sometimes she 
takes down the host in position #5, sometimes she gets stuck on a 
compaction with near-idle disk throughput, and eventually she dies 
without any error message or reason for failing.

	Sometimes our cluster gets so slow that it is almost unusable: we get 
timeout errors from our application, and AWS sends us voluminous alerts 
about latency.

	I've tried changing the amount of RAM between 8G and 12G, changing 
MAX_HEAP_SIZE and HEAP_NEWSIZE, repeatedly forcing compactions to stop, 
setting astronomical ulimit values, and praying to the available gods. 
I'm a bit confused: we're not using super-wide rows, and most settings 
are at their defaults.

	EL5, Cassandra 1.1.9, Java 1.6.0
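For reference, here is a rough sketch of how the default MAX_HEAP_SIZE and HEAP_NEWSIZE are derived when neither is set, based on the calculate_heap_sizes() function in the 1.x-era cassandra-env.sh as we recall it. The Python function names are hypothetical, the "set both or neither" check is an assumption about that script version, and the m1.xlarge inputs (roughly 15 GB RAM, 4 cores) are illustrative, not measurements from this cluster. Check the cassandra-env.sh shipped with your install before relying on any of it.

```python
"""Sketch of the default heap sizing in the 1.x-era cassandra-env.sh.

Reconstructed from memory, not copied from the script; the function
names below are hypothetical.
"""
from typing import Optional, Tuple


def default_heap_sizes(system_memory_mb: int, cpu_cores: int) -> Tuple[int, int]:
    """Return (MAX_HEAP_SIZE, HEAP_NEWSIZE) in MB.

    MAX_HEAP_SIZE = max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
    HEAP_NEWSIZE  = min(100 MB * cores, MAX_HEAP_SIZE / 4)
    Integer division mirrors the shell script's use of `expr`.
    """
    half = system_memory_mb // 2
    quarter = half // 2            # derived from half before half is capped
    half = min(half, 1024)         # cap 1/2 RAM at 1 GB
    quarter = min(quarter, 8192)   # cap 1/4 RAM at 8 GB
    max_heap = max(half, quarter)

    max_sensible_young = 100 * cpu_cores   # 100 MB of young gen per core
    desired_young = max_heap // 4          # a quarter of the heap
    heap_new = min(desired_young, max_sensible_young)
    return max_heap, heap_new


def resolve_heap_settings(
    system_memory_mb: int,
    cpu_cores: int,
    max_heap_size: Optional[str] = None,
    heap_newsize: Optional[str] = None,
) -> Tuple[str, str]:
    """Use explicit values if both are given, defaults if neither is.

    Assumption: the script refuses to start when only one is set.
    """
    if max_heap_size is None and heap_newsize is None:
        max_heap, heap_new = default_heap_sizes(system_memory_mb, cpu_cores)
        return f"{max_heap}M", f"{heap_new}M"
    if max_heap_size is None or heap_newsize is None:
        raise ValueError(
            "please set or unset MAX_HEAP_SIZE and HEAP_NEWSIZE in pairs"
        )
    return max_heap_size, heap_newsize


if __name__ == "__main__":
    # Illustrative m1.xlarge-sized box: assume 15360 MB RAM and 4 cores.
    print(resolve_heap_settings(15360, 4))  # ('3840M', '400M')
```

Under these assumptions an untuned m1.xlarge would get a heap of about 3.8 GB with a 400 MB young generation, so overriding only one of the two variables is the case the pairing check guards against.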


-- 

Drew from Zhrodague
lolcat divinator
drew@zhrodague.net
