cassandra-user mailing list archives

From Jeffrey Wang <jw...@palantir.com>
Subject dropped mutations, UnavailableException, and long GC
Date Thu, 24 Feb 2011 22:21:28 GMT
Hey all,

Our setup is 5 machines running Cassandra 0.7.0, each with 24GB of heap and 1.5TB of disk,
colocated in one DC. We're doing bulk imports from each of the nodes with RF = 2 and write
consistency ANY (write perf is very important). The behavior we're seeing is this:


- Nodes often see each other as dead even though none of the nodes actually goes down.
I suspect this may be due to long GC pauses. It seems like increasing the RPC timeout could
help, but I'm not convinced this is the root of the problem. Note that in this case writes
return with an UnavailableException.

- As mentioned, long GCs. We see ParNew doing a lot of smaller collections (a few hundred MB
each) which are very fast (a few hundred ms), but every once in a while ConcurrentMarkSweep
takes a LONG time (up to 15 min!) to collect upwards of 15GB at once.

- On some nodes, we see a lot of pending MutationStage tasks build up (e.g. 500K), which
leads to the message "Dropped X MUTATION messages in the last 5000ms," presumably meaning
that Cassandra has decided not to write one of the replicas of the data. This is not a HUGE
deal, but is less than ideal.

- The end result is that a bunch of writes fail with UnavailableExceptions, so not all of
our data is getting into Cassandra.
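
To put a number on the dropped mutations, the log lines could be tallied with something like
the following. This is only a minimal sketch: it assumes the message has exactly the form
quoted above, and the function name and the sample lines are hypothetical, made up for
illustration rather than taken from our logs.

```python
import re

# Matches the message quoted above, e.g.
# "Dropped 42 MUTATION messages in the last 5000ms"
DROPPED_RE = re.compile(r"Dropped (\d+) MUTATION messages in the last (\d+)ms")


def total_dropped_mutations(lines):
    """Sum the dropped-mutation counts reported across the given log lines."""
    total = 0
    for line in lines:
        match = DROPPED_RE.search(line)
        if match:
            total += int(match.group(1))
    return total


# Hypothetical sample lines (not real log output).
sample = [
    "WARN Dropped 120 MUTATION messages in the last 5000ms",
    "INFO some unrelated line",
    "WARN Dropped 30 MUTATION messages in the last 5000ms",
]
print(total_dropped_mutations(sample))
```

Running this per node over the real log file (passing the open file object as `lines`) would
show which nodes are dropping the most writes.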

So my question is: what is the best way to avoid this behavior? Our memtable thresholds are
fairly low (256MB), so there should be plenty of heap space to work with. We may experiment
with write consistency ONE or ALL to see whether the perf hit is tolerable, but I wanted to
get some opinions on why this might be happening. Thanks!

-Jeffrey

