incubator-cassandra-user mailing list archives

From Paul Ingalls <paulinga...@gmail.com>
Subject disappointed
Date Wed, 24 Jul 2013 05:01:16 GMT
I want to check in.  I'm sad, mad and afraid.  I've been trying to get a 1.2 cluster up and
working with my data set for three weeks with no success.  I've been running a 1.1 cluster
for 8 months now with no hiccups, but for me at least 1.2 has been a disaster.  I had high
hopes for leveraging the new features of 1.2, specifically vnodes and collections.   But at
this point I can't release my system into production, and will probably need to find a new
back end.  As a small startup, this could be catastrophic.  I'm mostly mad at myself.  I took
a risk moving to the new tech.  I forgot that sometimes when you gamble, you lose.

First, the performance of 1.2.6 was horrible when using collections.  I wasn't able to push
through 500k rows before the cluster became unusable.  With a lot of digging, and way too
much time, I discovered I was hitting a bug that had just been fixed, but was unreleased.
This scared me, because the release was already at 1.2.6 and I would have expected something
like https://issues.apache.org/jira/browse/CASSANDRA-5677 to have been addressed long before.
But gamely I grabbed the latest code from the 1.2 branch, built it, and was finally able
to get past half a million rows.

But then I hit ~4 million rows and a multitude of problems.  Even with the fix above, I
was still seeing a ton of compactions failing, specifically the ones for large rows.  Not
a single large row will compact; they all fail with a wrong-size assertion.  Worse, and this is
what kills the whole thing, I keep hitting a wall with open files, even after dumping the
whole DB, dropping vnodes and trying again.  Seriously, 650k open file descriptors?  When
it hits this limit, the whole DB craps out and is basically unusable.  This isn't that many
rows.  I have close to half a billion in 1.1…

I'm now at a standstill.  I figure I have two options unless someone here can help me.  Neither
of them involves 1.2.  I can either go back to 1.1 and remove the features that collections
added to my service, or find another data backend that has similar performance characteristics
to Cassandra but allows collection-type behavior in a scalable manner.  Because as far as I
can tell, 1.2 doesn't scale.  Which makes me sad; I was proud of what I accomplished with
1.1…

Does anyone know why there are so many open file descriptors?  Any ideas on why a large row
won't compact?

Paul