cassandra-user mailing list archives

From Jonathan Shook <>
Subject Re: Cassandra behaviour
Date Mon, 26 Jul 2010 16:33:22 GMT
My guess:
Your test is beating up your system. The system may need more memory,
disk throughput, or CPU in order to keep up with that particular
workload. Check some of the posts on the list with "deferred
processing" in the body to see why.
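The failure mode in question can be sketched as a producer/consumer problem: inserts arrive faster than flushing can drain them, so an unbounded in-memory buffer grows until the heap is exhausted. This toy model is illustrative only (the queue, thread names, and sizes are not Cassandra internals):

```python
import queue
import threading

# Toy model of an overloaded write path: clients (producer) insert faster
# than flushing (consumer) can drain.  An unbounded buffer would grow until
# memory runs out; a bounded queue applies backpressure instead, blocking
# the producer until the consumer catches up.
buf = queue.Queue(maxsize=1000)   # bounded: put() blocks when full

def writer(n):
    for i in range(n):
        buf.put(i)                # stalls here once the buffer fills

def flusher(n):
    for _ in range(n):
        buf.get()                 # a real flusher would write to disk here

t = threading.Thread(target=writer, args=(100_000,))
t.start()
flusher(100_000)
t.join()
print("all writes drained; in-memory buffer never exceeded", buf.maxsize)
```

With the bound removed (maxsize=0), the same producer would happily queue all 100,000 items in memory at once, which is the shape of the crash described below.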

Also, can you post the error log?

On Mon, Jul 26, 2010 at 11:23 AM, tsuraan <> wrote:
> I have a system where we're currently using Postgres for all our data
> storage needs, but on a large table the index checks for primary keys
> are really slowing us down on insert.  Cassandra sounds like a good
> alternative (not saying postgres and cassandra are equivalent; just
> that I think they are both reasonable fits for our particular
> product), so I tried running the py_stress tool on a recent repos
> checkout.  I'm using code that's recent enough that it doesn't pay
> attention to the keyspace definitions in cassandra.yaml, so the cache
> settings are just whatever py_stress defined when it created the
> keyspace it uses.  I didn't change anything in cassandra.yaml, but I
> did change the startup settings to use 2G of RAM rather than 1G.  I
> then ran "python -o insert -n 1000000000"
> (that's one billion).  I left for a day, and when I came back,
> Cassandra had run out of RAM and crashed somewhere around
> 120,000,000 inserts.  This brings up a few questions:
> - Is Cassandra's RAM use proportional to the number of values it's
> storing?  I know it uses bloom filters to prevent lookups of
> non-existent keys, but since bloom filters are designed to trade
> accuracy for space, Cassandra should sacrifice accuracy in order to
> prevent crashes, if it's just the bloom filters that are using all
> the RAM.
> - When I start Cassandra again, it appears to go into an endless
> read/write loop, using between 45% and 90% of my CPU.  It says it's
> compacting tables, but it's been doing that for hours, and it only has
> 70GB of data stored.  How can Cassandra be run on huge datasets when
> 70GB appears to take forever to compact?
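On the bloom-filter question: for a fixed false-positive rate, a bloom filter's size is linear in the number of keys, so that part of RAM use really does grow with row count. A back-of-the-envelope using the standard sizing formulas (the numbers are illustrative; actual per-row memory overhead also includes other structures):

```python
import math

def bloom_params(n_items, fp_rate):
    """Bits and hash-function count for an optimally sized bloom filter:
       m = -n * ln(p) / (ln 2)^2,   k = (m / n) * ln 2
    """
    m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
    k = max(1, round(m / n_items * math.log(2)))
    return m, k

# ~120 million keys (roughly where the crash happened) at a 1% target:
bits, hashes = bloom_params(120_000_000, 0.01)
print(round(bits / 8 / 1024 ** 2), "MiB,", hashes, "hashes")  # ≈ 137 MiB, 7 hashes
```

So the filters for 120M keys cost on the order of a hundred megabytes at 1% false positives, and relaxing the rate shrinks them only logarithmically: memory still scales linearly with key count.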
> I assume I'm doing something wrong, but I don't see a ton of tunables
> to play with.  Can anybody give me advice on how to keep Cassandra
> running under a high insert load?
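As for the compaction question: a compaction pass is essentially a streaming merge of sorted on-disk runs that keeps only the newest version of each key, so it is bounded by sequential disk throughput rather than by data size in RAM. A minimal sketch of that merge, with tuples standing in for real SSTable rows (the (key, timestamp, value) layout here is illustrative, not Cassandra's actual file format):

```python
import heapq
from itertools import groupby

def compact(*sstables):
    """Merge sorted runs of (key, timestamp, value) rows, keeping only the
    newest timestamp per key -- the heart of a compaction pass."""
    merged = []
    # heapq.merge streams the pre-sorted inputs in key order without
    # loading them all into memory, which is why compaction can handle
    # datasets far larger than RAM.
    for key, rows in groupby(heapq.merge(*sstables), key=lambda r: r[0]):
        merged.append(max(rows, key=lambda r: r[1]))  # newest version wins
    return merged

older = [("a", 1, "x"), ("c", 1, "y")]
newer = [("a", 2, "x2"), ("b", 2, "z")]
print(compact(older, newer))
# → [('a', 2, 'x2'), ('b', 2, 'z'), ('c', 1, 'y')]
```

The merge itself is O(total rows), so hours of compaction on 70GB usually points at the disk being saturated by the competing insert load, not at the algorithm.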
