cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Out of Memory Issues - SERIOUS
Date Fri, 08 Oct 2010 03:58:42 GMT
if you don't want to lose data, don't wipe your commit logs.  that
part seems pretty obvious to me. :)

cassandra aggressively logs its state when it is running out of memory
so you can troubleshoot.  look for the GCInspector lines in the log.
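
for example (where the log lives depends on your log4j config; the packaged
install writes to /var/log/cassandra/system.log, so adjust the path as needed):

    grep GCInspector /var/log/cassandra/system.log

each hit roughly shows which collector ran, how long the pause was, and how
much heap was still in use afterwards, which tells you how close to the
ceiling you are.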

but in this case it sounds pretty simple; you will be able to finish
replaying the commitlogs if you lower your memtable thresholds or
alternatively increase the amount of memory given to the JVM.  (see
http://wiki.apache.org/cassandra/MemtableSSTable.)
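
as a rough illustration of the second option (not a recommendation; where
exactly you set it depends on how you start cassandra, cassandra-env.sh for
the packaged scripts, and an m1.large only has ~7.5GB to play with):

    # bump the max heap given to the JVM; the value is just an example
    JVM_OPTS="$JVM_OPTS -Xmx5G"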

the _binary_ memtable setting has no effect on commitlog replay (it
has no effect on anything but binary writes through the storageproxy
api, which you are not using); you need to adjust
memtable_throughput_in_mb and memtable_operations_in_millions.

If you haven't explicitly set these then Cassandra will guess based on
your heap size; here, it is guessing too high.  start by uncommenting
the settings in the .yaml and reduce by 50% until it works.
alternatively, apply the patch at
https://issues.apache.org/jira/browse/CASSANDRA-1595 to see what
Cassandra is guessing, and start at half of that.
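
for example, a cut-down stanza for one of your column families might look
something like this (the CF name here is made up, and the numbers are only a
starting point to halve from, not a recommendation):

    column_families:
        - name: Metrics
          memtable_throughput_in_mb: 64
          memtable_operations_in_millions: 0.3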

On Thu, Oct 7, 2010 at 10:32 PM, Dan Hendry <dan@ec2.dustbunnytycoon.com> wrote:
> There seems to have been a fair amount of discussion on memory related
> issues so I apologize if this exact situation has come up before.
>
>
>
> I am currently in the process of load testing a metrics platform I have
> written which uses Cassandra, and I have run into some very troubling issues.
> The application writes quite heavily, about 1000-2000 updates (columns)
> per second using batch mutates of 20 columns each. This is divided between
> creating new rows and adding columns to a fairly limited number of existing
> index rows (<30). Nearly all of these updates are read within 10 seconds and
> none contain any significant amount of data (generally much less than 100
> bytes of data that I specify). Initially, the test hums along nicely, but
> after some amount of time (1-2 hours) Cassandra crashes with an out of
> memory error. Unfortunately, I have not had the opportunity to watch the test
> as it crashes, but it has happened in 2/2 tests.
>
>
>
> This is quite annoying, but the absolutely TERRIFYING behaviour is that when
> I restart Cassandra, it starts replaying the commit logs, then crashes with
> an out of memory error again. Restart a second time, crash with OOM; it
> seems to get through about 3/4 of the commit logs. Just to be absolutely
> explicit, I am not trying to insert or read at this point, just recover the
> previous updates. Unless somebody can suggest a way to recover the commit
> logs, I have effectively lost my data. The only way I have found to recover
> is to wipe the data directories. It does not matter right now given that it is
> only a test, but this behaviour is completely unacceptable for a production
> system.
>
>
>
> Here is information about the system which is probably relevant. Let me know
> if any additional details about my application would help sort out this
> issue:
>
> - Cassandra 0.7 Beta2
>
> - DB Machine: EC2 m1.large with the commit log directory on an EBS volume
> and the data directory on ephemeral storage.
>
> - OS: Ubuntu Server 10.04
>
> - With the exception of changing JMX settings, no memory or JVM options
> were changed in cassandra-env.sh.
>
> - In cassandra.yaml, I reduced binary_memtable_throughput_in_mb to 100 in my
> second test to try to follow the heap memory calculation formula; I have 8
> column families.
>
> - I am using the Sun JVM, specifically “build 1.6.0_20-b02”.
>
> - The app is written in Java and I am using the latest Pelops library; I am
> sending updates at consistency level ONE and reading them at level ALL.
>
>
>
> I have been fairly impressed with Cassandra overall, and given that I am
> using a beta version, I don’t expect fully polished behaviour. What is
> unacceptable, and quite frankly nearly unbelievable, is the fact that
> Cassandra can't seem to recover from the error and I am losing data.
>
>
>
> Dan Hendry



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
