cassandra-user mailing list archives

From "Walsh, Stephen" <Stephen.Wa...@Aspect.com>
Subject RE: Cassandra shutdown during large number of compactions - now fails to start with OOM Exception
Date Thu, 17 Sep 2015 15:36:46 GMT
Some more info,

Looking at the Java heap dump file, I see about 400 SSTableScanners, one for each of our column families.
Each is about 200MB in size.
And (from what I can see) all of them are reading from a "compactions_in_progress-ka-000000-Data.db" file.

dfile  org.apache.cassandra.io.compress.CompressedRandomAccessReader path = "/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-71661-Data.db"
131840 104
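
To see how much data is sitting in the compactions_in_progress table on a node that will not start, something like the following could be run on that node. It is only a rough sketch: the function name is made up for illustration, and the data path is assumed to be the one shown in the heap dump entry above.

```python
import os


def compactions_in_progress_files(system_dir):
    """Return (path, size_in_bytes) for every file found under any
    compactions_in_progress-* table directory inside system_dir.

    Hypothetical helper for illustration; not part of Cassandra.
    """
    found = []
    for entry in sorted(os.listdir(system_dir)):
        table_dir = os.path.join(system_dir, entry)
        # Table directories are named <table>-<id>, as in the path above.
        if not entry.startswith("compactions_in_progress-"):
            continue
        if not os.path.isdir(table_dir):
            continue
        for root, _dirs, files in os.walk(table_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                found.append((path, os.path.getsize(path)))
    return found


if __name__ == "__main__":
    # Assumed location, taken from the path in the heap dump entry.
    system_dir = "/var/lib/cassandra/data/system"
    if os.path.isdir(system_dir):
        files = compactions_in_progress_files(system_dir)
        for path, size in files:
            print(f"{size:>12}  {path}")
        print(f"{len(files)} files, {sum(s for _, s in files)} bytes total")
    else:
        print(f"{system_dir} not found on this machine")
```

This only reads file names and sizes, so it should be safe to run while the node is down.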

Steve


From: Walsh, Stephen
Sent: 17 September 2015 15:33
To: user@cassandra.apache.org
Subject: Cassandra shutdown during large number of compactions - now fails to start with OOM Exception

Hey all, I was hoping someone has had a similar issue.
We're using 2.1.6 and shut down a testbed in AWS, thinking we were finished with it.
We started it back up today and saw that only 2 of 4 nodes came up.

It seems there was a lot of compaction happening at the time it was shut down. When Cassandra tries
to start up, we get an OutOfMemoryError.


INFO  13:45:57 Initializing system.range_xfers
INFO  13:45:57 Initializing system.schema_keyspaces
INFO  13:45:57 Opening /var/lib/cassandra/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-21807 (19418 bytes)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/log/cassandra/java_pid3011.hprof ...
Heap dump file created [7751760805 bytes in 52.439 secs]
ERROR 13:47:11 Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space


It's not related to the key_cache; we removed this and the issue is still present.
So we believe it's retrying all the compactions that were in progress when it went down.

We've set the heap size to half of the system's RAM (8GB in this case).

At the moment, the only workaround we have is to empty the data / saved_cache / commit_log
folders and let the node re-sync with the other nodes.

Has anyone seen this before, and if so, what did you do to solve it?
Can we remove unfinished compactions?

Steve



