cassandra-user mailing list archives

From Günter Ladwig <>
Subject High GC activity and OOMs
Date Wed, 09 Nov 2011 09:30:32 GMT

I have a 15-node cluster where each node has 4GB RAM and 80GB disk. There are three CFs, of
which only two contain data. In total, each CF contains about 2 billion columns. I have a
replication factor of 2. All CFs are compressed with SnappyCompressor. This is on Cassandra

I was running some read tests, and two of the nodes always seemed to fail within a minute with
OOMs when I used 4-8 threads to perform the reads. One of the nodes is a replica of the other,
which is probably why they always fail at the same time.

The OOMs look like this:

ERROR 19:44:27,163 Fatal exception in thread Thread[ReadStage:83,5,main]
java.lang.OutOfMemoryError: Java heap space
	at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(
	at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(
	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(
	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(
	at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(
	at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(
	at org.apache.cassandra.db.CollationController.collectAllData(
	at org.apache.cassandra.db.CollationController.getTopLevelColumns(
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(
	at org.apache.cassandra.db.Table.getRow(
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(
	at org.apache.cassandra.db.ReadVerbHandler.doVerb(
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
	at java.util.concurrent.ThreadPoolExecutor$

I did some investigating, and can now reproduce this by paging a single row that is stored
on these nodes. I'm reading just 1000 columns for each page, which easily fits in RAM (the
column values are actually empty, and the column names are less than 1k). However, this row
is very large (I noticed it while scrubbing). Here is the output from cfstats:
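The paging loop I use can be sketched roughly like this (a minimal stand-alone sketch, not the actual client code: `fetch_slice` is a hypothetical stand-in for whatever slice query the real client issues, and column names are assumed to sort lexically):

```python
import bisect

def page_row(fetch_slice, page_size=1000):
    """Yield every column of a wide row, page_size columns at a time.

    fetch_slice(start, n) must return up to n column names >= start,
    in sorted order (an empty start means the beginning of the row).
    """
    start, seen_start = "", False
    while True:
        raw = fetch_slice(start, page_size)
        # On every page after the first, the start column was already
        # yielded by the previous page, so drop it.
        cols = raw[1:] if seen_start else raw
        yield from cols
        if len(raw) < page_size:
            return                      # short page: row exhausted
        start, seen_start = raw[-1], True

# Usage example against an in-memory "row" of 2500 columns:
data = [f"col{i:05d}" for i in range(2500)]

def fetch(start, n):
    i = bisect.bisect_left(data, start)
    return data[i:i + n]

assert list(page_row(fetch, 1000)) == data
```

Each page resumes from the last column name seen, which is why the first column of every subsequent page overlaps with the previous one and has to be skipped.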

		Column Family: OSP
		SSTable count: 4
		Space used (live): 21954219574
		Space used (total): 21954219574
		Number of Keys (estimate): 85496192
		Memtable Columns Count: 0
		Memtable Data Size: 0
		Memtable Switch Count: 0
		Read Count: 0
		Read Latency: NaN ms.
		Write Count: 0
		Write Latency: NaN ms.
		Pending Tasks: 0
		Key cache: disabled
		Row cache: disabled
		Compacted row minimum size: 125
		Compacted row maximum size: 36904729268
		Compacted row mean size: 10622

(I'm guessing the maximum row size is larger than space used because of the compression.)
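A quick back-of-the-envelope from the cfstats numbers above (the 64 KB figure is an assumption on my part, taken from the default column_index_size_in_kb, and the index estimate is only a guess at why IndexedSliceReader.<init> shows up in the OOM trace):

```python
# Numbers copied from the cfstats output above.
max_row = 36_904_729_268      # compacted row maximum size (bytes, uncompressed)
space_used = 21_954_219_574   # space used (live), i.e. compressed on disk

# The widest row alone is ~1.7x the whole CF's on-disk footprint, which
# fits the guess that compression accounts for the difference.
print(round(max_row / space_used, 2))   # ~1.68

# Hypothetical: assuming the default column_index_size_in_kb of 64, the
# row-level column index for a row this wide would have on the order of
index_entries = max_row // (64 * 1024)
print(index_entries)                    # ~560,000 entries
```

If an index that large has to be materialized on the heap per read, that alone could explain a lot of GC pressure on a 4GB node.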

While I don't see OOMs when I use only a single thread to page the row, there are lots of
ParNew collections of about 500ms each, as well as many full collections.

Do I just not have enough RAM?

Dipl.-Inform. Günter Ladwig

Karlsruhe Institute of Technology (KIT)
Institute AIFB

Englerstraße 11 (Building 11.40, Room 250)
76131 Karlsruhe, Germany
Phone: +49 721 608-47946

KIT – University of the State of Baden-Württemberg and National Large-scale Research Center
of the Helmholtz Association
