From Dave Martin <>
Subject OutOfMemory on count on cassandra 0.6.8 for large number of columns
Date Sun, 12 Dec 2010 07:26:38 GMT
Hi there,

I see the following:

1) Add 8,000,000 columns to a single row. Each column name is a UUID.
2) Use cassandra-cli to run count['myGUID']

The following is reported in the logs:

ERROR [DroppedMessagesLogger] 2010-12-12 18:17:36,046 (line 87) Uncaught
exception in thread Thread[DroppedMessagesLogger,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [pool-1-thread-2] 2010-12-12 18:17:36,046 (line 1407) Internal error
processing get_count
java.lang.OutOfMemoryError: Java heap space

and Cassandra falls over. I see the same behaviour with 0.6.6.

Increasing the heap to 4GB with the -Xmx and -Xms arguments allows the count to
return in this particular example (i.e. no OutOfMemoryError is thrown).

Here's the Scala code that was run to load the columns, which uses the Akka persistence API:

import java.util.UUID

object ColumnTest {
	def main(args : Array[String]) : Unit = {
		println("Super column test starting")
		val hosts = Array("localhost")
		val sessions = new CassandraSessionPool("occurrence",StackPool(SocketProvider("localhost",
		val session = sessions.newSession
		loadRow("myGUID", 8000000, session)
	}

	// Writes noOfColumns columns to the row identified by key; each column name is a random UUID.
	def loadRow(key:String, noOfColumns:Int, session:CassandraSession){
		print("loading: "+key+", with columns: "+noOfColumns)
		val start = System.currentTimeMillis
		val rawPath = new ColumnPath("dr")
		for(i <- 0 until noOfColumns){
			val recordUuid = UUID.randomUUID.toString
			session ++| (key, rawPath.setColumn(recordUuid.getBytes), "1".getBytes, System.currentTimeMillis)
		}
		val finish = System.currentTimeMillis
		print(", Time taken (secs) :" +((finish-start)/1000) + " seconds.\n")
	}
}

Here's the configuration used:

# Arguments to pass to the JVM
        -ea \
        -Xms1G \
        -Xmx2G \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:SurvivorRatio=8 \
        -XX:MaxTenuringThreshold=1 \
        -XX:CMSInitiatingOccupancyFraction=75 \
        -XX:+UseCMSInitiatingOccupancyOnly \
        -XX:+HeapDumpOnOutOfMemoryError

Admittedly the resource allocation is small, but I wondered if there should be some configuration
guidelines (e.g. memory allocation vs number of columns supported).
I'm running this on my MacBook Pro with a single node and the following Java version:

$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)

Here's the column family (CF) definition:

    <Keyspace Name="occurrence">
      <ColumnFamily Name="dr"
                    Comment="The column family for dataset tracking"/>

Apologies in advance if this is a known issue or a known limitation of 0.6.x.
I had wondered whether I was hitting the 2GB row limit for 0.6.x releases, but 8 million columns
come to approximately 300MB in this particular case.
I guess it may also be a result of the limitations of Thrift (i.e. no streaming capabilities).
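As a rough check on the 300MB figure, here is a minimal sketch of the arithmetic. It assumes each column carries only a 36-byte UUID string as its name, the 1-byte value "1", and an 8-byte timestamp; the object and method names are made up for illustration, and serialization overhead and JVM object overhead are deliberately ignored.

```scala
object RowSizeEstimate {
  // Assumed per-column payload, based on the load code in this message:
  // column name = UUID string, value = "1", timestamp = one 64-bit long.
  val nameBytes: Int = java.util.UUID.randomUUID.toString.getBytes("UTF-8").length
  val valueBytes: Int = "1".getBytes("UTF-8").length
  val timestampBytes: Int = 8

  // Raw payload in bytes for a row with the given number of columns.
  def rawBytes(columns: Long): Long =
    columns * (nameBytes + valueBytes + timestampBytes)

  def main(args: Array[String]): Unit = {
    val total = rawBytes(8000000L)
    println("Estimated raw payload: " + total / (1024 * 1024) + " MiB")
  }
}
```

Under these assumptions the payload is 360,000,000 bytes, or about 343 MiB, which is the same order of magnitude as the estimate above and well under the 2GB row limit. It says nothing about how much heap the columns occupy once they are deserialized on the server.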
Any thoughts appreciated,


