I started digging in the logfiles, and here's the full log of a crash. We have 4GB of heap, and if you watch either OpsCenter or through a JMX console, used heap size is always below 3GB, and it swings between 1.5 to 3GB ever 5 minutes or so. So why would it suddenly run out of heap space?


 INFO 14:08:50,078 Completed flushing /data-ssd/cassandra/data/system/HintsColumnFamily-hd-10-Data.db (5328100 bytes)
 INFO 14:08:50,079 4103 MUTATION messages dropped in last 5000ms
 INFO 14:08:50,080 Pool Name                    Active   Pending   Blocked
 INFO 14:08:50,086 Writing Memtable-A@1734568216(117729674/147162092 serialized/live bytes, 12097 ops)
 INFO 14:08:50,089 ReadStage                        32       363         0
 INFO 14:08:50,089 RequestResponseStage              0         0         0
 INFO 14:08:50,089 ReadRepairStage                   1         1         0
 INFO 14:08:50,089 MutationStage                    32      5672         0
 INFO 14:08:50,090 ReplicateOnWriteStage             0         0         0
 INFO 14:08:50,090 GossipStage                       0         0         0
 INFO 14:08:50,090 AntiEntropyStage                  0         0         0
 INFO 14:08:50,090 MigrationStage                    0         0         0
 INFO 14:08:50,090 StreamStage                       0         0         0
 INFO 14:08:50,091 MemtablePostFlusher               1         6         0
 INFO 14:08:50,091 FlushWriter                       1         5         0
 INFO 14:08:50,091 MiscStage                         0         0         0
 INFO 14:08:50,091 InternalResponseStage             0         0         0
 INFO 14:08:50,094 Compacting [SSTableReader(path='/data-ssd/cassandra/data/system/HintsColumnFamily-hd-9-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/system/HintsColumnFamily-hd-10-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/system/HintsColumnFamily-hd-8-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/system/HintsColumnFamily-hd-7-Data.db')]
 INFO 14:08:50,098 HintedHandoff                     0         0         0
 INFO 14:08:50,098 CompactionManager                 0         2
 INFO 14:08:50,099 MessagingService                n/a       0,1
 INFO 14:08:50,099 ColumnFamily                Memtable ops,data  Row cache size/cap  Key cache size/cap
 INFO 14:08:50,099 system.NodeIdInfo                         0,0                 0/0                 0/0
 INFO 14:08:50,099 system.IndexInfo                          0,0                 0/0                 0/0
 INFO 14:08:50,099 system.LocationInfo                       0,0                 0/0                 0/0
 INFO 14:08:50,100 system.Versions                           0,0                 0/0                 0/0
 INFO 14:08:50,107 system.Migrations                         0,0                 0/0                 0/0
 INFO 14:08:50,107 system.HintsColumnFamily                  0,0                 0/0                 0/0
 INFO 14:08:50,107 system.Schema                             0,0                 0/0                 0/0
 INFO 14:08:50,108 K.A                              920,12311092                 0/0                 0/0
 INFO 14:08:50,127 K.B                               111,1005040                 0/0           6000/6000
 INFO 14:08:50,127 K.C                                  13,23283                 0/0       211817/211817
 INFO 14:08:50,127 OpsCenter.rollups300                      0,0                 0/0                 4/4
 INFO 14:08:50,128 OpsCenter.rollups60              1876,1452889                 0/0               30/30
 INFO 14:08:50,128 OpsCenter.rollups7200                     0,0                 0/0               30/30
 INFO 14:08:50,137 Enqueuing flush of Memtable-rollups60@33961585(43148/1452889 serialized/live bytes, 1876 ops)
 INFO 14:08:50,160 OpsCenter.rollups86400                    0,0                 0/0                 0/0
 INFO 14:08:50,169 OpsCenter.events                          0,0                 0/0               13/13
 INFO 14:08:50,174 OpsCenter.events_timeline                 0,0                 0/0                 0/0
 INFO 14:08:50,175 OpsCenter.settings                        0,0                 0/0                 0/0
 INFO 14:08:50,175 OpsCenter.pdps                     470,103983                 0/0                 0/0
 INFO 14:09:09,860 GC for ConcurrentMarkSweep: 2560 ms for 6 collections, 3669729184 used; max is 4118806528
 WARN 14:09:09,860 Heap is 0.8909690608317886 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN 14:09:09,860 Flushing CFS(Keyspace='K', ColumnFamily='A') to relieve memory pressure
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/lib/cassandra/java_1346605112.hprof ...
Heap dump file created [5553296481 bytes in 67.950 secs]
 INFO 14:11:54,810 InetAddress /10.100.1.116 is now dead.
 INFO 14:11:54,813 InetAddress /10.100.1.117 is now dead.
 INFO 14:11:54,814 InetAddress /10.100.1.118 is now dead.
ERROR 14:11:54,830 Fatal exception in thread Thread[CompactionExecutor:565,1,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.FastByteArrayOutputStream.expand(FastByteArrayOutputStream.java:104)
        at org.apache.cassandra.io.util.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:220)
        at java.io.DataOutputStream.write(Unknown Source)
        at org.apache.cassandra.utils.ByteBufferUtil.write(ByteBufferUtil.java:323)
        at org.apache.cassandra.utils.ByteBufferUtil.writeWithLength(ByteBufferUtil.java:316)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:63)
        at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:361)
        at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:334)
        at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:89)
        at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:136)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:158)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:135)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:115)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
 INFO 14:11:54,839 InetAddress /10.100.1.116 is now UP
 INFO 14:11:54,849 Stop listening to thrift clients
 INFO 14:11:54,850 InetAddress /10.100.1.117 is now UP
 INFO 14:11:54,850 InetAddress /10.100.1.118 is now UP
 INFO 14:11:54,862 Waiting for messaging service to quiesce
 INFO 14:12:06,909 MessagingService shutting down server thread.
ERROR 14:12:06,912 Fatal exception in thread Thread[Thread-117,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:515)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:157)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)
ERROR 14:12:06,913 Fatal exception in thread Thread[Thread-116,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:515)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:157)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)
ERROR 14:12:19,466 Fatal exception in thread Thread[Thread-118,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:515)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:157)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)
 INFO 14:12:19,466 Started hinted handoff for token: 1 with IP: /10.100.1.116
 INFO 14:13:33,343 Completed flushing /data-ssd/cassandra/data/K/A-hd-279743-Data.db (43508009 bytes)
 INFO 14:13:33,346 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='rollups7200') (estimated 0 bytes)
 INFO 14:13:33,350 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='rollups86400') (estimated 0 bytes)
 INFO 14:13:33,351 Writing Memtable-B@797498287(674046/53923680 serialized/live bytes, 1388162 ops)
 INFO 14:13:33,349 Compacting [SSTableReader(path='/data-ssd/cassandra/data/K/A-hd-279741-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/K/A-hd-279235-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/K/A-hd-279743-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/K/A-hd-279506-Data.db')]
 INFO 14:13:33,351 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='events') (estimated 0 bytes)
 INFO 14:13:33,350 Enqueuing flush of Memtable-A@884602824(10126517/12658146 serialized/live bytes, 948 ops)
 INFO 14:13:33,369 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='events_timeline') (estimated 0 bytes)
 INFO 14:13:33,369 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='settings') (estimated 0 bytes)
 INFO 14:13:33,369 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='pdps') (estimated 103983 bytes)
 INFO 14:14:50,122 Completed flushing /data-ssd/cassandra/data/K/B-hd-14662-Data.db (5340050 bytes)
 INFO 14:14:50,123 26225 MUTATION messages dropped in last 5000ms
 INFO 14:14:50,123 Writing Memtable-C@841365631(436486/8776427 serialized/live bytes, 8752 ops)
 INFO 14:14:50,123 9 READ_REPAIR messages dropped in last 5000ms
 INFO 14:14:50,131 1519 READ messages dropped in last 5000ms
 INFO 14:14:50,146 Enqueuing flush of Memtable-pdps@1698224970(39228/103983 serialized/live bytes, 470 ops)
 INFO 14:14:50,167 Pool Name                    Active   Pending   Blocked
 INFO 14:14:50,169 ReadStage                        20        20         0
 INFO 14:14:50,169 RequestResponseStage              0         0         0
 INFO 14:14:50,170 ReadRepairStage                   1         1         0
 INFO 14:14:50,170 ReplicateOnWriteStage             0         0         0
 INFO 14:14:50,171 GossipStage                       0         0         0
 INFO 14:14:50,171 AntiEntropyStage                  0         0         0
 INFO 14:14:50,171 MigrationStage                    0         0         0
 INFO 14:14:50,172 StreamStage                       0         0         0
 INFO 14:14:50,172 MemtablePostFlusher               1         5         0
 INFO 14:14:50,172 FlushWriter                       1         5         1
 INFO 14:14:50,173 MiscStage                         0         0         0
 INFO 14:14:50,173 InternalResponseStage             0         0         0
 INFO 14:14:50,174 HintedHandoff                     1         3         0
 INFO 14:14:50,174 CompactionManager                 1         4
 INFO 14:14:50,174 MessagingService                n/a       0,0
 INFO 14:14:50,174 ColumnFamily                Memtable ops,data  Row cache size/cap  Key cache size/cap
 INFO 14:14:50,194 system.NodeIdInfo                         0,0                 0/0                 0/0
 INFO 14:14:50,194 system.IndexInfo                          0,0                 0/0                 0/0
 INFO 14:14:50,194 system.LocationInfo                       0,0                 0/0                 0/0
 INFO 14:14:50,195 system.Versions                           0,0                 0/0                 0/0
 INFO 14:14:50,195 system.Migrations                         0,0                 0/0                 0/0
 INFO 14:14:50,195 system.HintsColumnFamily            13,964160                 0/0                 0/0
 INFO 14:14:50,195 system.Schema                             0,0                 0/0                 0/0
 INFO 14:14:50,195 K.A                                       0,0                 0/0                 0/0
 INFO 14:14:50,195 K.B                               115,1037040                 0/0           6000/6000
 INFO 14:14:50,196 K.C                                  13,23283                 0/0       211817/211817
 INFO 14:14:50,196 OpsCenter.rollups300                      0,0                 0/0                 4/4
 INFO 14:14:50,196 OpsCenter.rollups60                       0,0                 0/0               30/30
 INFO 14:14:50,196 OpsCenter.rollups7200                     0,0                 0/0               30/30
 INFO 14:14:50,196 OpsCenter.rollups86400                    0,0                 0/0                 0/0
 INFO 14:14:50,196 OpsCenter.events                          0,0                 0/0               13/13
 INFO 14:14:50,197 OpsCenter.events_timeline                 0,0                 0/0                 0/0
 INFO 14:14:50,197 OpsCenter.settings                        0,0                 0/0                 0/0
 INFO 14:14:50,197 OpsCenter.pdps                            0,0                 0/0                 0/0
Service exit with a return value of 100



On Tue, Sep 11, 2012 at 3:57 AM, aaron morton <aaron@thelastpickle.com> wrote:
It's impossible to start new connections, or impossible to send requests, or it just doesn't return anything when you've sent a request.
If it's totally frozen it sounds like GC. How long does it freeze for?

Despite that, we occasionally get OOM exceptions, and nodes crashing, maybe a few times per month. 
Do you have an error stack ?

We can't find anything in the cassandra logs indicating that something's up
Is it logging dropped messages or high TP pending ? Are the freezes associated with compaction or repair running?

 and occasionally we do bulk deletion of supercolumns in a row.
mmm, are you sending a batch mutation with lots-o-deletes ? Each row mutation (insert or delete) in the batch becomes a thread pool tasks. If you send 1,000 rows in a batch you will temporarily prevent other requests from being served.

The config options we are unsure about are things like commit log sizes, ….
I would try to find some indication of what's going on before tweaking. Have you checked iostat ?

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 11/09/2012, at 2:05 AM, Henrik Schröder <skrolle@gmail.com> wrote:

Hi all,

We're running a small Cassandra cluster (v1.0.10) serving data to our web application, and as our traffic grows, we're starting to see some weird issues. The biggest of these is that sometimes, a single node becomes unresponsive. It's impossible to start new connections, or impossible to send requests, or it just doesn't return anything when you've sent a request. Our client library is set to retry on another server when this happens, and what we see then is that the request is usually served instantly. So it's not the case that some requests are very difficult, it's that sometimes a node is just "busy", and we have no idea why or what it's doing.

We're using MRTG and Monit to monitor the servers, and in MRTG the average CPU usage is around 5%, on our quad-core Xeon servers with SSDs. But occassionally through Monit we can see that the 1-min load average goes above 4, and this usually corresponds to the above issues. Is this common? Does this happen to everyone else? And why the spikiness in load? We can't find anything in the cassandra logs indicating that something's up (such as a slow GC or compaction), and there's no corresponding traffic spike in the application either. Should we just add more nodes if any single one gets CPU spikes?

Another explanation could also be that we've configured it wrong. We're running pretty much default config. Each node has 16G of RAM, 4GB of heap, no row-cache and an sizeable key-cache. Despite that, we occasionally get OOM exceptions, and nodes crashing, maybe a few times per month. Should we increase heap size? Or move to 1.1 and enable off-heap caching?

We have quite a lot of traffic to the cluster. A single keyspace with two column families, RF=3, compression is enabled, and we're running the default size-tiered compaction.
Column family A has 60GB of actual data, each row has a single column, and that column holds binary data that varies in size up to 500kB. When we update a row, we write a new value to this single column, effectively replacing that entire row. We do ~1000 updates/s, totalling ~10MB/s in writes.
Column family B also has 60GB of actual data, but each row has a variable (~100-10000) number of supercolumns, and each supercolumn has a fixed number of columns with a fixed amount of data, totalling ~1kB. The operations we are doing on this column family is that we add supercolumns to rows with a rate of ~200/s, and occasionally we do bulk deletion of supercolumns in a row.

The config options we are unsure about are things like commit log sizes, memtable flushing thresholds, commit log syncing, compaction throughput, etc. Are we at a point with our data size and write load that the defaults aren't good enough anymore? Should we stick with size-tiered compaction, even though our application is update-heavy?


Many thanks,
/Henrik