Thanks for your reply. We will try both of your recommendations. The machine has 8 GB of memory and the JVM heap is 2 GB. DeletedColumn objects use 1.4 GB of the heap, and they are rooted from the ReadStage thread. Do you think we need to increase the size of the JVM heap?

The configuration for the index column family is:

create column family purge
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and gc_grace = 1800
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';


Best Regards!

Jian Jin



2013/3/9 aaron morton <aaron@thelastpickle.com>
You need to provide some details of the machine and the JVM configuration. But let's say you need 4 GB to 8 GB for the JVM heap.

If you have many deleted columns, I would say you have a *lot* of garbage in each row. Consider reducing gc_grace so the columns are purged more frequently. Note, however, that columns are only purged when all fragments of the row are part of the minor compaction.
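The purge rule above can be sketched as a small model. This is not Cassandra's code: the function name, its parameters, and the SSTable numbers are hypothetical and only illustrate the two conditions (grace period expired, and every fragment of the row included in the compaction). The 1800 seconds is the gc_grace value from the column family definition in this thread.

```python
# Simplified model of the tombstone purge rule. NOT Cassandra's implementation;
# all names here are hypothetical and chosen for illustration only.

def is_purgeable(deleted_at, gc_grace_seconds, now, row_sstables, compacting_sstables):
    """Return True if a deleted column (tombstone) could be dropped.

    deleted_at, now      -- times in seconds
    row_sstables         -- set of SSTables that hold a fragment of the row
    compacting_sstables  -- set of SSTables taking part in this compaction
    """
    grace_expired = now - deleted_at > gc_grace_seconds
    # Set subset test: every fragment of the row must be in the compaction.
    all_fragments_included = row_sstables <= compacting_sstables
    return grace_expired and all_fragments_included


now = 10_000
one_hour_ago = now - 3600

# Grace expired, but SSTable 2 is not part of the compaction.
print(is_purgeable(one_hour_ago, 1800, now, {1, 2}, {1}))        # False
# Grace expired and both fragments are being compacted.
print(is_purgeable(one_hour_ago, 1800, now, {1, 2}, {1, 2, 3}))  # True
# All fragments included, but deleted only 60 seconds ago.
print(is_purgeable(now - 60, 1800, now, {1, 2}, {1, 2}))         # False
```

The first case is the one to watch for: a short gc_grace alone does not help if the row is spread over SSTables that are rarely compacted together.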

If you have a mixed write / delete workload, consider using the Leveled compaction strategy: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton

On 6/03/2013, at 10:37 PM, Jason Wee <peichieh@gmail.com> wrote:

Hmm, did you manage to take a look using nodetool tpstats? That may give you a further indication.
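If it helps, here is a rough sketch for picking out the backed-up pools from tpstats output. It assumes each pool row has the layout name, active, pending, completed, blocked, all-time-blocked, which may differ between versions. The function names are hypothetical and the sample text is made up for illustration; it is not output from this cluster.

```python
def parse_tpstats(text):
    """Parse the thread-pool table from `nodetool tpstats` output.

    Assumes each pool row is: name, active, pending, completed, blocked,
    all-time-blocked. Lines that do not match (headers, the dropped-message
    section, blank lines) are skipped.
    """
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            active, pending, completed, blocked, all_time_blocked = map(int, fields[1:])
            pools[fields[0]] = {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_time_blocked,
            }
    return pools


def backed_up(pools):
    """Return the names of pools with pending or blocked tasks, sorted."""
    return sorted(
        name for name, s in pools.items() if s["pending"] > 0 or s["blocked"] > 0
    )


# Made-up sample for illustration only.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                        32       512          10000         0                 0
MutationStage                     0         0          20000         0                 0

Message type           Dropped
READ                        17
"""

print(backed_up(parse_tpstats(SAMPLE)))  # ['ReadStage']
```

On a live node the text could come from something like subprocess.check_output(["nodetool", "-h", "localhost", "tpstats"], text=True). A large pending count on ReadStage would fit with the heap dump showing objects rooted from the read threads.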

Jason


On Thu, Mar 7, 2013 at 1:56 PM, 金剑 <jinjian.1@gmail.com> wrote:
Hi,

My Cassandra version is 1.1.7.

Our use case is: we have an index column family that records how many resources are stored for a user. The number might vary from tens to millions.

We provide a feature that lets a user delete resources by prefix.
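To make the prefix delete concrete, here is a minimal sketch of how a prefix can map to a contiguous range of column names. It assumes the resource names are the column names and that they sort byte-wise, and it only models the ordering on bytes; it does not talk to Cassandra. The function names and the sample names are hypothetical.

```python
import bisect


def prefix_bounds(prefix):
    """Return (start, end) byte strings so that start <= name < end holds
    exactly for the names that begin with prefix, under byte-wise ordering.
    end is None when there is no upper bound (empty prefix)."""
    start = prefix.encode("utf-8")
    if not start:
        return start, None
    end = bytearray(start)
    # Valid UTF-8 never ends in 0xFF, so incrementing the last byte is safe.
    end[-1] += 1
    return start, bytes(end)


def names_with_prefix(sorted_names, prefix):
    """Slice a sorted list of byte-string names down to those with prefix."""
    start, end = prefix_bounds(prefix)
    lo = bisect.bisect_left(sorted_names, start)
    hi = len(sorted_names) if end is None else bisect.bisect_left(sorted_names, end)
    return sorted_names[lo:hi]


# Hypothetical column names in one user's row.
names = sorted(n.encode("utf-8") for n in ["img/a", "img/b", "imh", "doc/x"])
print(names_with_prefix(names, "img/"))  # [b'img/a', b'img/b']
```

If the delete is done this way, each request first reads the whole range for the prefix and then deletes what it finds, so a later read over the same range also has to pass over the columns deleted earlier until they are purged.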


We found that some Cassandra nodes run out of memory (OOM) after some period. The cluster is a cross-datacenter ring.

1. Exceptions in the Cassandra log:

ERROR [Thread-5810] 2013-02-04 05:38:13,882 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-5810,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:758)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
ERROR [Thread-5819] 2013-02-04 05:38:13,888 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-5819,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
ERROR [Thread-36] 2013-02-04 05:38:13,898 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-36,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
ERROR [Thread-3990] 2013-02-04 05:38:13,902 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-3990,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
ERROR [ACCEPT-/10.139.50.62] AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ACCEPT-/10.139.50.62,5,main]
java.lang.RuntimeException: java.nio.channels.ClosedChannelException
at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:710)
Caused by: java.nio.channels.ClosedChannelException
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:137)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:699)
 INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 374) Timed out replaying hints to /23.20.84.240; aborting further deliveries
 INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint
 INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 296) Started hinted handoff for token: 3

2. In the heap dump, there are many DeletedColumn objects, rooted from the ReadStage thread.


Please help: where might the problem be?

Best Regards!

Jian Jin