incubator-cassandra-user mailing list archives

From Aaron Morton <>
Subject Re: Cassandra crashed - possible JMX threads leak
Date Thu, 21 Oct 2010 05:49:57 GMT
Sounds like the problem discussed here (memory).

If you have the JNA jar it should work
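
For context, error=12 in the quoted trace below is ENOMEM: the snapshot shells out to "ln" through ProcessBuilder, and starting a child process from a multi-gigabyte JVM on a box with no swap can fail even when the link itself would be cheap. The sketch below is not Cassandra's JNA code path; it only illustrates the idea of creating the hard links in-process instead of forking. It assumes Java 7+ (java.nio.file.Files.createLink), which Cassandra 0.7 could not rely on, and the class name, method name, and file names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkSnapshot {
    /**
     * Hard-links every regular file in dataDir into snapshotDir.
     * No child process is started, so there is no fork that can fail with ENOMEM.
     * Returns the number of links created.
     */
    static int snapshot(Path dataDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories such as the snapshot directory itself
                }
                // createLink(link, existing): the new name comes first
                Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                linked++;
            }
        }
        return linked;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical data directory with one made-up data file
        Path dataDir = Files.createTempDirectory("data");
        Files.write(dataDir.resolve("example-1-Data.db"),
                "row".getBytes(StandardCharsets.UTF_8));
        Path snapshotDir = dataDir.resolve("snapshots").resolve("demo");
        System.out.println("linked " + snapshot(dataDir, snapshotDir) + " file(s)");
    }
}
```

Hard links only work within one filesystem, so the snapshot directory has to live on the same volume as the data files.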


On 21 Oct, 2010, at 06:29 PM, Frank LoVecchio <> wrote:

I have a cluster of 3 0.7 beta 2 nodes (built today from the latest trunk) running on Large,
EBS-backed, x64 EC2 instances; RF=3.  I attempted to write somewhere near 500,000 records
every 15 minutes from a total of 5 different computers (using Pelops and multi-threading).
  Though my network blew up and I'm not quite sure how many records were inserted, I lost
a node a couple hours later, and the other 2 were at severely high memory usage.  Is this
a memory leak of some kind, or something I can configure / watch for in the future?

nodetool ring shows this:

[ec2-user@XXX bin]$ ./nodetool -h localhost ring
Address  Status  State   Load       Token
ipXXX    Down    Normal  564.76 MB  XXX
ipXXX    Up      Normal  564.83 MB  XXX
ipXXX    Up      Normal  563.06 MB  XXX

A top on the box that is down shows this: (dual core x64)

Cpu(s): 19.9%us,  5.9%sy,  0.0%ni,  8.8%id, 57.3%wa,  0.0%hi,  0.0%si,  8.1%st
Mem:   7651528k total,  7611112k used,    40416k free,    66056k buffers
Swap:        0k total,        0k used,        0k free,  3294076k cached

22514 root      20   0 5790m 4.0g 167m S    91.9        54.8 152:45.08 java    

I see this error in the log file:

ERROR [CompactionExecutor:1] 2010-10-21 01:35:05,318 (line 88)
Fatal exception in thread Thread[CompactionExecutor:1,1,main] Cannot run program "ln": error=12,
Cannot allocate memory
	at org.apache.cassandra.db.ColumnFamilyStore.snapshot(
	at org.apache.cassandra.db.Table.snapshot(
	at org.apache.cassandra.db.CompactionManager.doCompaction(
	at org.apache.cassandra.db.CompactionManager$
	at org.apache.cassandra.db.CompactionManager$
	at java.util.concurrent.FutureTask$Sync.innerRun(
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$
Caused by: Cannot run program "ln": error=12, Cannot
allocate memory
	at java.lang.ProcessBuilder.start(
	at org.apache.cassandra.db.ColumnFamilyStore.snapshot(
	... 9 more
Caused by: error=12, Cannot allocate memory
	at java.lang.UNIXProcess.<init>(
	at java.lang.ProcessImpl.start(
	at java.lang.ProcessBuilder.start(
	... 12 more

On Wed, Oct 20, 2010 at 3:16 PM, Jonathan Ellis <> wrote:
can you reproduce this by, say, running nodeprobe ring in a bash while loop?

On Wed, Oct 20, 2010 at 3:09 PM, Bill Au <> wrote:
> One of my Cassandra servers crashed with the following:
> ERROR [] 2010-10-19 00:25:10,419
> (line 82) Uncaught exception in thread
> Thread[,5,main]
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(
>         at
> I took thread dumps of the JVMs on all the other Cassandra servers in my
> cluster.  They all have thousands of threads looking like this:
> "JMX server connection timeout 183373" daemon prio=10 tid=0x00002aad230db800
> nid=0x5cf6 in Object.wait() [0x00002aad7a316000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at
> com.sun.jmx.remote.internal.ServerCommunicatorAdmin$
>         - locked <0x00002aab056ccee0> (a [I)
>         at
> It seems to me that there is a JMX thread leak in Cassandra.  NodeProbe
> creates a JMXConnector but never calls its close() method.  I tried setting
> jmx.remote.x.server.connection.timeout to 0 hoping that would disable the
> JMX server connection timeout threads.  But that did not make any
> difference.
> Has anyone else seen this?
> Bill

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
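
A minimal sketch of the fix Bill describes, closing the JMXConnector once the probe is done, plus a helper that counts the "JMX server connection timeout" threads from his dump. This is not NodeProbe's actual code: ClosingProbe, mbeanCount and timeoutThreadCount are hypothetical names, and main connects to an in-process connector server rather than a Cassandra node so it can run on its own. It assumes Java 7+ for try-with-resources; on Java 6 the same thing is a try/finally around connector.close().

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class ClosingProbe {
    // Thread-name prefix copied from the thread dump quoted above
    static final String TIMEOUT_THREAD_PREFIX = "JMX server connection timeout";

    /** Connects, reads the MBean count, and always closes the connector. */
    static int mbeanCount(JMXServiceURL url) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            return mbeans.getMBeanCount();
        } // connector.close() runs here, even if the call above throws
    }

    /** Counts live threads in this JVM whose name starts with the timeout prefix. */
    static long timeoutThreadCount() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(TIMEOUT_THREAD_PREFIX))
                .count();
    }

    public static void main(String[] args) throws Exception {
        // In-process RMI connector server standing in for a real node (assumption)
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://"), null,
                ManagementFactory.getPlatformMBeanServer());
        server.start();
        try {
            // Same idea as running the ring command in a loop: many short-lived connections
            for (int i = 0; i < 100; i++) {
                mbeanCount(server.getAddress());
            }
            System.out.println("timeout threads still alive: " + timeoutThreadCount());
        } finally {
            server.stop();
        }
    }
}
```

Running the loop with and without the close, and comparing the thread count on the server side, is one way to check whether unclosed connectors account for the threads in the dump.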
