Date: Wed, 20 Oct 2010 16:16:53 -0500
Subject: Re: Cassandra crashed - possible JMX threads leak
From: Jonathan Ellis
To: user

Can you reproduce this by, say, running nodeprobe ring in a bash while loop?

On Wed, Oct 20, 2010 at 3:09 PM, Bill Au wrote:
> One of my Cassandra servers crashed with the following:
>
> ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419
> CassandraDaemon.java (line 82) Uncaught exception in thread
> Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main]
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:597)
>         at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533)
>
> I took thread dumps of the JVM on all the other Cassandra servers in my
> cluster.  They all have thousands of threads looking like this:
>
> "JMX server connection timeout 183373" daemon prio=10 tid=0x00002aad230db800
> nid=0x5cf6 in Object.wait() [0x00002aad7a316000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
>         - locked <0x00002aab056ccee0> (a [I)
>         at java.lang.Thread.run(Thread.java:619)
>
> It seems to me that there is a JMX thread leak in Cassandra.  NodeProbe
> creates a JMXConnector but never calls its close() method.  I tried setting
> jmx.remote.x.server.connection.timeout to 0, hoping that would disable the
> JMX server connection timeout threads, but that did not make any
> difference.
>
> Has anyone else seen this?
>
> Bill
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
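
For reference, the quoted report attributes the leak to a JMXConnector that is created but never closed. Below is a minimal sketch of the close-in-finally pattern using only the standard javax.management.remote API. It is not NodeProbe's actual code: the class name JmxCloseSketch, the in-process connector server on "service:jmx:rmi://", and the iteration count are all assumptions made for illustration.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical illustration; not taken from the Cassandra source tree.
class JmxCloseSketch {

    // Opens a client connection, reads one value, and always closes the
    // connector so the server can release its per-connection resources.
    static int queryMBeanCount(JMXServiceURL url) throws IOException {
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            return connector.getMBeanServerConnection().getMBeanCount();
        } finally {
            connector.close();
        }
    }

    // Starts an in-process RMI connector server (assumption: stands in for
    // a remote node's JMX endpoint) and connects to it repeatedly.
    static int demo(int iterations) {
        try {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                    new JMXServiceURL("service:jmx:rmi://"), null, mbs);
            server.start();
            try {
                JMXServiceURL address = server.getAddress();
                int count = 0;
                for (int i = 0; i < iterations; i++) {
                    count = queryMBeanCount(address);
                }
                return count;
            } finally {
                server.stop();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("MBeans seen: " + demo(50));
    }
}
```

Taking a thread dump of the server JVM while a loop like this runs, once with and once without the close() call, would be one way to check whether unclosed connectors account for the "JMX server connection timeout" threads.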