Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8802B10BFE for ; Wed, 8 Jan 2014 18:25:23 +0000 (UTC) Received: (qmail 27705 invoked by uid 500); 8 Jan 2014 18:25:06 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 27612 invoked by uid 500); 8 Jan 2014 18:25:05 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 27430 invoked by uid 99); 8 Jan 2014 18:25:01 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 08 Jan 2014 18:25:01 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of tyler@datastax.com designates 209.85.215.54 as permitted sender) Received: from [209.85.215.54] (HELO mail-la0-f54.google.com) (209.85.215.54) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 08 Jan 2014 18:24:56 +0000 Received: by mail-la0-f54.google.com with SMTP id b8so1442745lan.27 for ; Wed, 08 Jan 2014 10:24:35 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:content-type; bh=3tnsdBFAZonnsfjN8bpS/66dpfNi9sJtudtViwaVJ8c=; b=bbX8X6FESqUNHOFB9u35VRvK2gzl/kdX5CCtoWog3Ufdv+rVSFWjBes0BfKAVi7yb/ +RLq8qnFYc1ki+kpVfh0rZy8gUxNo1RbhNeLDhNUu5nB76PD+QY/Zf4f5Orzn7f/thzP tLQKXHHpVLm3dUttdwVpd2rEk0Noc8S7L6aQ4hSyuAZ0cAk1qXo9uipzrWKjqF5OsKlW KfjaI+X696VO2332C8INlwlsBmwdIwLp0X5iHcsNMWZdkVSUtzCwTbuetZU7IOu9oSMl ss0ivtsdiBRrdUdWcFZJqtFVyo0SQ7yuuKSW/UVndGF6bMbRq9HpTCpgJnuKnnR9vzUB fMjg== X-Gm-Message-State: ALoCoQmeqH2gCfuHmBRtwYToomcVaQ+r/VoqN2cVpyb4QyUoeT4KNOgrYhkSxWHGqJOxTmzYphsr X-Received: by 10.152.180.66 with SMTP id dm2mr276495lac.88.1389205475337; Wed, 08 Jan 2014 10:24:35 -0800 (PST) MIME-Version: 1.0 Received: by 10.112.135.230 with HTTP; Wed, 8 Jan 2014 10:24:15 -0800 (PST) In-Reply-To: References: From: Tyler Hobbs Date: Wed, 8 Jan 2014 12:24:15 -0600 Message-ID: Subject: Re: OOM after some days related to RunnableScheduledFuture and meter persistance To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=001a11345896e1aada04ef799a14 X-Virus-Checked: Checked by ClamAV on apache.org --001a11345896e1aada04ef799a14 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable I believe this is https://issues.apache.org/jira/browse/CASSANDRA-6358, which was fixed in 2.0.3. On Wed, Jan 8, 2014 at 7:15 AM, Desimpel, Ignace wrote: > Hi, > > > > On linux and cassandra version 2.0.2 I had an OOM after a heavy load and > then some (15 ) days of idle running (not exactly idle but very very low > activity). > > Two out of a 4 machine cluster had this OOM. > > > > I checked the heap dump (9GB) and that tells me : > > > > One instance of *"java.util.concurrent.ScheduledThreadPoolExecutor"*loade= d by *" class loader>"* occupies *8.927.175.368 (94,53%)* bytes. The instance is > referenced by *org.apache.cassandra.io.sstable.SSTableReader @ > 0x7fadf89e0* , loaded by *"sun.misc.Launcher$AppClassLoader @ > 0x683e6ad30"*. The memory is accumulated in one instance of > *"java.util.concurrent.RunnableScheduledFuture[]"* loaded by *" class loader>"*. > > > > So I checked the SSTableReader instance and found out the > =91ScheduledThreadPoolExecutor syncExecutor =91 object is holding about 6= 00k of > ScheduledFutureTasks. > > According to the code on SSTableReader these tasks must have been created > by the code line syncExecutor.scheduleAtFixedRate. That means that none o= f > these tasks ever get scheduled because some (and only one) initial task i= s > probably blocking. > > But then again, the one thread to execute these tasks, seems to be in a > =91normal=92 state (at time of OOM) and is executing with a stack trace p= asted > below : > > > > Thread 0x696777eb8 > > at > org.apache.cassandra.db.AtomicSortedColumns$1.create(Lorg/apache/cassandr= a/config/CFMetaData;Z)Lorg/apache/cassandra/db/AtomicSortedColumns; > (AtomicSortedColumns.java:58) > > at > org.apache.cassandra.db.AtomicSortedColumns$1.create(Lorg/apache/cassandr= a/config/CFMetaData;Z)Lorg/apache/cassandra/db/ColumnFamily; > (AtomicSortedColumns.java:55) > > at > org.apache.cassandra.db.ColumnFamily.cloneMeShallow(Lorg/apache/cassandra= /db/ColumnFamily$Factory;Z)Lorg/apache/cassandra/db/ColumnFamily; > (ColumnFamily.java:70) > > at > org.apache.cassandra.db.Memtable.resolve(Lorg/apache/cassandra/db/Decorat= edKey;Lorg/apache/cassandra/db/ColumnFamily;Lorg/apache/cassandra/db/index/= SecondaryIndexManager$Updater;)V > (Memtable.java:187) > > at > org.apache.cassandra.db.Memtable.put(Lorg/apache/cassandra/db/DecoratedKe= y;Lorg/apache/cassandra/db/ColumnFamily;Lorg/apache/cassandra/db/index/Seco= ndaryIndexManager$Updater;)V > (Memtable.java:158) > > at > org.apache.cassandra.db.ColumnFamilyStore.apply(Lorg/apache/cassandra/db/= DecoratedKey;Lorg/apache/cassandra/db/ColumnFamily;Lorg/apache/cassandra/db= /index/SecondaryIndexManager$Updater;)V > (ColumnFamilyStore.java:840) > > at > org.apache.cassandra.db.Keyspace.apply(Lorg/apache/cassandra/db/RowMutati= on;ZZ)V > (Keyspace.java:373) > > at > org.apache.cassandra.db.Keyspace.apply(Lorg/apache/cassandra/db/RowMutati= on;Z)V > (Keyspace.java:338) > > at org.apache.cassandra.db.RowMutation.apply()V (RowMutation.java:201) > > at > org.apache.cassandra.cql3.statements.ModificationStatement.executeInterna= l(Lorg/apache/cassandra/service/QueryState;)Lorg/apache/cassandra/transport= /messages/ResultMessage; > (ModificationStatement.java:477) > > at > org.apache.cassandra.cql3.QueryProcessor.processInternal(Ljava/lang/Strin= g;)Lorg/apache/cassandra/cql3/UntypedResultSet; > (QueryProcessor.java:178) > > at > org.apache.cassandra.db.SystemKeyspace.persistSSTableReadMeter(Ljava/lang= /String;Ljava/lang/String;ILorg/apache/cassandra/metrics/RestorableMeter;)V > (SystemKeyspace.java:938) > > at org.apache.cassandra.io.sstable.SSTableReader$2.run()V > (SSTableReader.java:342) > > at > java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; > (Executors.java:471) > > at java.util.concurrent.FutureTask.runAndReset()Z (FutureTask.java:304) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acce= ss$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTas= k;)Z > (ScheduledThreadPoolExecutor.java:178) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(= )V > (ScheduledThreadPoolExecutor.java:293) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/T= hreadPoolExecutor$Worker;)V > (ThreadPoolExecutor.java:1145) > > at java.util.concurrent.ThreadPoolExecutor$Worker.run()V > (ThreadPoolExecutor.java:615) > > at java.lang.Thread.run()V (Thread.java:724) > > > > > > Since each of these tasks are throttled by meterSyncThrottle.acquire() I > suspect that the RateLimiter is causing a delay. The RateLimiter instance > attributes are : > > Type|Name |Value > > long|nextFreeTicketMicros|3016022567383 > > double|maxPermits|100.0 > > double|storedPermits|99.0 > > long|offsetNanos|334676357831746 > > > > I guess that these attributes will practically result in a blocking > behavior, resulting in the OOM =85 > > > > Is there someone that can make sense out of it? > > I hope this helps in finding out what the reason is for this and maybe > could be avoided in the future. I still have the heap dump, so I can alwa= ys > pass more information if needed. > > > > Regards, > > > > Ignace Desimpel > --=20 Tyler Hobbs DataStax --001a11345896e1aada04ef799a14 Content-Type: text/html; charset=windows-1252 Content-Transfer-Encoding: quoted-printable
I believe this is https://issues.apache.org/jira/browse/CASSANDRA-63= 58, which was fixed in 2.0.3.

<= br>
On Wed, Jan 8, 2014 at 7:15 AM, Desimpel, Ignace= <Ignace.Desimpel@nuance.com> wrote:

Hi,

=A0

On linux and cassandra version 2.0.2 I had an OOM af= ter a heavy load and then some (15 ) days of idle running (not exactly idle= but very very low activity).

Two out of a 4 machine cluster had this OOM.<= u>

=A0

I checked the heap dump (9GB) and that tells me :=

=A0

One instance of "java.util.concurrent.ScheduledThreadPoolExecutor" loaded = by "<system class loader>" occupies 8.927.175.368 (94,53%) bytes. The instance is referenced by org.a= pache.cassandra.io.sstable.SSTableReader @ 0x7fadf89e0=A0, loaded by "sun.misc.Launcher$AppClassLoader @ 0x683e6ad30". The memo= ry is accumulated in one instance of "java.util.concurrent.RunnableScheduledFuture[]" loaded by= "<system class loader>".

=A0

So I checked the SSTableReader instance and found ou= t the =91ScheduledThreadPoolExecutor syncExecutor =91 object is holding abo= ut 600k of ScheduledFutureTasks.

According to the code on SSTableReader these tasks m= ust have been created by the code line syncExecutor.scheduleAtFixedRate. Th= at means that none of these tasks ever get scheduled because some (and only= one) initial task is probably blocking.

But then again, the one thread to execute these task= s, seems to be in a =91normal=92 state (at time of OOM) and is executing wi= th a stack trace pasted below :

=A0

Thread 0x696777eb8

=A0 at org.apache.cassandra.db.AtomicSortedColumns$1= .create(Lorg/apache/cassandra/config/CFMetaData;Z)Lorg/apache/cassandra/db/= AtomicSortedColumns; (AtomicSortedColumns.java:58)

=A0 at org.apache.cassandra.db.AtomicSortedColumns$1= .create(Lorg/apache/cassandra/config/CFMetaData;Z)Lorg/apache/cassandra/db/= ColumnFamily; (AtomicSortedColumns.java:55)

=A0 at org.apache.cassandra.db.ColumnFamily.cloneMeS= hallow(Lorg/apache/cassandra/db/ColumnFamily$Factory;Z)Lorg/apache/cassandr= a/db/ColumnFamily; (ColumnFamily.java:70)

=A0 at org.apache.cassandra.db.Memtable.resolve(Lorg= /apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/ColumnFamily;Lor= g/apache/cassandra/db/index/SecondaryIndexManager$Updater;)V (Memtable.java= :187)

=A0 at org.apache.cassandra.db.Memtable.put(Lorg/apa= che/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/ColumnFamily;Lorg/ap= ache/cassandra/db/index/SecondaryIndexManager$Updater;)V (Memtable.java:158= )

=A0 at org.apache.cassandra.db.ColumnFamilyStore.app= ly(Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/ColumnFam= ily;Lorg/apache/cassandra/db/index/SecondaryIndexManager$Updater;)V (Column= FamilyStore.java:840)

=A0 at org.apache.cassandra.db.Keyspace.apply(Lorg/a= pache/cassandra/db/RowMutation;ZZ)V (Keyspace.java:373)

=A0 at org.apache.cassandra.db.Keyspace.apply(Lorg/a= pache/cassandra/db/RowMutation;Z)V (Keyspace.java:338)

=A0 at org.apache.cassandra.db.RowMutation.apply()V = (RowMutation.java:201)

=A0 at org.apache.cassandra.cql3.statements.Modifica= tionStatement.executeInternal(Lorg/apache/cassandra/service/QueryState;)Lor= g/apache/cassandra/transport/messages/ResultMessage; (ModificationStatement= .java:477)

=A0 at org.apache.cassandra.cql3.QueryProcessor.proc= essInternal(Ljava/lang/String;)Lorg/apache/cassandra/cql3/UntypedResultSet;= (QueryProcessor.java:178)

=A0 at org.apache.cassandra.db.SystemKeyspace.persis= tSSTableReadMeter(Ljava/lang/String;Ljava/lang/String;ILorg/apache/cassandr= a/metrics/RestorableMeter;)V (SystemKeyspace.java:938)

=A0 at org.apache.cassandra.io.sstable.SSTableReader= $2.run()V (SSTableReader.java:342)

=A0 at java.util.concurrent.Executors$RunnableAdapte= r.call()Ljava/lang/Object; (Executors.java:471)

=A0 at java.util.concurrent.FutureTask.runAndReset()= Z (FutureTask.java:304)

=A0 at java.util.concurrent.ScheduledThreadPoolExecu= tor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoo= lExecutor$ScheduledFutureTask;)Z (ScheduledThreadPoolExecutor.java:178)<= /u>

=A0 at java.util.concurrent.ScheduledThreadPoolExecu= tor$ScheduledFutureTask.run()V (ScheduledThreadPoolExecutor.java:293)

=A0 at java.util.concurrent.ThreadPoolExecutor.runWo= rker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor= .java:1145)

=A0 at java.util.concurrent.ThreadPoolExecutor$Worke= r.run()V (ThreadPoolExecutor.java:615)

=A0 at java.lang.Thread.run()V (Thread.java:724)<= /u>

=A0

=A0

Since each of these tasks are throttled by meterSync= Throttle.acquire() I suspect that the RateLimiter is causing a delay. The R= ateLimiter instance attributes are :

Type|Name=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0 |Value

long|nextFreeTicketMicros|3016022567383

double|maxPermits|100.0

double|storedPermits|99.0

long|offsetNanos|334676357831746

=A0

I guess that these attributes will practically resul= t in a blocking behavior, resulting in the OOM =85

=A0

Is there someone that can make sense out of it?

I hope this helps in finding out what the reason is = for this and maybe could be avoided in the future. I still have the heap du= mp, so I can always pass more information if needed.

=A0

Regards,

=A0

Ignace Desimpel




--
Tyler Hobbs
DataStax
--001a11345896e1aada04ef799a14--