cassandra-user mailing list archives

From "Rick Gunderson" <rgunder...@ca.ibm.com>
Subject Re: tombstone_failure_threshold being ignored?
Date Tue, 03 May 2016 19:03:57 GMT
I would have thought that a RangeSliceReply (which is the parent object 
that seems to "own" the ArrayList) would have contained only those objects 
related to the corresponding query. The hierarchy of objects appears to 
be:

org.apache.cassandra.net.OutboundTcpConnection$QueuedMessage
    org.apache.cassandra.net.MessageOut
        org.apache.cassandra.db.RangeSliceReply
            java.util.ArrayList
                java.lang.Object[1823230]
                    org.apache.cassandra.db.Row
                    ....

So to me it looks like all the Row objects are related to one outbound 
message (assuming my interpretation of the heap dump is correct).
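
As a rough sanity check on those numbers, here is a small Python sketch
using only figures from this thread (1,823,230 Row objects, 441MiB retained
heap, tombstone_failure_threshold of 25000). The function names are made up
for illustration, and the ratio only matters if each tombstoned Row would
count toward the threshold, which is exactly the open question here.

```python
# Figures taken from the heap dump and configuration described in this thread.
ROW_COUNT = 1_823_230          # Row objects in the single ArrayList
RETAINED_MIB = 441             # retained heap of that ArrayList, in MiB
FAILURE_THRESHOLD = 25_000     # our tombstone_failure_threshold setting


def avg_bytes_per_row(retained_mib, row_count):
    """Average retained heap per Row object, in bytes."""
    return retained_mib * 1024 * 1024 / row_count


def threshold_multiple(row_count, threshold):
    """How many times over the threshold the row count is."""
    return row_count / threshold


if __name__ == "__main__":
    print(f"{avg_bytes_per_row(RETAINED_MIB, ROW_COUNT):.1f} bytes per Row")
    print(f"{threshold_multiple(ROW_COUNT, FAILURE_THRESHOLD):.1f}x the threshold")
```

That works out to roughly 254 bytes per Row and about 73 times the
configured failure threshold, all inside one reply.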

As for tombstone_warn_threshold (left at the default of 1000), we never 
see those warnings in the logs either. Apart from the <file> and 
<fileNamePattern> settings, we are using the out-of-the-box logback.xml 
settings.
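
For comparison, the alternative explanation quoted below (many parallel
queries, each staying under the limit) can be modelled with a toy Python
sketch. This is not Cassandra's code: the exception class and both
functions are hypothetical stand-ins, and the only assumption is that the
tombstone counter is local to a single query.

```python
class TombstoneOverwhelmingError(Exception):
    """Hypothetical stand-in for Cassandra's TombstoneOverwhelmingException."""


def run_query(tombstones_scanned, failure_threshold):
    """Model one query whose tombstone counter is local to that query."""
    if tombstones_scanned > failure_threshold:
        raise TombstoneOverwhelmingError(
            f"scanned {tombstones_scanned} tombstones, limit {failure_threshold}"
        )
    return tombstones_scanned


def total_held(per_query_counts, failure_threshold):
    """Sum of objects held by all queries; raises if any single query is over."""
    return sum(run_query(n, failure_threshold) for n in per_query_counts)


if __name__ == "__main__":
    # 100 parallel queries, each under the 25000 limit: nothing is raised,
    # yet together they hold 2,000,000 objects.
    print(total_held([20_000] * 100, 25_000))
```

In that model no exception is ever raised even though the aggregate is
large. The heap dump above does not fit that picture, though, since all
the Row objects hang off a single RangeSliceReply.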

Oleksandr Petrov <oleksandr.petrov@gmail.com> wrote on 05/03/2016 01:21:20 
AM:

> From: Oleksandr Petrov <oleksandr.petrov@gmail.com>
> To: user@cassandra.apache.org
> Date: 05/03/2016 01:21 AM
> Subject: Re: tombstone_failure_threshold being ignored?
> 
> If I understand the problem correctly, tombstone_failure_threshold is
> never reached because the ~2M objects might have been collected for 
> different queries running in parallel, not for one query. No single 
> query reached the threshold, although all together they contributed 
> to the OOM.
> 
> You can read a bit more about the anti-patterns (particularly, ones 
> related to workloads generating lots of tombstones): 
> http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
> 
> You can also try running repairs/compactions more frequently. Although
> I'd look more closely at the read queries first, possibly with tracing
> on, and check parallelism for those. Maybe lower the tombstone warn
> threshold to understand where the bounds are.
> 
> On Thu, Apr 28, 2016 at 7:23 PM Rick Gunderson <rgunderson@ca.ibm.com> wrote:
> We are running Cassandra 2.2.3, 2 data centers, 3 nodes in each. The
> replication factor per datacenter is 3. The Xmx setting on the 
> Cassandra JVMs is 4GB.
> 
> We have a workload that generates lots of tombstones and Cassandra 
> goes OOM in about 24 hours. We've adjusted the 
> tombstone_failure_threshold down to 25000 but we never see the 
> TombstoneOverwhelmingException before the nodes start going OOM.
> 
> The table operation that looks to be the culprit is a scan of 
> partition keys (i.e. we are scanning across narrow rows, not 
> scanning within a wide row). The heapdump shows we have a 
> RangeSliceReply containing an ArrayList with 1,823,230 
> org.apache.cassandra.db.Row objects with a retained heap size of 
> 441MiB.  A look inside one of the Row objects shows an 
> org.apache.cassandra.db.DeletionInfo object so I assume that means 
> the row has been tombstoned.
> 
> If all of the 1,823,230 Row objects are tombstoned (and it is likely
> that most of them are), is there a reason that the 
> TombstoneOverwhelmingException never gets thrown? 
> 
> 
> 
> Regards,
> 
> Rick (R.) Gunderson 
> Software Engineer
> IBM Commerce, B2B Development - GDHA
> 
> Phone: 1-250-220-1053 
> E-mail: rgunderson@ca.ibm.com
> 
> 1803 Douglas St
> Victoria, BC V8T 5C3 
> Canada 
> 
> 

> -- 
> Alex

