cassandra-commits mailing list archives

From "Vishal Mehta (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6666) Avoid accumulating tombstones after partial hint replay
Date Tue, 29 Jul 2014 19:02:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078171#comment-14078171
] 

Vishal Mehta edited comment on CASSANDRA-6666 at 7/29/14 7:01 PM:
------------------------------------------------------------------

Hello everyone,

Please pardon my ignorance; this is my first time commenting on an open-source bug report.

I believe I recently hit this bug, as I saw similar symptoms in my 3-node Cassandra setup.
I am running a test at around 12K QPS (inserts into 3 different tables) with the TTL set
to 1 hour and gc_grace_seconds set to 14400 (4 hours).

The test eventually reaches a point where Cassandra scans more than 100K tombstones and the
hint delivery thread dies with the following exception in /var/log/cassandra/cassandra.log:

{noformat}
ERROR 13:23:56,747 Scanned over 100000 tombstones in system.hints; query aborted (see tombstone_fail_threshold)
ERROR 13:23:56,962 Exception in thread Thread[HintedHandoff:1,1,main]
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:202)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)
        at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:373)
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:330)
        at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:91)
        at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:547)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
 INFO 13:24:00,987 No gossip backlog; proceeding
{noformat}
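For concreteness, here is a rough sketch of the kind of schema and writes this test setup implies. The keyspace, table, and column names are purely illustrative, not taken from the actual test; note also that gc_grace_seconds is a per-table property in CQL, even when it is set uniformly across a keyspace:

{code:sql}
-- Hypothetical table mirroring the reported settings:
-- gc_grace_seconds = 14400 (4 hours), writes expiring after 1 hour.
CREATE TABLE myks.events (
    id uuid PRIMARY KEY,
    payload text
) WITH gc_grace_seconds = 14400;

-- Each insert expires after 3600 seconds (1 hour), leaving behind
-- expired data that cannot be purged until gc_grace_seconds later.
INSERT INTO myks.events (id, payload)
VALUES (1b4db7eb-4057-4cbf-9d41-6b3f6f1f5c77, 'sample') USING TTL 3600;
{code}

With a 1-hour TTL and a 4-hour grace period, every expired cell lingers for at least 4 additional hours before compaction can drop it, which matches the tombstone accumulation described above.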

*Note:* Is it advisable to keep gc_grace_seconds closer to the TTL? Also, I observed that one
of the nodes deleted all its records from disk and freed the space, whereas the other two nodes
never deleted their tombstones.
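If lowering gc_grace_seconds turns out to be appropriate, it can be changed per table without a restart; a minimal sketch, again with an illustrative table name:

{code:sql}
-- Bring gc_grace_seconds down toward the 1-hour TTL.
-- Caveat: it should remain longer than the interval between repairs,
-- otherwise tombstones may be purged before every replica has seen
-- the deletion, and deleted data can be resurrected.
ALTER TABLE myks.events WITH gc_grace_seconds = 7200;
{code}

The usual guidance is to weigh tombstone accumulation against repair frequency rather than simply matching the TTL.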

Please advise.
Regards,
Vishal




> Avoid accumulating tombstones after partial hint replay
> -------------------------------------------------------
>
>                 Key: CASSANDRA-6666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>              Labels: hintedhandoff
>             Fix For: 2.0.10
>
>         Attachments: 6666.txt, cassandra_system.log.debug.gz
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
