cassandra-user mailing list archives

From Mark Reddy
Subject Re: clarification on 100k tombstone limit in indexes
Date Sun, 10 Aug 2014 20:49:54 GMT
Hi Ian

> Are these tombstones ever "GCed" out of the index?  How frequently?

Yes, tombstones become eligible for removal once the time specified by
gc_grace_seconds has elapsed (10 days by default, and configurable); they
are then purged during compaction. Understanding how Cassandra handles
distributed deletes is key to designing an efficient schema if you plan to
delete often. There are lots of resources on how deletes are handled in
Cassandra; take a look at these links for example:
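
For reference, gc_grace_seconds is set per table. A minimal sketch, assuming
the items table from the schema quoted further down this thread; 864000
seconds is simply the 10-day default written out, not a recommendation:

```cql
-- Sketch only: 'items' is the table from the schema quoted below.
-- 864000 seconds = 10 days, the default.
ALTER TABLE items WITH gc_grace_seconds = 864000;
```

Lowering the value makes tombstones purgeable sooner, but it should stay
longer than the interval between repairs so that deletes reach every replica
first.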


On Sun, Aug 10, 2014 at 9:02 PM, Ian Rose wrote:

> Hi Mark -
> Thanks for the clarification but as I'm not too familiar with the nuts &
> bolts of Cassandra I'm not sure how to apply that info to my current
> situation.  It sounds like this 100k limit is, indeed, a "global" limit as
> opposed to a per-row limit.  Are these tombstones ever "GCed" out of the
> index?  How frequently?  If not, then it seems like *any* index is at risk
> of reaching this tipping point; it's just that indexes on frequently
> updated columns will reach this point faster than indexes on rarely
> updated columns.
> Basically I'm trying to get some kind of sense for what "frequently updated"
> means quantitatively.  As written, the docs make it sound dangerous to
> create an index on a column that is *ever* deleted or updated since there
> is no sense of how frequent is "too frequent".
> Cheers,
> Ian
> On Sun, Aug 10, 2014 at 3:02 PM, Mark Reddy wrote:
>> Hi Ian,
>> The issue here, which relates to both normal and index column families, is
>> that scanning over a large number of tombstones can cause Cassandra to fall
>> over due to increased GC pressure. This pressure arises because tombstones
>> create DeletedColumn objects, which consume heap. These DeletedColumn
>> objects also have to be serialized and sent back to the coordinator, thus
>> increasing your response times. Take for example a row that sees frequent
>> deletes, which you query with a limit of 100. In a worst-case scenario you
>> could end up reading, say, 50k tombstones to reach the limit of 100 'live'
>> columns, all of which have to be put on heap and then sent over the wire
>> to the coordinator. This would be considered a Cassandra anti-pattern.[1]
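
To make the worst case above concrete, here is a toy Python model of such a
read. It is only a sketch, not Cassandra's actual read path: scan_row,
TombstoneLimitExceeded, and the (name, is_tombstone) cell layout are
hypothetical names invented for illustration, and the 100k default mirrors
the 2.0 limit discussed in this thread.

```python
from typing import Iterable, List, Tuple

# Hypothetical default, mirroring the 100k limit discussed in this thread.
FAILURE_THRESHOLD = 100_000


class TombstoneLimitExceeded(Exception):
    """Raised by this toy model when a scan passes too many tombstones."""


def scan_row(cells: Iterable[Tuple[str, bool]], limit: int,
             failure_threshold: int = FAILURE_THRESHOLD) -> Tuple[List[str], int]:
    """Walk cells in order until `limit` live columns have been collected.

    Each cell is (name, is_tombstone). Returns (live_names, tombstones_read).
    """
    live: List[str] = []
    tombstones_read = 0
    for name, is_tombstone in cells:
        if is_tombstone:
            # A tombstone still has to be read and held, even though it
            # contributes nothing toward the requested limit.
            tombstones_read += 1
            if tombstones_read > failure_threshold:
                raise TombstoneLimitExceeded(
                    f"scanned more than {failure_threshold} tombstones")
            continue
        live.append(name)
        if len(live) == limit:
            break
    return live, tombstones_read


# Worst case from the text: 50k deleted columns sit in front of the live ones.
row = [(f"dead{i}", True) for i in range(50_000)] + \
      [(f"live{i}", False) for i in range(150)]
live, tombstones_read = scan_row(row, limit=100)
print(len(live), tombstones_read)  # prints: 100 50000
```

The query asked for 100 columns, yet the scan had to touch 50,000 extra cells
to find them, which is the cost the thresholds are meant to guard against.
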
>> With that in mind, a debug warning was added in 1.2 to inform users when
>> they query a row with 1000 tombstones.[2] Then in 2.0 the action was taken
>> to drop requests reaching 100k tombstones[3] rather than just printing a
>> warning. This is a safety measure: such a query is not advised, and the
>> limit is a result of most people 'doing it wrong'.
>> For those people who understand the risk of scanning over large numbers
>> of tombstones there is a configuration option in the cassandra.yaml to
>> increase this threshold, tombstone_failure_threshold.[4]
>> Mark
>> [1]
>> [2]
>> [3]
>> [4]
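
For reference, a sketch of how the option from [4] might look in
cassandra.yaml. The companion tombstone_warn_threshold setting is not
mentioned in this thread, and the values shown are believed to be the
2.0-era defaults; both are assumptions, so check them against your own file:

```yaml
# Log a warning when a single query scans more than this many tombstones.
tombstone_warn_threshold: 1000
# Abort the query when it scans more than this many tombstones.
tombstone_failure_threshold: 100000
```
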
>> On Sun, Aug 10, 2014 at 7:19 PM, Ian Rose wrote:
>>> Hi -
>>> On this page (
>>> the docs state:
>>>> Do not use an index [...] On a frequently updated or deleted column
>>> and
>>>> *Problems using an index on a frequently updated or deleted column*
>>>> Cassandra stores tombstones in the index until the tombstone limit
>>>> reaches 100K cells. After exceeding the tombstone limit, the query that
>>>> uses the indexed value will fail.
>>> I'm afraid I don't really understand this limit from its (brief)
>>> description.  I also saw this recent thread, but I'm afraid it didn't
>>> help me much...
>>> If I have tens or hundreds of thousands of rows in a keyspace, where
>>> every row has an indexed column that is updated O(10) times during the
>>> lifetime of each row, is that going to cause problems for me?  If that 100k
>>> limit is *per row* then I should be fine but if that 100k limit is *per
>>> keyspace* then I'd definitely exceed it quickly.
>>> In our system, items are created at a rate of ~10/sec.  Each item is
>>> updated ~10 times over the next few minutes (although in rare cases the
>>> number of updates, and the duration, might be several times as long).  Once
>>> the last update is received for an item, we select it from Cassandra,
>>> process the data, then delete the entire row.
>>> The tricky bit is that sometimes (maybe 30-40% of the time) we don't
>>> actually know when the last update has been received so we use a timeout:
>>> if an item hasn't been updated for 30 minutes, then we assume it is done
>>> and should process it as before (select, then delete).  So I am trying to
>>> design a schema that will allow for efficient queries of the form "find me
>>> all items that have not been updated in the past 30 minutes."  We plan to
>>> call this query once a minute.
>>> Here is my tentative schema:
>>> CREATE TABLE items (
>>>   item_id ascii,
>>>   last_updated timestamp,
>>>   item_data list<blob>,
>>>   PRIMARY KEY (item_id)
>>> );
>>> plus an index on last_updated.
>>> So updates to an existing item would just be "lookup by item_id, append
>>> new data to item_data, and set last_updated to now".  And queries to find
>>> items that have timed out would use the index on last_updated: "find all
>>> items where last_updated < [now - 30 minutes]".
>>> Assuming, that is, that the aforementioned 100k tombstone limit won't
>>> bring this index crashing to a halt...
>>> Any clarification on this limit and/or suggestions on a better way to
>>> model/implement this system would be greatly appreciated!
>>> Cheers,
>>> Ian
