incubator-cassandra-user mailing list archives

From Ian Rose <ianr...@fullstory.com>
Subject Re: clarification on 100k tombstone limit in indexes
Date Mon, 11 Aug 2014 21:17:25 GMT
Hi DuyHai,

Thanks for the detailed response!  A few responses below:

"On a side node, your usage of secondary index is not the best one. Indeed,
indexing the update date will lead to a situation where for one date,
you'll mostly have one or a few matching items (assuming that the update
date resolution is small enough and update rate is not intense)." --> I
should have mentioned this original (slipped my mind) but to deal
specifically with this problem I had planned to use a timestamp with a
resolution of 1 minute (like your minute_bucket).  So that way each index
entry *will* have quite a few entries and the index as a whole won't grow
too big.  Is my thinking correct here?
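
For concreteness, here is roughly the write I have in mind (just a sketch;
the literal values are invented):

UPDATE items
SET last_updated = '2014-08-11 17:31:00'  -- rounded down to the minute, rather than, say, 17:31:02.417
WHERE item_id = 'item42';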

"You better off create a manuel reverse-index to track modification date,
something like this"  --> I had considered an approach like this but my
concern is that for any given minute *all* of the updates will be handled
by a single node, right?  For example, if the minute_bucket is 2739 then
for that one minute, every single item update will flow to the node at
HASH(2739).  Assuming I am thinking about that right, that seemed like a
potential scaling bottleneck, which scared me off that approach.

Cheers,
Ian




On Sun, Aug 10, 2014 at 5:20 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:

> Hello Ian
>
> "It sounds like this 100k limit is, indeed, a "global" limit as opposed to
> a per-row limit" --> The threshold applies to each "REQUEST", not to a partition
> or globally.
>
> The threshold does not apply to a partition (physical row) simply because
> in one request you can fetch data from many partitions (multi get slice).
> There was a JIRA about this here:
> https://issues.apache.org/jira/browse/CASSANDRA-6865
>
> "Are these tombstones ever "GCed" out of the index?" --> Yes they are,
> during compactions of the index column family.
>
> "How frequently?" --> That's the real pain. Indeed you do not have any
> control on the tuning of secondary index CF compaction. As far as I know,
> the compaction settings (strategy, min/max thresholds...) inherits from the
> one of the base table
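>
> For example (just an illustration, and the strategy here is only a placeholder),
> the only lever you have is the base table, and the index CF will follow it:
>
> ALTER TABLE items
> WITH compaction = { 'class' : 'LeveledCompactionStrategy' };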
>
> Now, from a quick look at your data model, it seems that you have a
> skinny partition pattern. Since you mentioned that the date is updated only
> 10 times at most, you should not run into the tombstone threshold issue.
>
> On a side note, your usage of a secondary index is not the best one. Indeed,
> indexing the update date will lead to a situation where, for one date,
> you'll mostly have one or a few matching items (assuming that the update
> date resolution is small enough and the update rate is not intense). This is the
> high-cardinality scenario to be avoided (
> http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html).
> Plus, the query on the index (find all items where last_updated < [now - 30
> minutes]) makes things worse since it is not an exact match but an inequality.
>
>  You'd be better off creating a manual reverse index to track the modification
> date, something like this:
>
> CREATE TABLE last_updated_item (
>     minute_bucket int, // format YYYYMMDDHHmm
>     last_update_date timestamp,
>     item_id ascii,
>     PRIMARY KEY(minute_bucket, last_update_date)
> );
>
>  The last_update_date column is quite self-explanatory. The minute_bucket
> is trickier. The idea is to split time into 30-minute buckets: 00:00
> to 00:30 is bucket 1, 00:30 to 01:00 is bucket 2, and so on. For a whole
> day you'd have 48 buckets. We need to put data into buckets to avoid ultra-wide
> rows, since you mentioned that there are 10 items (so 10 updates) per
> second. Of course, 30 minutes is just an example; you can tune it down to a
> window of 5 minutes or 1 minute, depending on the insertion rate.
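>
> With such a table, each update writes one row into the current bucket and the
> periodic job reads whole buckets back. Roughly (just for illustration; the
> bucket value and ids here are invented):
>
> INSERT INTO last_updated_item (minute_bucket, last_update_date, item_id)
> VALUES (2739, '2014-08-11 17:31:00', 'item42');
>
> SELECT item_id, last_update_date
> FROM last_updated_item
> WHERE minute_bucket = 2739
> AND last_update_date < '2014-08-11 17:05:00';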
>
> On Sun, Aug 10, 2014 at 10:02 PM, Ian Rose <ianrose@fullstory.com> wrote:
>
>> Hi Mark -
>>
>> Thanks for the clarification but as I'm not too familiar with the nuts &
>> bolts of Cassandra I'm not sure how to apply that info to my current
>> situation.  It sounds like this 100k limit is, indeed, a "global" limit as
>> opposed to a per-row limit.  Are these tombstones ever "GCed" out of the
>> index?  How frequently?  If not, then it seems like *any* index is at risk
>> of reaching this tipping point; it's just that indexes on frequently
>> updated columns will reach this point faster than indexes on rarely
>> updated columns.
>>
>> Basically I'm trying to get some kind of sense for what "frequently
>> updated
>> <http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html>"
>> means quantitatively.  As written, the docs make it sound dangerous to
>> create an index on a column that is *ever* deleted or updated since
>> there is no sense of how frequent is "too frequent".
>>
>> Cheers,
>> Ian
>>
>>
>> On Sun, Aug 10, 2014 at 3:02 PM, Mark Reddy <mark.reddy@boxever.com>
>> wrote:
>>
>>> Hi Ian,
>>>
>>> The issue here, which relates to both normal and index column families, is
>>> that scanning over a large number of tombstones can cause Cassandra to fall over
>>> due to increased GC pressure. This pressure arises because tombstones
>>> create DeletedColumn objects, which consume heap. Also,
>>> these DeletedColumn objects have to be serialized and sent back to the
>>> coordinator, thus increasing your response times. Take, for example, a row
>>> that sees many deletes and is queried with a limit of 100. In a worst-case
>>> scenario you could end up reading, say, 50k tombstones to reach the 100
>>> 'live' column limit, all of which have to be put on the heap and then sent over
>>> the wire to the coordinator. This would be considered a Cassandra
>>> anti-pattern.[1]
>>>
>>> With that in mind, a debug warning was added in 1.2 to inform the
>>> user when they query a row with 1000 tombstones [2]. Then in 2.0
>>> the action was taken to drop requests that reach 100k tombstones [3] rather
>>> than just printing a warning. This is a safety measure, as performing such a
>>> query is not advised and is usually a result of 'doing it
>>> wrong'.
>>>
>>> For those people who understand the risk of scanning over large numbers
>>> of tombstones there is a configuration option in the cassandra.yaml to
>>> increase this threshold, tombstone_failure_threshold.[4]
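>>>
>>> For reference, the relevant settings live in cassandra.yaml and look roughly
>>> like this (the values shown are the 2.0 defaults as I recall them; do check
>>> your own config):
>>>
>>> tombstone_warn_threshold: 1000
>>> tombstone_failure_threshold: 100000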
>>>
>>>
>>> Mark
>>>
>>> [1]
>>> http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-6042
>>> [3] https://issues.apache.org/jira/browse/CASSANDRA-6117
>>> [4]
>>> https://github.com/jbellis/cassandra/blob/4ac18ae805d28d8f4cb44b42e2244bfa6d2875e1/conf/cassandra.yaml#L407-L417
>>>
>>>
>>>
>>> On Sun, Aug 10, 2014 at 7:19 PM, Ian Rose <ianrose@fullstory.com> wrote:
>>>
>>>> Hi -
>>>>
>>>> On this page (
>>>> http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html),
>>>> the docs state:
>>>>
>>>> Do not use an index [...] On a frequently updated or deleted column
>>>>
>>>>
>>>> and
>>>>
>>>>
>>>>> *Problems using an index on a frequently updated or deleted column*
>>>>> <http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html?scroll=concept_ds_sgh_yzz_zj__upDatIndx>
>>>>
>>>>> Cassandra stores tombstones in the index until the tombstone limit
>>>>> reaches 100K cells. After exceeding the tombstone limit, the query that
>>>>> uses the indexed value will fail.
>>>>
>>>>
>>>>
>>>> I'm afraid I don't really understand this limit from its (brief)
>>>> description.  I also saw this recent thread
>>>> <http://mail-archives.apache.org/mod_mbox/cassandra-user/201403.mbox/%3CCABNXB2Bf4aeoDVpMNOxJ_e7aDez2EuZswMJx=jWfb8=Oyo47kQ@mail.gmail.com%3E>
>>>> but
>>>> I'm afraid it didn't help me much...
>>>>
>>>>
>>>> *SHORT VERSION*
>>>>
>>>> If I have tens or hundreds of thousands of rows in a keyspace, where
>>>> every row has an indexed column that is updated O(10) times during the
>>>> lifetime of each row, is that going to cause problems for me?  If that 100k
>>>> limit is *per row* then I should be fine but if that 100k limit is *per
>>>> keyspace* then I'd definitely exceed it quickly.
>>>>
>>>>
>>>> *FULL EXPLANATION*
>>>>
>>>> In our system, items are created at a rate of ~10/sec.  Each item is
>>>> updated ~10 times over the next few minutes (although in rare cases the
>>>> number of updates, and the duration, might be several times greater).  Once
>>>> the last update is received for an item, we select it from Cassandra,
>>>> process the data, then delete the entire row.
>>>>
>>>> The tricky bit is that sometimes (maybe 30-40% of the time) we don't
>>>> actually know when the last update has been received so we use a timeout:
>>>> if an item hasn't been updated for 30 minutes, then we assume it is done
>>>> and should process it as before (select, then delete).  So I am trying to
>>>> design a schema that will allow for efficient queries of the form "find me
>>>> all items that have not been updated in the past 30 minutes."  We plan to
>>>> call this query once a minute.
>>>>
>>>> Here is my tentative schema:
>>>>
>>>> CREATE TABLE items (
>>>>   item_id ascii,
>>>>   last_updated timestamp,
>>>>   item_data list<blob>,
>>>>   PRIMARY KEY (item_id)
>>>> )
>>>> plus an index on last_updated.
>>>>
>>>> So updates to an existing item would just be "lookup by item_id, append
>>>> new data to item_data, and set last_updated to now".  And queries to find
>>>> items that have timed out would use the index on last_updated: "find all
>>>> items where last_updated < [now - 30 minutes]".
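>>>>
>>>> For concreteness, in CQL I picture this as something like the following
>>>> (just my reading of the prose above; the literal values are made up):
>>>>
>>>> CREATE INDEX ON items (last_updated);
>>>>
>>>> -- append new data and bump the timestamp for one item
>>>> UPDATE items
>>>> SET item_data = item_data + [0xcafebabe],
>>>>     last_updated = '2014-08-11 17:31:00'
>>>> WHERE item_id = 'item42';
>>>>
>>>> -- and the timeout scan would be shaped like
>>>> --   SELECT item_id FROM items WHERE last_updated < '2014-08-11 17:01:00';
>>>> -- though I'm not sure an inequality-only predicate on an indexed column
>>>> -- is even accepted without ALLOW FILTERING.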
>>>>
>>>> Assuming, that is, that the aforementioned 100k tombstone limit won't
>>>> bring this index crashing to a halt...
>>>>
>>>> Any clarification on this limit and/or suggestions on a better way to
>>>> model/implement this system would be greatly appreciated!
>>>>
>>>> Cheers,
>>>> Ian
>>>>
>>>>
>>>
>>
>
