cassandra-user mailing list archives

From Hannu Kröger <hkro...@gmail.com>
Subject Re: Range deletes, wide partitions, and reverse iterators
Date Tue, 16 May 2017 14:34:44 GMT
Hello,

If you mean how to construct a query like that: you add an ORDER BY clause to the SELECT
that is the reverse of the table's default clustering order, just like in the example below.
If the table is created with "CLUSTERING ORDER BY (timeid ASC)" and you query
"SELECT ... ORDER BY timeid DESC", then the partition is read backwards. I don't know exactly
how this is implemented internally, but it is apparently slightly slower than reading the
partition in its natural order.
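A toy Python model may make the reasoning in this thread more concrete. It is only a sketch under stated assumptions, not Cassandra's real read path: the partition is a sorted list of integer clustering keys, the range tombstone deletes every key below a cutoff, the reader does not use the tombstone to seek past the deleted range (each shadowed row is read and then discarded), and the read stops once a page of live rows has been collected. The names `RangeTombstone` and `scan` and the page size of 10000 are hypothetical, chosen for illustration.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeTombstone:
    """Hypothetical model: shadows every row whose clustering key is < upper."""
    upper: int

    def shadows(self, key: int) -> bool:
        return key < self.upper


def scan(keys: Sequence[int], tombstone: RangeTombstone, limit: int,
         reverse: bool = False) -> tuple[list[int], int]:
    """Return (live_rows, rows_examined) for a naive row-by-row filter.

    Assumption: the tombstone is applied row by row and never used to skip
    the deleted range, so shadowed rows still cost a read.
    """
    ordered = reversed(keys) if reverse else keys
    live: list[int] = []
    examined = 0
    for key in ordered:
        examined += 1
        if tombstone.shadows(key):
            continue  # deleted row: read, then thrown away
        live.append(key)
        if len(live) == limit:
            break  # page is full, stop reading
    return live, examined


if __name__ == "__main__":
    keys = list(range(1_000_000))         # one wide partition, clustering order ASC
    tomb = RangeTombstone(upper=500_000)  # like "DELETE ... WHERE timeid < y": oldest half
    page = 10_000                         # illustrative limit

    _, asc_cost = scan(keys, tomb, page)
    _, desc_cost = scan(keys, tomb, page, reverse=True)
    # Prints: ASC examined 510000 rows, DESC examined 10000 rows
    print(f"ASC examined {asc_cost} rows, DESC examined {desc_cost} rows")
```

With the oldest half deleted, the ascending scan has to step over every shadowed row before it finds its first live one, while the descending scan fills its page from live rows straight away.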

Hannu 

> On 16 May 2017, at 17:29, Nitan Kainth <nitan@bamlabs.com> wrote:
> 
> Hannu,
> 
> How can you read a partition in reverse? 
> 
> Sent from my iPhone
> 
>> On May 16, 2017, at 9:20 AM, Hannu Kröger <hkroger@gmail.com> wrote:
>> 
>> Well, I’m guessing that Cassandra doesn't really know if the range tombstone is
>> useful for this or not.
>> 
>> In many cases the partition may contain data that falls within the range of the
>> tombstone but is newer than the tombstone, and therefore it might still be returned.
>> Scanning through deleted data can be avoided by reading the partition in reverse (if
>> all the deleted data is at the beginning of the partition). Eventually you will still
>> end up reading a lot of deleted data, but you will get a lot of live data first, and
>> the implicit query limit of 10000 is probably reached before you get to the deleted
>> rows. Therefore you will get an immediate answer.
>> 
>> Does it make sense?
>> 
>> Hannu
>> 
>>> On 16 May 2017, at 16:33, Stefano Ortolani <ostefano@gmail.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> I am seeing inconsistencies when mixing range tombstones, wide partitions, and
>>> reverse iterators.
>>> I have yet to figure out whether this behaviour is expected, hence this message
>>> to the mailing list.
>>> 
>>> The situation is conceptually simple. I am using a table defined as follows:
>>> 
>>> CREATE TABLE test_cql.test_cf (
>>> hash blob,
>>> timeid timeuuid,
>>> PRIMARY KEY (hash, timeid)
>>> ) WITH CLUSTERING ORDER BY (timeid ASC)
>>> AND compaction = {'class' : 'LeveledCompactionStrategy'};
>>> 
>>> I then load 2-3 GB from 3 sstables which I know contain a really wide partition
>>> (> 512 MB) for `hash = x`. I then delete the oldest _half_ of that partition by
>>> executing the query below, and restart the node:
>>> 
>>> DELETE 
>>> FROM test_cql.test_cf 
>>> WHERE hash = x AND timeid < y;
>>> 
>>> If I keep compactions disabled, the following query times out (it takes more than
>>> 10 seconds to succeed):
>>> 
>>> SELECT * 
>>> FROM test_cql.test_cf 
>>> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
>>> ORDER BY timeid ASC;
>>> 
>>> While the following returns immediately (obviously because no deleted data is
>>> ever read):
>>> 
>>> SELECT * 
>>> FROM test_cql.test_cf 
>>> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
>>> ORDER BY timeid DESC;
>>> 
>>> If I force a compaction, the problem is gone, but I presume that is just because
>>> the data is rearranged.
>>> 
>>> It seems to me that reading in ASC order does not make use of the range tombstone
>>> until C* reads the last sstable (which actually contains the range tombstone and was
>>> flushed at node restart), and so it wastes time reading all the rows that are no
>>> longer live.
>>> 
>>> Is this expected? Should the range tombstone actually help in these cases?
>>> 
>>> Thanks a lot!
>>> Stefano
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: user-help@cassandra.apache.org
>> 

