cassandra-user mailing list archives

From Avi Levi <...@indeni.com>
Subject Re: Getting all unique keys
Date Sun, 20 Aug 2017 15:34:33 GMT
Thank you very much. One question: you wrote that I do not need DISTINCT
here since it's part of the primary key, but only the combination is
unique (PRIMARY KEY (id, timestamp)). Also, if I take the last token and
feed it back as you showed, wouldn't I get overlapping boundaries?

On Sun, Aug 20, 2017 at 6:18 PM, Eric Stevens <mightye@gmail.com> wrote:

> You should be able to fairly efficiently iterate all the partition keys
> like:
>
> select id, token(id) from table where token(id) >= -9204925292781066255
> limit 1000;
>  id                                         | system.token(id)
> --------------------------------------------+----------------------
> ...
>  0xb90ea1db5c29f2f6d435426dccf77cca6320fac9 | -7821793584824523686
>
> Take the last token you receive and feed it back in, skipping duplicates
> from the previous page (on the unlikely chance that you have two ID's with
> a token collision on the page boundary):
>
> select id, token(id) from table where token(id) >=
> -7821793584824523686 limit 1000;
>  id                                         | system.token(id)
> --------------------------------------------+---------------------
> ...
>  0xc6289d729c9087fb5a1fe624b0b883ab82a9bffe | -434806781044590339
>
> Continue until you have no more results.  You don't really need distinct
> here: it's part of your primary key, it must already be distinct.
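The paging loop described above can be sketched in Python (rather than CQL). This is an illustrative stand-in, not driver code: `rows` is an in-memory list of (token, id) pairs already sorted by token, playing the role of the cluster, and `page()` plays the role of the `SELECT ... WHERE token(id) >= ? LIMIT ?` query. Both names are hypothetical.

```python
PAGE_SIZE = 3
MIN_TOKEN = -(2 ** 63)  # smallest Murmur3 token

def page(rows, start_token, limit):
    """Stand-in for the CQL query: up to `limit` rows with token >= start_token."""
    return [r for r in rows if r[0] >= start_token][:limit]

def scan_all_ids(rows):
    """Iterate every id by feeding the last token of each page back in,
    skipping ids already seen on the previous page boundary (the token
    collision case). Assumes fewer than PAGE_SIZE ids share one token."""
    seen = set()        # ids seen at the previous page's boundary token
    start = MIN_TOKEN
    ids = []
    while True:
        batch = page(rows, start, PAGE_SIZE)
        for tok, rid in batch:
            if rid not in seen:
                ids.append(rid)
        if len(batch) < PAGE_SIZE:
            break  # short page: no more rows
        last_token = batch[-1][0]
        seen = {rid for tok, rid in batch if tok == last_token}
        start = last_token  # feed the last token back in
    return ids
```

Note the dedup set only needs to hold ids sharing the boundary token, so memory stays bounded regardless of table size.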
>
> If you want to parallelize it, split the ring into *n* ranges and add each
> segment's end as an upper bound:
>
> select id, token(id) from table where token(id) >= -9204925292781066255
> AND token(id) < $rangeUpperBound limit 1000;
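Splitting the ring for that parallel variant is just arithmetic over the signed 64-bit Murmur3 token space. A minimal sketch (the function name is illustrative; this assumes the default Murmur3Partitioner's range of -2^63 to 2^63 - 1):

```python
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1

def token_ranges(n):
    """Split the signed 64-bit token space into n contiguous [lower, upper)
    ranges; the final upper bound is MAX_TOKEN + 1 so the last range still
    covers the ring's largest token."""
    step = 2 ** 64 // n
    bounds = [MIN_TOKEN + i * step for i in range(n)] + [MAX_TOKEN + 1]
    return list(zip(bounds[:-1], bounds[1:]))
```

Each (lower, upper) pair then becomes the `token(id) >= lower AND token(id) < upper` bounds of one worker's scan.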
>
>
> On Sun, Aug 20, 2017 at 12:33 AM Avi Levi <avi@indeni.com> wrote:
>
>> I need to get all the unique keys (not the complete primary key, just the
>> partition key) in order to aggregate all the relevant records of each key
>> and apply some calculations to them.
>>
>> CREATE TABLE my_table (
>>     id text,
>>     timestamp bigint,
>>     value double,
>>     PRIMARY KEY (id, timestamp)
>> );
>>
>> I know that a query like this
>>
>> SELECT DISTINCT id FROM my_table;
>>
>> is not very efficient, but how about the approach presented here
>> <http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/>,
>> sending queries in parallel and using the token:
>>
>> SELECT DISTINCT id FROM my_table WHERE token(id) >= -9223372036854775808 AND
>> token(id) <= -9204925292781066255;
>>
>> Or I could just maintain another table with the unique keys:
>>
>> CREATE TABLE id_only (
>>     id text,
>>     PRIMARY KEY (id)
>> );
>>
>> but I tend not to, since it is error prone and would require additional
>> procedures to maintain data integrity between those two tables.
>>
>> Any ideas?
>>
>> Thanks
>>
>> Avi
>>
>>
