cassandra-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: What is wrong in this token function
Date Thu, 10 Mar 2016 21:55:53 GMT
What partitioner are you using? The default partitioner (Murmur3Partitioner)
is not order-preserving: it hashes the partition key, so tokens do not sort
in the same order as your PKs. A token() range therefore matches every row
whose hash falls between the two token values, whatever its customer or event
time, which is why you see extra rows. You probably want to use customer_id
as your partition key and event_time as a clustering column; then you can use
RDBMS-like WHERE conditions to select a slice of the partition.
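
As a rough sketch of that layout (the table name events_by_customer, the
column types, and the details column are assumptions for illustration, not
taken from the original schema), the table and the time-slice query might
look like this:

```cql
-- Hypothetical table: one partition per customer, rows sorted by event_time.
CREATE TABLE events_by_customer (
    customer_id text,       -- partition key (type assumed from the '289' literal)
    event_time  timestamp,  -- clustering column, so range predicates are allowed
    details     text,       -- placeholder for the remaining event columns
    PRIMARY KEY ((customer_id), event_time)
);

-- Equality on the partition key plus a range on the clustering column.
SELECT *
FROM events_by_customer
WHERE customer_id = '289'
  AND event_time >= '2016-03-01 18:45:00+0000'
  AND event_time <= '2016-03-12 19:05:00+0000';
```

With this key, rows that share a customer_id and event_time overwrite each
other, so a tie-breaking clustering column may be needed if a customer can
have several events at the same timestamp.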

-- Jack Krupansky

On Thu, Mar 10, 2016 at 4:45 PM, Rakesh Kumar <dcruncher4@aim.com> wrote:

>
> typo: the primary key was (customer_id + event_time )
>
>
> -----Original Message-----
> From: Rakesh Kumar <dcruncher4@aim.com>
> To: user <user@cassandra.apache.org>
> Sent: Thu, Mar 10, 2016 4:44 pm
> Subject: What is wrong in this token function
>
> C*  3.0.3
>
> I have a table table1 which has the primary key on
> ((customer_id,event_id)).
>
> I loaded 1.03 million rows from a csv file.
>
> Business case: Show me all events for a given customer in a given time
> frame
>
> In RDBMS it will be
>
> (Query1)
> where customer_id = '289'
> and event_time >= '2016-03-01 18:45:00+0000'
> and event_time <= '2016-03-12 19:05:00+0000';
>
> But C* does not allow >= <= on PKY cols. It suggested token function.
>
> So I did this:
>
> (Query2)
> where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000')
> and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000');
>
> I am seeing 75% more rows than what it should be. It should be 99K rows,
> it shows 163K.
>
> I checked the output with the csv file itself.  To double check I loaded
> the csv in another table
> with modified PKY so that the first query (Query1) can be executed. It
> also showed 99K rows.
>
> Am I using token function incorrectly ?
>
>
>
>
