incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Venturella <aventure...@gmail.com>
Subject Re: CQL3 to ORDER BY Verification
Date Fri, 01 Mar 2013 17:05:31 GMT
Perfect! Thanks for the response Sylvain!

On Friday, March 1, 2013, Sylvain Lebresne wrote:

> On Fri, Mar 1, 2013 at 5:16 PM, Adam Venturella <aventurella@gmail.com<javascript:_e({},
'cvml', 'aventurella@gmail.com');>
> > wrote:
>
>> My ColumnFamily is defined as follows:
>>
>>
>> CREATE TABLE UserProfileHistory(
>>     username text,
>>     timestamp bigint, -- millis since epoch
>>     data text, -- JSON
>>     PRIMARY KEY (username, timestamp)
>> ) WITH CLUSTERING ORDER BY (timestamp DESC);
>>
>>
>> Each insert on the username adds to the wide row. The most recent profile
>> history being able to be retrieved by
>>
>> SELECT * FROM UserProfileHistory WHERE username=:username LIMIT 1;
>>
>> For some reporting needs I need to fetch the entire history, and I need
>> to do it in ASC order instead of DESC.
>>
>> One option is to do the sorting in code, collect N results, sort on the
>> timestamps accordingly. Given the row is of N length, that could start to
>> put an undo memory burden in my application layer, and I would like to
>> avoid that if possible opting instead for Cassandra to perform the work.
>>
>>
>> So I am leaning towards this option:
>>
>> 2) min timestamp seek + ORDER BY
>>
>> To start the process my initial timestamp would be
>> 01-01-1970T12:00:00+0000 (assume that is in milliseconds, aka 0) I would
>> then issue my query:
>>
>> SELECT * FROM UserProfileHistory WHERE username=:username AND timestamp >
>> :milliseconds ORDER BY timestamp ASC LIMIT 100
>>
>> Once I have those initial results I would just pick my last timestamp
>> from the result set and + 1 on it  and run the query again until I received
>> 0 results.
>>
>>
>> The CQL works and returns my results as I expect. This will probably only
>> be run once every 24 hours, maybe every 12 hours; point being, not often.
>>
>> Am I setting myself up for a disaster down the line?
>>
>
> No.
>
> Paging over a partition key like you do in reverse order of the clustering
> order by is slightly slower than doing it in the clustering order, but not
> by a whole lot. It's slightly slower because 1) there will be backward seek
> underneath between the on-disk index block (not a huge deal) and 2) there
> is some reversing of lists going on before returning each query (again, not
> a huge deal). You'll be totally fine, especially if that query is not the
> one on which latency is the most critical.
>
> --
> Sylvain
>

Mime
View raw message