incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: Composite Column Grouping
Date Wed, 11 Sep 2013 05:01:03 GMT
Thanks Michael,

But I cannot sort the rows in memory, as the number of columns will be
quite huge.

>From the python script above:
   select_stmt = "select * from time_series where userid = 'XYZ'"

This would return me many hundreds of thousands of columns. I need to go in
time-series order using ranges [Pagination queries].


On Wed, Sep 11, 2013 at 7:06 AM, Laing, Michael
<michael.laing@nytimes.com>wrote:

> If you have set up the table as described in my previous message, you
> could run this python snippet to return the desired result:
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> import logging
> logging.basicConfig()
>
> from operator import itemgetter
>
> import cassandra
> from cassandra.cluster import Cluster
> from cassandra.query import SimpleStatement
>
> cql_cluster = Cluster()
> cql_session = cql_cluster.connect()
> cql_session.set_keyspace('latest')
>
> select_stmt = "select * from time_series where userid = 'XYZ'"
> query = SimpleStatement(select_stmt)
> rows = cql_session.execute(query)
>
> results = []
> for row in rows:
>     max_time = max(row.colname.keys())
>     results.append((row.userid, row.pkid, max_time, row.colname[max_time]))
>
> sorted_results = sorted(results, key=itemgetter(2), reverse=True)
> for result in sorted_results: print result
>
> # prints:
>
> # (u'XYZ', u'1002', u'204', u'Col-Name-5')
> # (u'XYZ', u'1000', u'203', u'Col-Name-4')
> # (u'XYZ', u'1001', u'201', u'Col-Name-2')
>
>
>
> On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael <michael.laing@nytimes.com
> > wrote:
>
>> You could try this. C* doesn't do it all for you, but it will efficiently
>> get you the right data.
>>
>> -ml
>>
>> -- put this in <file> and run using 'cqlsh -f <file>
>>
>> DROP KEYSPACE latest;
>>
>> CREATE KEYSPACE latest WITH replication = {
>>     'class': 'SimpleStrategy',
>>     'replication_factor' : 1
>> };
>>
>> USE latest;
>>
>> CREATE TABLE time_series (
>>     userid text,
>>     pkid text,
>>     colname map<text, text>,
>>     PRIMARY KEY (userid, pkid)
>> );
>>
>> UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE
>> userid = 'XYZ' AND pkid = '1000';
>> UPDATE time_series SET colname = colname +
>> {'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001';
>> UPDATE time_series SET colname = colname +
>> {'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000';
>> UPDATE time_series SET colname = colname +
>> {'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000';
>> UPDATE time_series SET colname = colname +
>> {'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002';
>>
>> SELECT * FROM time_series WHERE userid = 'XYZ';
>>
>> -- returns:
>> -- userid | pkid | colname
>>
>> ----------+------+-----------------------------------------------------------------
>> --    XYZ | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203':
>> 'Col-Name-4'}
>> --    XYZ | 1001 |                                           {'201':
>> 'Col-Name-2'}
>> --    XYZ | 1002 |                                           {'204':
>> 'Col-Name-5'}
>>
>> -- use an app to pop off the latest key/value from the map for each row,
>> then sort by key desc.
>>
>>
>> On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan <
>> ravikumar.govindarajan@gmail.com> wrote:
>>
>>> I have been faced with a problem of grouping composites on the
>>> second-part.
>>>
>>> Lets say my CF contains this
>>>
>>>
>>> TimeSeriesCF
>>>                        key:                            UserID
>>>                        composite-col-name:    TimeUUID:PKID
>>>
>>> Some sample data
>>>
>>> UserID = XYZ
>>>                                  Time:PKID
>>>                Col-Name1 = 200:1000
>>>                Col-Name2 = 201:1001
>>>                Col-Name3 = 202:1000
>>>                Col-Name4 = 203:1000
>>>                Col-Name5 = 204:1002
>>>
>>> Whenever a time-series query is issued, it should return the following
>>> in time-desc order.
>>>
>>> UserID = XYZ
>>>               Col-Name5 = 204:1002
>>>               Col-Name4 = 203:1000
>>>               Col-Name2 = 201:1001
>>>
>>> Is something like this possible in Cassandra? Is there a different way
>>> to design and achieve the same objective?
>>>
>>> --
>>> Ravi
>>>
>>>
>>
>>
>

Mime
View raw message