Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A5E7A10DFC for ; Wed, 11 Sep 2013 01:37:30 +0000 (UTC) Received: (qmail 58960 invoked by uid 500); 11 Sep 2013 01:37:28 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 58906 invoked by uid 500); 11 Sep 2013 01:37:27 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 58898 invoked by uid 99); 11 Sep 2013 01:37:27 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Sep 2013 01:37:27 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW X-Spam-Check-By: apache.org Received-SPF: error (athena.apache.org: local policy) Received: from [209.85.219.53] (HELO mail-oa0-f53.google.com) (209.85.219.53) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Sep 2013 01:37:22 +0000 Received: by mail-oa0-f53.google.com with SMTP id k18so8599646oag.26 for ; Tue, 10 Sep 2013 18:36:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=erJMwiTPB/YMdOJmRpdRDU1meq8fV0SZDK8VWqw31wI=; b=CAgq9nPnLdOOIMWXuu6478Ai3iqtF8mHlPM3+cOpxSDkwfsrlxwe5ohXqnP4aVBE2d Zz8Y0rIpZ3G/hOZfvRBxvFnrGFdo1SB0kjq855xpCojEbqoKx9dOU3wGeonWXtqBD00O lB+bqMV7BpklKDtc4SbhbY3V2LD8GdN8/bu8jl+XEOpDGqL4FUd9mDkxZgNlZpsJ0aQc 13kpBvn8c5Q1hClW2i9sfvpMakurd7Cr76Ui6aAx+PQgP5zngZbURB8aknfV82K/IV63 R5pxpBDfnxx1VvvY5u6/P9RPDJLkfcW8JoAD9bIqaJ9xZt9ptuSvewnFAXl2GLqm5dPv aCiQ== X-Gm-Message-State: ALoCoQmI9Q3NByQZAJ9SL4N73rhabz2lW3+7eCUWsFnHlKMsI0j4AiJHfJR5zVDQmXLJhtyRp3E2 MIME-Version: 1.0 X-Received: by 10.60.45.65 with SMTP id k1mr8488598oem.48.1378863401511; Tue, 10 Sep 2013 18:36:41 -0700 (PDT) Received: by 10.182.107.134 with HTTP; Tue, 10 Sep 2013 18:36:41 -0700 (PDT) In-Reply-To: References: Date: Tue, 10 Sep 2013 21:36:41 -0400 Message-ID: Subject: Re: Composite Column Grouping From: "Laing, Michael" To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=089e0141a79c3dc56e04e611a7ea X-Virus-Checked: Checked by ClamAV on apache.org --089e0141a79c3dc56e04e611a7ea Content-Type: text/plain; charset=UTF-8 If you have set up the table as described in my previous message, you could run this python snippet to return the desired result: #!/usr/bin/env python # -*- coding: utf-8 -*- import logging logging.basicConfig() from operator import itemgetter import cassandra from cassandra.cluster import Cluster from cassandra.query import SimpleStatement cql_cluster = Cluster() cql_session = cql_cluster.connect() cql_session.set_keyspace('latest') select_stmt = "select * from time_series where userid = 'XYZ'" query = SimpleStatement(select_stmt) rows = cql_session.execute(query) results = [] for row in rows: max_time = max(row.colname.keys()) results.append((row.userid, row.pkid, max_time, row.colname[max_time])) sorted_results = sorted(results, key=itemgetter(2), reverse=True) for result in sorted_results: print result # prints: # (u'XYZ', u'1002', u'204', u'Col-Name-5') # (u'XYZ', u'1000', u'203', u'Col-Name-4') # (u'XYZ', u'1001', u'201', u'Col-Name-2') On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael wrote: > You could try this. C* doesn't do it all for you, but it will efficiently > get you the right data. > > -ml > > -- put this in and run using 'cqlsh -f > > DROP KEYSPACE latest; > > CREATE KEYSPACE latest WITH replication = { > 'class': 'SimpleStrategy', > 'replication_factor' : 1 > }; > > USE latest; > > CREATE TABLE time_series ( > userid text, > pkid text, > colname map, > PRIMARY KEY (userid, pkid) > ); > > UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE > userid = 'XYZ' AND pkid = '1000'; > UPDATE time_series SET colname = colname + > {'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001'; > UPDATE time_series SET colname = colname + > {'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000'; > UPDATE time_series SET colname = colname + > {'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000'; > UPDATE time_series SET colname = colname + > {'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002'; > > SELECT * FROM time_series WHERE userid = 'XYZ'; > > -- returns: > -- userid | pkid | colname > > ----------+------+----------------------------------------------------------------- > -- XYZ | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203': > 'Col-Name-4'} > -- XYZ | 1001 | {'201': > 'Col-Name-2'} > -- XYZ | 1002 | {'204': > 'Col-Name-5'} > > -- use an app to pop off the latest key/value from the map for each row, > then sort by key desc. > > > On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan < > ravikumar.govindarajan@gmail.com> wrote: > >> I have been faced with a problem of grouping composites on the >> second-part. >> >> Lets say my CF contains this >> >> >> TimeSeriesCF >> key: UserID >> composite-col-name: TimeUUID:PKID >> >> Some sample data >> >> UserID = XYZ >> Time:PKID >> Col-Name1 = 200:1000 >> Col-Name2 = 201:1001 >> Col-Name3 = 202:1000 >> Col-Name4 = 203:1000 >> Col-Name5 = 204:1002 >> >> Whenever a time-series query is issued, it should return the following in >> time-desc order. >> >> UserID = XYZ >> Col-Name5 = 204:1002 >> Col-Name4 = 203:1000 >> Col-Name2 = 201:1001 >> >> Is something like this possible in Cassandra? Is there a different way to >> design and achieve the same objective? >> >> -- >> Ravi >> >> > > --089e0141a79c3dc56e04e611a7ea Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
If you have set up the table as described in my previous m= essage, you could run this python snippet to return the desired result:
#!/usr/bin/env python
# -*- coding: utf-8 -*= -
import logging
logging.basicConfig()

from operator import itemgetter

import cassandra=
from cassandra.cluster import Cluster
from cassandra.q= uery import SimpleStatement

cql_cluster =3D Cluster()
cql_session =3D cql= _cluster.connect()
cql_session.set_keyspace('latest')

select_stmt =3D "select * from time_series where= userid =3D 'XYZ'"
query =3D SimpleStatement(select_stmt)
rows =3D cql_session.= execute(query)

results =3D []
for row in= rows:
=C2=A0 =C2=A0 max_time =3D max(row.colname.keys())
=C2=A0 =C2=A0 results.append((row.userid, row.pkid, max_time, row.colnam= e[max_time]))
=C2=A0 =C2=A0=C2=A0
sorted_results =3D sorted(results, key= =3Ditemgetter(2), reverse=3DTrue)
for result in sorted_results: p= rint result

# prints:

# (= u'XYZ', u'1002', u'204', u'Col-Name-5')
# (u'XYZ', u'1000', u'203', u'Col-Name-4&#= 39;)
# (u'XYZ', u'1001', u'201', u'Co= l-Name-2')



On Tue, Sep 10, 2013 at 6:32 PM, Laing, = Michael <michael.laing@nytimes.com> wrote:
You could try this. C* doesn't do it all for you, but = it will efficiently get you the right data.

-ml

-- put this in <file> and run using 'c= qlsh -f <file>

DROP KEYSPACE latest;

CREATE K= EYSPACE latest WITH replication =3D {
=C2=A0 =C2=A0 'class= 9;: 'SimpleStrategy',=C2=A0
=C2=A0 =C2=A0 'replicatio= n_factor' : 1
};

USE latest;

CREAT= E TABLE time_series (
=C2=A0 =C2=A0 userid text,
=C2=A0= =C2=A0 pkid text,
=C2=A0 =C2=A0 colname map<text, text>,
=C2=A0 =C2=A0 PRIMARY KEY (userid, pkid)
);

UPDATE time_series SET colname =3D colname= + {'200':'Col-Name-1'} WHERE userid =3D 'XYZ' AND = pkid =3D '1000';
UPDATE time_series=C2=A0SET=C2=A0colname= =3D colname + {'201':'Col-Name-2'}=C2=A0WHERE=C2=A0userid = =3D 'XYZ'=C2=A0AND=C2=A0pkid =3D '1001';
UPDATE time_series=C2=A0SET=C2=A0colname =3D colname + {'202':= 'Col-Name-3'}=C2=A0WHERE=C2=A0userid =3D 'XYZ'=C2=A0AND=C2= =A0pkid =3D '1000';
UPDATE time_series=C2=A0SET=C2=A0coln= ame =3D colname + {'203':'Col-Name-4'}=C2=A0WHERE=C2=A0user= id =3D 'XYZ'=C2=A0AND=C2=A0pkid =3D '1000';
UPDATE time_series=C2=A0SET=C2=A0colname =3D colname + {'204':= 'Col-Name-5'}=C2=A0WHERE=C2=A0userid =3D 'XYZ'=C2=A0AND=C2= =A0pkid =3D '1002';

SELECT * FROM time_ser= ies WHERE userid =3D 'XYZ';

-- returns:
-- userid | pkid | colname
<= div>----------+------+-----------------------------------------------------= ------------
-- =C2=A0 =C2=A0XYZ | 1000 | {'200': 'Co= l-Name-1', '202': 'Col-Name-3', '203': 'Col= -Name-4'}
-- =C2=A0 =C2=A0XYZ | 1001 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {'201': 'Col-Name-2'}
=
-- =C2=A0 =C2=A0XYZ | 1002 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {'204': 'Col-Name-5'}
=

-- use an app to pop off the latest key/value from the map f= or each row, then sort by key desc.


On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan &= lt;ra= vikumar.govindarajan@gmail.com> wrote:
I have been faced with a pr= oblem of grouping composites on the second-part.

Lets sa= y my CF contains this


TimeSeriesCF
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0key: =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0UserID
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0composite-col-name: =C2=A0 =C2=A0TimeUUID:PKID

<= /div>
Some sample data

UserID =3D XYZ =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0Time:PKID
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Col-Name1 =3D 200:10= 00
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Col-Nam= e2 =3D 201:1001
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0Col-Name3 =3D 202:1000
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0Col-Name4 =3D 203:1000
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Col-Name5 =3D 204:1002

Whenever a time-series query is issued, it should retur= n the following in time-desc order.

UserID =3D XYZ=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Col-Name5 =3D 2= 04:1002
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Col-Name= 4 =3D 203:1000
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Col-Name2 =3D 201:100= 1

Is something like this possible in Cassandra? Is= there a different way to design and achieve the same objective?
=
--
Ravi
=C2=A0 =C2=A0


--089e0141a79c3dc56e04e611a7ea--