Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: error (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: 
 <CAKgmDnHDaL36a_3Rrf81-2kYvyQ-3FWioWZnsLeG-gAkXFizaw@mail.gmail.com>
References: 
 <CAGW2whQpQ164kkq5oco0XcfUEfNbH1H+Urpzj9JKa4Fk2-azmA@mail.gmail.com>
	<CAKgmDnHDaL36a_3Rrf81-2kYvyQ-3FWioWZnsLeG-gAkXFizaw@mail.gmail.com>
Date: Tue, 10 Sep 2013 21:36:41 -0400
Message-ID: 
 <CAKgmDnFNznyki-i1fZEoZm6EEE7PxKvi4kG0oR8RDQ-F5=ZFAw@mail.gmail.com>
Subject: Re: Composite Column Grouping
From: "Laing, Michael" <michael.laing@nytimes.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=089e0141a79c3dc56e04e611a7ea

--089e0141a79c3dc56e04e611a7ea
Content-Type: text/plain; charset=UTF-8

If you have set up the table as described in my previous message, you can
run this Python snippet to return the desired result:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
logging.basicConfig()

from operator import itemgetter

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Connect to the local node and select the keyspace created earlier.
cql_cluster = Cluster()
cql_session = cql_cluster.connect()
cql_session.set_keyspace('latest')

select_stmt = "select * from time_series where userid = 'XYZ'"
query = SimpleStatement(select_stmt)
rows = cql_session.execute(query)

# For each (userid, pkid) row, keep only the entry with the latest time key.
# The keys are text, so max() compares them as strings.
results = []
for row in rows:
    max_time = max(row.colname.keys())
    results.append((row.userid, row.pkid, max_time, row.colname[max_time]))

# Sort by time key, newest first.
sorted_results = sorted(results, key=itemgetter(2), reverse=True)
for result in sorted_results:
    print(result)

# prints:

# (u'XYZ', u'1002', u'204', u'Col-Name-5')
# (u'XYZ', u'1000', u'203', u'Col-Name-4')
# (u'XYZ', u'1001', u'201', u'Col-Name-2')
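
One caveat with the snippet above: the map keys are text, so max() and
sorted() compare them as strings, and '99' would sort after '204'. Here is
a minimal sketch of the same "latest entry per row" step that compares the
keys numerically instead. It assumes the keys are decimal strings as in the
sample data (it would not work for real TimeUUIDs), and the Row namedtuple
and the latest_per_row name are stand-ins I made up so it runs without a
cluster:

```python
from collections import namedtuple

# Hypothetical stand-in for the rows returned by the driver.
Row = namedtuple("Row", ["userid", "pkid", "colname"])


def latest_per_row(rows):
    """Return (userid, pkid, time, value) for each row, newest first.

    Assumes every map key is a decimal string such as '204'.
    """
    results = []
    for row in rows:
        if not row.colname:
            continue  # skip rows whose map is empty
        # Compare keys as integers rather than as strings.
        max_time = max(row.colname, key=int)
        results.append((row.userid, row.pkid, max_time, row.colname[max_time]))
    return sorted(results, key=lambda r: int(r[2]), reverse=True)


# Sample data mirroring the table contents in this thread.
sample_rows = [
    Row("XYZ", "1000", {"200": "Col-Name-1", "202": "Col-Name-3", "203": "Col-Name-4"}),
    Row("XYZ", "1001", {"201": "Col-Name-2"}),
    Row("XYZ", "1002", {"204": "Col-Name-5"}),
]

for result in latest_per_row(sample_rows):
    print(result)
```

With the real driver you would pass the result of cql_session.execute(query)
in place of sample_rows.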


On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael <michael.laing@nytimes.com> wrote:

> You could try this. C* doesn't do it all for you, but it will efficiently
> get you the right data.
>
> -ml
>
> -- put this in <file> and run using 'cqlsh -f <file>'
>
> DROP KEYSPACE latest;
>
> CREATE KEYSPACE latest WITH replication = {
>     'class': 'SimpleStrategy',
>     'replication_factor' : 1
> };
>
> USE latest;
>
> CREATE TABLE time_series (
>     userid text,
>     pkid text,
>     colname map<text, text>,
>     PRIMARY KEY (userid, pkid)
> );
>
> UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE userid = 'XYZ' AND pkid = '1000';
> UPDATE time_series SET colname = colname + {'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001';
> UPDATE time_series SET colname = colname + {'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000';
> UPDATE time_series SET colname = colname + {'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000';
> UPDATE time_series SET colname = colname + {'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002';
>
> SELECT * FROM time_series WHERE userid = 'XYZ';
>
> -- returns:
> -- userid | pkid | colname
> ----------+------+-----------------------------------------------------------------
> --    XYZ | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203': 'Col-Name-4'}
> --    XYZ | 1001 |                                           {'201': 'Col-Name-2'}
> --    XYZ | 1002 |                                           {'204': 'Col-Name-5'}
>
> -- use an app to pop off the latest key/value from the map for each row,
> -- then sort by key desc.
>
>
> On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:
>
>> I am facing a problem with grouping composite columns on the second
>> part.
>>
>> Let's say my CF contains this:
>>
>>
>> TimeSeriesCF
>>                        key:                            UserID
>>                        composite-col-name:    TimeUUID:PKID
>>
>> Some sample data
>>
>> UserID = XYZ
>>                                  Time:PKID
>>                Col-Name1 = 200:1000
>>                Col-Name2 = 201:1001
>>                Col-Name3 = 202:1000
>>                Col-Name4 = 203:1000
>>                Col-Name5 = 204:1002
>>
>> Whenever a time-series query is issued, it should return the following in
>> time-desc order.
>>
>> UserID = XYZ
>>               Col-Name5 = 204:1002
>>               Col-Name4 = 203:1000
>>               Col-Name2 = 201:1001
>>
>> Is something like this possible in Cassandra? Is there a different way to
>> design and achieve the same objective?
>>
>> --
>> Ravi
>>
>>
>
>

--089e0141a79c3dc56e04e611a7ea--