Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
From: aaron morton <aaron@thelastpickle.com>
Mime-Version: 1.0 (Apple Message framework v1257)
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_96127402-5CAB-401B-98E0-4716CC40C3A2"
Subject: Re: Composite keys and range queries
Date: Thu, 15 Mar 2012 21:57:47 +1300
In-Reply-To: 
 <CAGEfnJOz8ULAQfrdoYPnrCQOoZ86FY0OZfiwp2KXwihO2dwZkA@mail.gmail.com>
To: user@cassandra.apache.org
References: 
 <CAGEfnJNLf+Mb_q13NS-8E838wfUerA7iwbswL1am0JGEvk9vBg@mail.gmail.com>
 <CAGEfnJMoz1RA-WfdRYj-qM3jXzwXhSoZuVvM=bY0B1GiyG4Rxw@mail.gmail.com>
 <F8323F30-B836-4C70-99CA-0F951535CD20@thelastpickle.com>
 <CAGEfnJNAxma4zcMa-o9W7BOSghLFfKXZx+vNc61BysTppxeBJA@mail.gmail.com>
 <63CCA5D3F3175843B5C153AD218C2FBF014B63@MSEXCHM83.morningstar.com>
 <CAGEfnJOz8ULAQfrdoYPnrCQOoZ86FY0OZfiwp2KXwihO2dwZkA@mail.gmail.com>
Message-Id: <D5AAB2EA-9F5B-48BF-B9E3-F0F8FF3C47FC@thelastpickle.com>


--Apple-Mail=_96127402-5CAB-401B-98E0-4716CC40C3A2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=iso-8859-1

> is there any disadvantage to using supercolumns here?
There are some: http://wiki.apache.org/cassandra/CassandraLimitations

I would avoid them if you can. The one thing you cannot do when using CompositeTypes for column names is a range delete. If you delete a super column, you delete all of its sub columns. However, if you have a two-part column name, you cannot delete everything that matches "foo:*" in a single operation.
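
To make the range delete point concrete, here is a minimal sketch. It is an in-memory model of a single row, not a Cassandra or Hector API: the names RangeDeleteSketch, slice and deletePrefix are made up for illustration, and the row is assumed to be a sorted map keyed by a two-part column name. Deleting "foo:*" turns into a slice read followed by one delete per matching column.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical in-memory model of ONE row; not a Cassandra client API.
object RangeDeleteSketch {
  // Two-part composite column name; tuples sort part by part.
  type Name = (String, String)

  // Every column name whose first part equals `prefix`, i.e. "prefix:*".
  def slice(row: TreeMap[Name, String], prefix: String): List[Name] =
    row.keys.filter(_._1 == prefix).toList

  // With no range delete available, "prefix:*" becomes one delete per column.
  def deletePrefix(row: TreeMap[Name, String], prefix: String): TreeMap[Name, String] =
    slice(row, prefix).foldLeft(row)((r, name) => r - name)

  def main(args: Array[String]): Unit = {
    val row = TreeMap(
      ("bar", "a") -> "1",
      ("foo", "a") -> "2",
      ("foo", "b") -> "3"
    )
    println(slice(row, "foo"))        // the columns that would have to be deleted
    println(deletePrefix(row, "foo")) // the row after the per-column deletes
  }
}
```

With a super column the same cleanup would be a single delete of the super column name, which is the convenience you give up with composite column names.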

> They seem a little cleaner and more straightforward for my use case, since I don't have the advantage of the CQL composite key thing.
If they scratch your itch, grab the 1.1 beta, give them a try, and let us know how they work for you.
http://cassandra.apache.org/download/

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/03/2012, at 10:23 AM, John Laban wrote:

> Ahhh, ok, I thought that CQL was just being brought up to date with the functionality already built into composite keys, but I guess I was mistaken there.
>
> But I guess it's just providing a convenient abstraction, using composite column names under the hood. That's where I was confused, thanks.
>
> So, in terms of composite column names vs supercolumns: is the only advantage to composite column names that you can do column slicing on subsets of the "subcolumns"? I.e. if I don't mind loading all of the subcolumns for a given supercolumn name in memory at once (since I need them all anyway), is there any disadvantage to using supercolumns here? They seem a little cleaner and more straightforward for my use case, since I don't have the advantage of the CQL composite key thing.
>
> Thanks,
> John
>
>
> On Wed, Mar 14, 2012 at 12:53 PM, Jeremiah Jordan <JEREMIAH.JORDAN@morningstar.com> wrote:
> Right, so until the new CQL stuff exists to actually query with something smart enough to know about "composite keys", you have to define and query on your own.
>
> Row Key = UUID
> Column = CompositeColumn(string, string)
>
> You then want to use COLUMN slicing, not row ranges, to query the data, slicing on priority as the first part of the composite column name.
>
> See the "Under the hood and historical notes" section of the blog post. You want to lay out your data per the "Physical representation of the denormalized timeline rows" diagram, where your UUID is the "user_id" from the example and your priority is the "tweet_id".
>
> -Jeremiah
>
>
> From: John Laban [john@pagerduty.com]
> Sent: Wednesday, March 14, 2012 12:37 PM
> To: user@cassandra.apache.org
> Subject: Re: Composite keys and range queries
>
> Hmm, now I'm really confused.
>
> > This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
>
> This article is what I actually used to come up with my schema here. In the "Clustering, composite keys, and more" section they're using a schema very similar to how I'm trying to use it. They define a composite key with two parts, expecting the first part to be used as the partition key and the second part to be used for ordering.
>
> > The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may be 1 .
>
> Why? Shouldn't only "uuid-1" be used as the partition key? (So shouldn't those two hash to the same location?)
>
> I'm thinking of using supercolumns for this instead as I know they'll work (where the row key is the uuid and the supercolumn name is the priority), but aren't composite row keys supposed to essentially replace the need for supercolumns?
>
> Thanks, and sorry if I'm getting this all wrong,
> John
>
>
>
> On Wed, Mar 14, 2012 at 12:52 AM, aaron morton <aaron@thelastpickle.com> wrote:
> You are seeing this: http://wiki.apache.org/cassandra/FAQ#range_rp
>
> The hash for (uuid-1, p1) may be 100 and the hash for (uuid-1, p2) may be 1.
>
> You cannot do what you want to. Even if you passed a start of (uuid1,<empty>) and no finish, you would not get only rows where the key starts with uuid1.
>
> This may be of use to you: http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
>
> Or you can store all the priorities that are valid for an ID in another row.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14/03/2012, at 1:05 PM, John Laban wrote:
>
> > Forwarding to the Cassandra mailing list as well, in case this is more of an issue on how I'm using Cassandra.
> >
> > Am I correct to assume that I can use range queries on composite row keys, even when using a RandomPartitioner, if I make sure that the first part of the composite key is fixed?
> >
> > Any help would be appreciated,
> > John
> >
> >
> >
> > On Tue, Mar 13, 2012 at 12:15 PM, John Laban <john@pagerduty.com> wrote:
> > Hi,
> >
> > I have a column family that uses a composite key:
> >
> > (ID, priority) -> ...
> >
> > Where the ID is a UUID and the priority is an integer.
> >
> > I'm trying to perform a range query now: I want all the rows where the ID matches some fixed UUID, but within a range of priorities. This is supported even if I'm using a RandomPartitioner, right? (Because the first key in the composite key is the partition key, and the second part of the composite key is automatically ordered?)
> >
> > So I perform a range slices query:
> >
> > val rangeQuery = HFactory.createRangeSlicesQuery(keyspace, new CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)
> > rangeQuery.setColumnFamily(RouteColumnFamilyName).
> >             setKeys( new Composite(id, priorityStart), new Composite(id, priorityEnd) ).
> >             setRange( null, null, false, Int.MaxValue )
> >
> >
> > But I get this error:
> >
> > me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:start key's md5 sorts after end key's md5.  this is not allowed; you probably should not specify end key at all, under RandomPartitioner)
> >
> > Shouldn't they have the same md5, since they have the same partition key?
> >
> > Am I using the wrong query here, or does Hector not support composite range queries, or am I making some mistake in how I think Cassandra's composite keys work?
> >
> > Thanks,
> > John
> >
> >
>
>
>
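
For anyone reading the thread later, the "start key's md5 sorts after end key's md5" error quoted above follows from the whole row key being hashed, not just its first part. Below is a minimal sketch under stated assumptions: the keys are plain strings standing in for serialized (ID, priority) composites, and the token is simply the MD5 of the key read as a non-negative integer, which approximates what an MD5-based partitioner does and may differ in detail from RandomPartitioner. The name TokenSketch is made up for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Illustration only: approximates how an MD5-based partitioner orders row keys.
object TokenSketch {
  // Token = MD5 of the WHOLE row key, read as a non-negative integer.
  def token(rowKey: String): BigInt = {
    val digest = MessageDigest.getInstance("MD5")
      .digest(rowKey.getBytes(StandardCharsets.UTF_8))
    BigInt(1, digest)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical keys standing in for serialized (ID, priority) composites.
    val keys = List("uuid-1:1", "uuid-1:2", "uuid-1:3")
    keys.foreach(k => println(s"$k -> ${token(k)}"))
    // Ordering by token is unrelated to ordering by priority.
    println(keys.sortBy(token))
  }
}
```

Keys that share the "uuid-1" part still get unrelated tokens, so a start and end key chosen by priority can easily arrive in the wrong token order. That is why the advice in the thread is to keep the UUID alone as the row key and put the priority into the column name.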


--Apple-Mail=_96127402-5CAB-401B-98E0-4716CC40C3A2--