From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Date: Thu, 15 Mar 2012 21:57:47 +1300
Subject: Re: Composite keys and range queries

> is there any disadvantage to using supercolumns here?
There are some; see http://wiki.apache.org/cassandra/CassandraLimitations

I would avoid them if you can. The one thing you cannot do when using CompositeTypes for column names is a range delete. If you delete a super column, you delete all of its sub columns; however, if you have a two-part column name, you cannot delete everything that matches "foo:*" in one operation.
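To make the range-delete difference concrete, here is a minimal in-memory sketch in Scala (no Cassandra or Hector involved). It models one row's columns as sorted maps and contrasts deleting a whole super column with deleting every composite column whose first part is "foo". `SuperRow`, `CompositeRow`, and `deletePrefix` are hypothetical names, and the read-then-delete loop is an assumed client-side workaround, not a Cassandra operation.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical model of a row that uses super columns:
// super column name -> (sub column name -> value).
final case class SuperRow(cols: TreeMap[String, TreeMap[String, String]]) {
  // One delete on the super column name removes every sub column under it.
  def deleteSuperColumn(name: String): SuperRow = SuperRow(cols - name)
}

// Hypothetical model of a row that uses two-part composite column names,
// kept sorted component by component like a CompositeType comparator.
final case class CompositeRow(cols: TreeMap[(String, String), String]) {
  // No single "delete foo:*" exists here, so this models the assumed
  // workaround: slice the names whose first part is `prefix`, then delete
  // each one individually. Returns the new row and how many deletes it took.
  def deletePrefix(prefix: String): (CompositeRow, Int) = {
    val matching = cols.rangeFrom((prefix, "")).keys.takeWhile(_._1 == prefix).toList
    (CompositeRow(cols -- matching), matching.size)
  }
}
```

The super column case is one operation regardless of how many sub columns exist, while the composite case costs a slice plus one delete per matching column.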

> They seem a little cleaner and more straightforward for my use case, since I don't have the advantage of the CQL composite key thing.
If they scratch your itch, grab the 1.1 beta, give them a try, and let us know how they work for you: http://cassandra.apache.org/download/

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/03/2012, at 10:23 AM, John Laban wrote:

Ahhh, ok, I thought that CQL was just being brought up to date with the functionality already built into composite keys, but I guess I was mistaken there.

But I guess it's just providing a convenient abstraction, using composite column names under the hood. That's where I was confused, thanks.

So, in terms of composite column names vs supercolumns: is the only advantage to composite column names that you can do column slicing on subsets of the "subcolumns"? I.e. if I don't mind loading all of the subcolumns for a given supercolumn name in memory at once (since I need them all anyway), is there any disadvantage to using supercolumns here? They seem a little cleaner and more straightforward for my use case, since I don't have the advantage of the CQL composite key thing.

Thanks,
John


On Wed, Mar 14, 2012 at 12:53 PM, Jeremiah Jordan <JEREMIAH.JORDAN@morningstar.com> wrote:
Right, so until the new CQL support exists to actually query with something smart enough to know about "composite keys", you have to define and query them on your own.

Row Key = UUID
Column = CompositeColumn(string, string)

You then want to use COLUMN slicing, not row ranges, to query the data, with priority as the first part of the composite column name so that you can slice on it.

See the "Under the hood and historical notes" section of the blog post. You want to lay out your data per the "Physical representation of the denormalized timeline rows" diagram, where your UUID plays the role of "user_id" from the example and your priority plays the role of "tweet_id".

-Jeremiah
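
The layout Jeremiah describes can be sketched in memory: one row per UUID, with composite column names whose first part is the priority, so a priority range becomes a contiguous column slice inside that single row. This is a minimal Scala model, not Hector code; `PriorityRow` and `slice` are hypothetical names, and it assumes `Int` priorities (as in John's schema) rather than the `(string, string)` column above.

```scala
import java.util.UUID
import scala.collection.immutable.TreeMap

// Hypothetical model of the suggested layout:
//   row key     = UUID
//   column name = composite (priority, suffix), sorted component by component
final case class PriorityRow(key: UUID, columns: TreeMap[(Int, String), String]) {
  // A column slice starting at (lo, "") and continuing while priority <= hi.
  // All columns live in one row, so only the column ordering matters here;
  // the partitioner's hashing of row keys never comes into play.
  def slice(lo: Int, hi: Int): List[((Int, String), String)] =
    columns.rangeFrom((lo, "")).takeWhile { case ((p, _), _) => p <= hi }.toList
}
```

The design point is that ordering within a row is controlled by the column comparator, whereas ordering across rows under RandomPartitioner is not meaningful.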



From: John Laban [john@pagerduty.com]
Sent: Wednesday, March 14, 2012 12:37 PM
To: user@cassandra.apache.org
Subject: Re: Composite keys and range queries

Hmm, now I'm really confused.

> This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

This article is what I actually used to come up with my schema here. In the "Clustering, composite keys, and more" section they're using a schema very similar to the one I'm trying to use. They define a composite key with two parts, expecting the first part to be used as the partition key and the second part to be used for ordering.

> The hash for (uuid-1, p1) may be 100 and the hash for (uuid-1, p2) may be 1.

Why? Shouldn't only "uuid-1" be used as the partition key? (So shouldn't those two hash to the same location?)

I'm thinking of using supercolumns for this instead, as I know they'll work (where the row key is the uuid and the supercolumn name is the priority), but aren't composite row keys supposed to essentially replace the need for supercolumns?

Thanks, and sorry if I'm getting this all wrong,
John



On Wed, Mar 14, 2012 at 12:52 AM, aaron morton <aaron@thelastpickle.com> wrote:
You are seeing this http://wiki.apache.org/cassandra/FAQ#range_rp

The hash for (uuid-1, p1) may be 100 and the hash for (uuid-1, p2) may be 1, because the RandomPartitioner hashes the whole row key, not just its first part.

You cannot do what you want to. Even if you passed a start of (uuid1,<empty>) and no finish, you would also get rows whose keys do not start with uuid1.
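The point about hashes can be checked with a small JDK-only Scala sketch. It computes a token the way RandomPartitioner does (MD5 digest of the full key bytes, read as a BigInteger, absolute value), so two composite keys that share their first part still get unrelated tokens. The `compositeKey` encoding (2-byte big-endian length, component bytes, a 0 end-of-component byte per component) is an assumed, simplified CompositeType layout, and `TokenSketch` is a hypothetical name.

```scala
import java.math.BigInteger
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object TokenSketch {
  // Simplified CompositeType-style encoding (assumption): for each component,
  // a 2-byte big-endian length, the component bytes, then a 0 end-of-component byte.
  def compositeKey(components: String*): Array[Byte] = {
    val parts = components.map(_.getBytes(StandardCharsets.UTF_8))
    val buf = ByteBuffer.allocate(parts.map(_.length + 3).sum)
    parts.foreach { p =>
      buf.putShort(p.length.toShort)
      buf.put(p)
      buf.put(0.toByte)
    }
    buf.array()
  }

  // RandomPartitioner-style token: MD5 over the WHOLE key, as a non-negative BigInteger.
  def token(key: Array[Byte]): BigInteger =
    new BigInteger(MessageDigest.getInstance("MD5").digest(key)).abs()
}
```

Because `token(compositeKey("uuid-1", "p1"))` and `token(compositeKey("uuid-1", "p2"))` are unrelated values, a key range between two such keys can easily have its start token sort after its end token, which is the condition reported in the InvalidRequestException in John's original message.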

This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

Or you can store all the priorities that are valid for an ID in another row.
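The "store the priorities in another row" idea can be sketched as a two-step lookup: slice the index row's sorted column names (the priorities) to the wanted range, then fetch each data row by its exact (ID, priority) key, a multiget rather than a key-range scan. This in-memory Scala model is hypothetical; `IndexedStore`, `put`, and `rangeFor` are illustrative names, not Hector APIs.

```scala
import java.util.UUID
import scala.collection.immutable.{SortedSet, TreeSet}

// Hypothetical model: data rows keyed by the full composite (ID, priority),
// plus one index row per ID whose sorted column names are the priorities in use.
final case class IndexedStore(
    data: Map[(UUID, Int), String],
    index: Map[UUID, SortedSet[Int]]
) {
  // Every write updates both the data row and the ID's index row.
  def put(id: UUID, priority: Int, value: String): IndexedStore =
    IndexedStore(
      data + ((id, priority) -> value),
      index + (id -> (index.getOrElse(id, TreeSet.empty[Int]) + priority))
    )

  // Step 1: slice the index row to [lo, hi].
  // Step 2: look up each data row by its exact key, so hashing order never matters.
  def rangeFor(id: UUID, lo: Int, hi: Int): List[(Int, String)] = {
    val priorities = index.getOrElse(id, TreeSet.empty[Int]).rangeFrom(lo).takeWhile(_ <= hi)
    priorities.toList.flatMap(p => data.get((id, p)).map(v => p -> v))
  }
}
```

The trade-off is an extra write per insert and a second round trip per read, in exchange for keeping the (ID, priority) row keys.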

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/03/2012, at 1:05 PM, John Laban wrote:

> Forwarding to the Cassandra mailing list as well, in case this is more of an issue with how I'm using Cassandra.
>
> Am I correct to assume that I can use range queries on composite row keys, even when using a RandomPartitioner, if I make sure that the first part of the composite key is fixed?
>
> Any help would be appreciated,
> John
>
>
>
> On Tue, Mar 13, 2012 at 12:15 PM, John Laban <john@pagerduty.com> wrote:
> Hi,
>
> I have a column family that uses a composite key:
>
> (ID, priority) -> ...
>
> Where the ID is a UUID and the priority is an integer.
>
> I'm trying to perform a range query now: I want all the rows where the ID matches some fixed UUID, but within a range of priorities. This is supported even if I'm using a RandomPartitioner, right? (Because the first part of the composite key is the partition key, and the second part of the composite key is automatically ordered?)
>
> So I perform a range slices query:
>
> val rangeQuery = HFactory.createRangeSlicesQuery(keyspace, new CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)
> rangeQuery.setColumnFamily(RouteColumnFamilyName).
>             setKeys( new Composite(id, priorityStart), new Composite(id, priorityEnd) ).
>             setRange( null, null, false, Int.MaxValue )
>
>
> But I get this error:
>
> me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:start key's md5 sorts after end key's md5.  this is not allowed; you probably should not specify end key at all, under RandomPartitioner)
>
> Shouldn't they have the same md5, since they have the same partition key?
>
> Am I using the wrong query here, or does Hector not support composite range queries, or am I making some mistake in how I think Cassandra's composite keys work?
>
> Thanks,
> John
>
>



