From: Charles Blaxland
Date: Sun, 15 May 2011 17:56:49 +1000
Subject: Multiget_slice or composite column keys?
To: user@cassandra.apache.org

Hi All,

New to Cassandra, so apologies if I don't fully grok stuff just yet.

I have data keyed by a key as well as a date. I want to run a query that gets multiple keys across multiple contiguous date ranges simultaneously. I'm currently storing the date along with the row key, like this:

key1|2011-05-15 { c1 : , c2 : , c3 : ... }
key1|2011-05-16 { c1 : , c2 : , c3 : ... }
key2|2011-05-15 { c1 : , c2 : , c3 : ... }
key2|2011-05-16 { c1 : , c2 : , c3 : ... }
...

I generate all the key/date combinations I'm interested in and use multiget_slice to retrieve them, pulling in all the columns for each key (I need all the data, but the number of columns is small: fewer than 100). The total number of row keys retrieved will only be 100 or so.
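
To make that concrete, here's a rough pycassa-flavoured sketch of the first approach (the keyspace, column family, and server names are made up for the example; which client library I actually use doesn't matter to the question):

    from datetime import date, timedelta

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    events_by_day = pycassa.ColumnFamily(pool, 'EventsByKeyAndDate')

    def day_range(start, end):
        """Yield each date from start to end, inclusive."""
        d = start
        while d <= end:
            yield d
            d += timedelta(days=1)

    # Build every key|date row key in the range of interest...
    keys = ['key1', 'key2']
    row_keys = ['%s|%s' % (k, d.isoformat())
                for k in keys
                for d in day_range(date(2011, 5, 15), date(2011, 5, 16))]

    # ...then fetch them all at once. multiget wraps multiget_slice, and with
    # only ~100 row keys and fewer than 100 columns per row this stays cheap.
    rows = events_by_day.multiget(row_keys, column_count=100)

    for row_key, columns in rows.items():
        print(row_key, columns)
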
Now it strikes me that I could also store this using composite columns, like this:

key1 { 2011-05-15|c1 : , 2011-05-16|c1 : , 2011-05-15|c2 : , 2011-05-16|c2 : , 2011-05-15|c3 : , 2011-05-16|c3 : , ... }
key2 { 2011-05-15|c1 : , 2011-05-16|c1 : , 2011-05-15|c2 : , 2011-05-16|c2 : , 2011-05-15|c3 : , 2011-05-16|c3 : , ... }
...

Then use multiget_slice again (but with fewer keys), and use a slice range to retrieve only the dates I'm interested in. Something along the lines of the sketch below.
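
Same made-up names as above; note I've cheated and shown the composite as a packed string column name under a UTF8 comparator, purely to illustrate the slice; a real CompositeType comparator would compare typed components instead:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    events = pycassa.ColumnFamily(pool, 'EventsByKey')

    # One row per key. Because each column name starts with a zero-padded ISO
    # date, one contiguous slice covers a whole date range.
    rows = events.multiget(
        ['key1', 'key2'],
        column_start='2011-05-15',   # first date I want
        column_finish='2011-05-17',  # sorts just past the last '2011-05-16|...' name
        column_count=1000,           # upper bound: dates in range * columns per date
    )

    for key, columns in rows.items():
        for name, value in columns.items():
            day, col = name.split('|', 1)
            print(key, day, col, value)
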

Another alternative, I guess, would be to use OPP (the order-preserving partitioner) with the first storage approach and get_range_slices, but as I understand it this would not be great for performance, due to keys being clustered together on a single node?
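
(For what it's worth, that variant would look something like this, again with made-up names; get_range wraps get_range_slices:)

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    events_by_day = pycassa.ColumnFamily(pool, 'EventsByKeyAndDate')

    # Only meaningful on a cluster running the order-preserving partitioner,
    # where row keys are stored in sorted order, so contiguous key|date rows
    # can be scanned with a single range query.
    for row_key, columns in events_by_day.get_range(start='key1|2011-05-15',
                                                    finish='key1|2011-05-16'):
        print(row_key, columns)
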

So my question is: which approach is best? One downside to the latter, I guess, is that the number of columns grows without bound (although with 2 billion columns per row to play with, that isn't going to be a problem any time soon). Also, multiget_slice supports only one slice predicate, so I guess I'd have to use multiple queries to get multiple date ranges.

Anyway, any thoughts/tips appreciated.

Thanks,
Charles
