From: Larry Root
Date: Fri, 23 Apr 2010 13:33:14 -0700
To: user@cassandra.apache.org
Subject: Trying To Understand get_range_slices Results When Using RandomPartitioner
I'm trying to better understand how using the RandomPartitioner will affect my ability to select ranges of keys. Consider my simple example, where we have many online games across different game genres (GameType). These games need to store data for each of their users. With that in mind, consider the following data model:

enum GameType {'RPG', 'FPS', 'ARCADE'}

{
    "GameData": {                          // Super Column Family
        GameType+"1234": {                 // Row (concat GameType with a game id, for example)
            "user-data:5678": {            // Super column (user data)
                "user_prop_name": "value",         // Subcolumn (arbitrary user properties and values)
                "another_prop_name": "value",
                ...
            },
            "user-data:9012": {
                "user_prop_name": "value",
                ...
            }
        },
        GameType+"3456": {...},
        GameType+"7890": {...},
        ...
    }
}

Assume we have a multi-node cluster running Cassandra 0.6.1. In that scenario, could someone help me understand what the result would be in the following cases:
1. We use a range slice to grab keys for all 'RPG' games (range slice at the ROW level). Would we be able to get all games back in a single query, or would that not be guaranteed?

2. For a given game, we use a range slice to grab all user-data keys whose ID starts with '5' (range slice at the COLUMN level). Again, would we be able to get all keys in one call (assuming the number of keys in the result was not an issue)?

3. Finally, for a given game and a given user, we do a range slice to grab all user properties that start with 'a' (range slice at the SUBCOLUMN level of a SUPERCOLUMN). Is that possible in one call?
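Cases 2 and 3 are both slices over names inside a single row. Whatever the partitioner, Cassandra keeps a row's super column names sorted by the column family's comparator (CompareWith) and each super column's subcolumn names sorted by CompareSubcolumnsWith, so a name prefix corresponds to one contiguous run of that sorted list. The toy model below is plain Python, not the Thrift API; the sample rows and the `prefix_slice` helper are hypothetical, and names are assumed to compare as raw bytes (like BytesType):

```python
import bisect

# Toy in-memory model of ONE row of the GameData super column family.
# Assumption: names compare as raw bytes (like BytesType).
row = {
    b"user-data:5678": {b"another_prop_name": b"value", b"user_prop_name": b"value"},
    b"user-data:5999": {b"avatar": b"knight", b"level": b"12"},
    b"user-data:9012": {b"user_prop_name": b"value"},
}


def prefix_slice(sorted_names, prefix):
    """Return the names that start with `prefix`, relying only on sort order.

    All names sharing a prefix are adjacent in sorted order, so we find where
    the prefix would be inserted and walk forward until the prefix stops matching.
    """
    lo = bisect.bisect_left(sorted_names, prefix)
    hi = lo
    while hi < len(sorted_names) and sorted_names[hi].startswith(prefix):
        hi += 1
    return sorted_names[lo:hi]


# Case 2: super columns (users) in this row whose id starts with '5'.
users_5 = prefix_slice(sorted(row), b"user-data:5")
# Case 3: subcolumns of one super column whose name starts with 'a'.
props_a = prefix_slice(sorted(row[b"user-data:5999"]), b"a")

print(users_5)  # [b'user-data:5678', b'user-data:5999']
print(props_a)  # [b'avatar']
```

In the 0.6 Thrift API this roughly maps to get_slice with a SliceRange (start/finish) in the SlicePredicate, setting ColumnParent.super_column for the subcolumn case; treat that mapping as an assumption rather than a tested call.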

I'm trying to understand at what level the RandomPartitioner affects my example data model. Is it at a fixed level, like just ROWS (with the sub-data fixed to the same node), or is all data at every level *randomized* across all nodes?
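To make the row-level part concrete: RandomPartitioner turns the row key into a token by hashing it with MD5, and that token decides which nodes own the row; the super columns and subcolumns inside the row are stored with it. A rough sketch of the hashing step follows. `approx_random_partitioner_token` is a hypothetical name, and reading the digest as a signed big-endian 128-bit integer with the sign dropped is an assumption about the details, not Cassandra's code:

```python
import hashlib


def approx_random_partitioner_token(key: bytes) -> int:
    """Hypothetical helper approximating a RandomPartitioner token.

    Assumption: the token is the key's MD5 digest read as a signed,
    big-endian 128-bit integer with the sign dropped.
    """
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


# Row keys built as GameType + game id, as in the data model.
keys = [b"RPG1234", b"RPG1235", b"FPS3456", b"ARCADE7890", b"RPG9012"]

# Token order depends on the MD5 of the whole key, not on its prefix,
# so nothing keeps the 'RPG' keys next to each other on the ring.
for key in sorted(keys, key=approx_random_partitioner_token):
    print(key.decode(), approx_random_partitioner_token(key))
```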

Are there any tricks for doing these sorts of range slices using RP? For example, if I set my consistency level to 'ALL' when doing a range slice, would that effectively compile a complete result set for me?
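One workaround that does not depend on key order is to page through every row with range slices and filter the keys on the client. The sketch below assumes a page fetcher that returns rows in partitioner order and treats its start key as inclusive; `fetch_page`, `scan_all_rows`, `rows_with_prefix` and `make_fake_fetcher` are hypothetical names, and the fake fetcher only simulates token order with MD5 instead of calling Cassandra:

```python
import hashlib
from typing import Callable, Dict, Iterator, List, Tuple

Rows = Dict[bytes, dict]
Page = List[Tuple[bytes, dict]]
# Hypothetical stand-in for a get_range_slices call: (start_key, count) -> page.
FetchPage = Callable[[bytes, int], Page]


def scan_all_rows(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Tuple[bytes, dict]]:
    """Visit every row once by restarting each page at the last key seen.

    The start key is treated as inclusive, so every page after the first
    begins with the previous page's last row, which is skipped.
    """
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start, first = b"", True  # empty start key = start from the beginning
    while True:
        page = fetch_page(start, page_size)
        yield from (page if first else page[1:])
        if len(page) < page_size:
            return
        start, first = page[-1][0], False


def rows_with_prefix(fetch_page: FetchPage, prefix: bytes, page_size: int = 100) -> Page:
    """Client-side filter: keep only rows whose key starts with `prefix`."""
    return [(k, r) for k, r in scan_all_rows(fetch_page, page_size) if k.startswith(prefix)]


def make_fake_fetcher(rows: Rows) -> FetchPage:
    """In-memory fetcher returning rows in MD5 order (an assumption
    standing in for RandomPartitioner token order)."""
    ordered = sorted(rows, key=lambda k: int.from_bytes(hashlib.md5(k).digest(), "big"))

    def fetch(start: bytes, count: int) -> Page:
        i = ordered.index(start) if start else 0
        return [(k, rows[k]) for k in ordered[i:i + count]]

    return fetch


games = {b"RPG1234": {}, b"FPS3456": {}, b"RPG5555": {}, b"ARCADE7890": {}, b"RPG9999": {}}
fetch = make_fake_fetcher(games)
print(sorted(k for k, _ in rows_with_prefix(fetch, b"RPG", page_size=2)))
# [b'RPG1234', b'RPG5555', b'RPG9999']
```

A full scan reads every row in the column family, so its cost grows with the total number of games rather than with the number of 'RPG' games.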

Thanks for the help!

larry