Subject: strange get_range_slices behaviour v0.6.1
From: aaron
Date: Sun, 25 Apr 2010 20:23:05 -0700
To: user@cassandra.apache.org

I've been looking at the get_range_slices feature and have found some odd behaviour I do not understand. Basically, the keys returned in a range query do not match what I would expect to see.
I think it may have something to do with an ordering of keys that I don't know about, but I'm just guessing.

Setup: Cassandra 0.6.1, single-node local install, RandomPartitioner. I'm using Python and my own thin wrapper around the Thrift Python API.

Step 1. Insert 3 keys into the "Standard1" column family, called "object1", "object2" and "object3", each with a single column called 'name' with a value like 'object1'.

Step 2. Do a get_range_slices call on the "Standard1" CF for column names ["name"], with start_key "object1" and end_key "object3". I expect to see three results, but I only see results for object1 and object3. Below are the Thrift types I'm passing into the Cassandra.Client object:

- ColumnParent(column_family='Standard1', super_column=None)
- SlicePredicate(column_names=['name'], slice_range=None)
- KeyRange(end_key='object3', start_key='object1', count=4000, end_token=None, start_token=None)

and the output:

[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439, name='name', value='object1'), super_column=None)], key='object1'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362, name='name', value='object3'), super_column=None)], key='object3')]

Step 3. Modify the get_range_slices call so the start_key is object2. In this case I expect to see 2 rows returned, but I get 3. Thrift args and return are below:
- ColumnParent(column_family='Standard1', super_column=None)
- SlicePredicate(column_names=['name'], slice_range=None)
- KeyRange(end_key='object3', start_key='object2', count=4000, end_token=None, start_token=None)

and the output:

[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250265190715, name='name', value='object2'), super_column=None)], key='object2'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439, name='name', value='object1'), super_column=None)], key='object1'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362, name='name', value='object3'), super_column=None)], key='object3')]

Can anyone explain these odd results? As I said, I've got my own Python wrapper around the client, so I may be doing something wrong. But I've pulled out the Thrift objects as they go in and out of the Thrift Cassandra.Client, so I think I'm OK. (I have not noticed a systematic problem with my wrapper.)

On a more general note, is there information on the sort order of keys when using key ranges? I'm guessing the hashes of the keys are compared, and I'm wondering whether the hashes preserve the order of the original key values. I also assume the order is byte order, rather than ASCII or UTF-8.

I was experimenting with the difference between column slicing and key slicing. I could also write the keys as column names (they are in buckets), slice there first, and then use the results to make a multi-key get. I'm trying to support features like "get me all the data where the key starts with 'foo.bar'".

Thanks for the fun project.
Aaron
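The hashing guess above can be explored offline. The sketch below is a model under stated assumptions, not Cassandra's code. It assumes RandomPartitioner gives each row a token equal to the absolute value of the MD5 digest of the key (UTF-8 bytes, read as a signed big-endian integer), and that a start_key/end_key range returns the keys whose tokens lie between the two endpoint tokens, in token order. Because MD5 is not order-preserving, under those assumptions "object1" to "object3" can come back in any order. For contrast, it also models a column-name slice under a byte-ordered comparator (for example BytesType), which is what a "key starts with 'foo.bar'" lookup over keys stored as column names would rely on. All function names are hypothetical.

```python
import bisect
import hashlib


def random_partitioner_token(key):
    # ASSUMPTION: models RandomPartitioner's token as the MD5 digest of the
    # key's UTF-8 bytes, read as a signed big-endian integer, made non-negative.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def key_range_scan(keys, start_key, end_key, token=random_partitioner_token):
    # HYPOTHETICAL model of a start_key/end_key range query: keys come back in
    # token order, and a key is included when its token lies between the
    # tokens of start_key and end_key (inclusive), wrapping around the ring
    # when the start token is greater than the end token.
    start, end = token(start_key), token(end_key)
    ordered = sorted(keys, key=token)
    if start <= end:
        return [k for k in ordered if start <= token(k) <= end]
    tail = [k for k in ordered if token(k) >= start]
    head = [k for k in ordered if token(k) <= end]
    return tail + head


def column_prefix_slice(column_names, prefix):
    # Byte-ordered column slice (as with a BytesType comparator): start the
    # slice at the prefix itself and stop at the first name that no longer
    # starts with it.
    names = sorted(name.encode("utf-8") for name in column_names)
    wanted = prefix.encode("utf-8")
    i = bisect.bisect_left(names, wanted)
    result = []
    while i < len(names) and names[i].startswith(wanted):
        result.append(names[i].decode("utf-8"))
        i += 1
    return result


if __name__ == "__main__":
    keys = ["object1", "object2", "object3"]
    # Print the modelled token order; compare it with the Step 3 output order.
    for k in sorted(keys, key=random_partitioner_token):
        print(k, random_partitioner_token(k))
    print(key_range_scan(keys, "object1", "object3"))
    print(key_range_scan(keys, "object2", "object3"))
    print(column_prefix_slice(["foo.bar.1", "foo.baz", "foo.bar.2"], "foo.bar"))
```

Passing a token map in which object2 < object1 < object3 (the order the Step 3 output came back in) as the `token` argument reproduces both the Step 2 and the Step 3 results, which is consistent with rows being returned in token order rather than key order.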