Date: Mon, 26 Apr 2010 09:56:28 -0700
From: Ryan King
To: user@cassandra.apache.org
Cc: Bob Hutchison
Subject: Re: How do you construct an index and use it, especially in Ruby

On Sun, Apr 25, 2010 at 11:14 AM, Bob Hutchison wrote:
>
> Hi,
>
> I'm new to Cassandra and trying to work out how to do something that I've implemented any number of times (e.g. TokyoCabinet, Perst, even the filesystem using grep :-) I've managed to get some of this working in Cassandra, but not all.
>
> So here's the core of the situation.
>
> I have this opaque chunk of data that I want to store in Cassandra and then find again.
>
> I can easily generate a key when the data is created, and I've stored the data in a straightforward manner: in a column under that key, with the data as the column value. And I can retrieve it when I know the key. No difficulties here at all; it works fine.
>
> Now I want to index this data, taking what I imagine to be a pretty typical approach.
>
> Let's say there are two many-to-one indexes, 'colour' and 'size'. Each colour value will have more than one chunk of data; the same goes for size.
>
> What I thought I'd do is make a super column and index the chunk of data kind of like { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1 } }, with the key equal to the key of the chunk of data. Cassandra stores it like that without error. So using the Ruby gem, it'd be something along the lines of:
>
>   cassandra.insert(:Indexes, key-of-the-chunk-of-data, { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1 } })
>
> Q1: Is this a reasonable approach? It *seems* to be what I've read is supposed to be done. The 1 is meaningless. Anyway, it executes without error in Ruby.

No. In order to index your data, you need to invert it: each indexed value (e.g. 'blue') becomes a row key, and the keys of the matching chunks become that row's columns.

Since you're working in Ruby, I'd recommend CassandraObject: http://github.com/nzKoz/cassandra_object. It has indexing built in.

-ryan

> Q2: What is the syntax of the (Ruby) query to find the keys of all 'blue' chunks of data? I'm assuming get_range is the correct method, but what are the parameters? The docs say get_range(column_family, options={}), but that seems to be missing a bit of detail, in particular the super column name.
>
> Q2a: So I know there's a :start and :finish key supported in the options hash, inclusive and exclusive respectively. How do you define a range for equality with a UTF8 key? Surely not 'blue'.succ? Or by some kind of suffix?
>
> Q2b: How do you specify the super column name 'colour'? I've looked at the (Ruby) source of the get_range method and I'm unconvinced that this is implemented (there seems to be a constant '' used where the super column name would make sense).
>
> Anyway, I ended up hacking at the Ruby gem's source to use the column name where the '' was in the original, and didn't really get anywhere useful (I can find nothing, or everything, but nothing in between).
>
> Q3: If I am correct about what is supposed to be done, does the Ruby gem support it?
>
> Q4: Does anyone know of some Ruby code that does an indexed lookup that they could point me at? (There's lots of code that indexes, but nothing that searches by the index.)
>
> I'll try to take a look at some of the other Cassandra client implementations and see if I can get this model to work. Maybe it's just a Ruby problem? With any luck, it'll be me messing up.
>
> If it'd help, I can post the source of what I have, but it'll need some cleanup. Let me know.
>
> Thanks for taking the time to read this far :-)
>
> Bob
>
> ----
> Bob Hutchison
> Recursive Design Inc.
> http://www.recursive.ca/
> weblog: http://xampl.com/so
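A minimal sketch of the inverted layout, assuming plain Ruby hashes as stand-ins for two column families: a data family keyed by chunk key, and an index family whose row key is a hypothetical "attribute:value" string such as "colour:blue" and whose column names are the keys of the matching chunks. The class InvertedIndexModel and its methods are illustrative names only; nothing here calls the cassandra gem or CassandraObject.

```ruby
# Sketch only: plain Ruby hashes stand in for two Cassandra column
# families (row key => { column name => value }). Nothing here talks
# to a cluster; InvertedIndexModel is a hypothetical name.
class InvertedIndexModel
  def initialize
    # "Data" family: chunk key => { 'body' => opaque chunk }
    @data = Hash.new { |h, k| h[k] = {} }
    # "Index" family: "attribute:value" row key => { chunk key => '' }
    @index = Hash.new { |h, k| h[k] = {} }
  end

  # Store the chunk, then write one index column per attribute.
  # The inversion: the attribute value names the row, and the chunk
  # key becomes a column name in that row (the column value is unused).
  def insert(chunk_key, body, attributes)
    @data[chunk_key]['body'] = body
    attributes.each do |attr, value|
      @index["#{attr}:#{value}"][chunk_key] = ''
    end
  end

  # Equality lookup is a read of one index row; no key range is needed.
  # Sorting mimics Cassandra returning column names in comparator order.
  def keys_for(attr, value)
    @index.fetch("#{attr}:#{value}", {}).keys.sort
  end

  # Returns the stored chunk, or nil if the key is unknown.
  def fetch(chunk_key)
    @data.fetch(chunk_key, {})['body']
  end
end

idx = InvertedIndexModel.new
idx.insert('chunk-1', 'data one',   { 'colour' => 'blue', 'size' => 'large' })
idx.insert('chunk-2', 'data two',   { 'colour' => 'blue', 'size' => 'small' })
idx.insert('chunk-3', 'data three', { 'colour' => 'red',  'size' => 'large' })

p idx.keys_for('colour', 'blue')   # prints ["chunk-1", "chunk-2"]
p idx.keys_for('size', 'large')    # prints ["chunk-1", "chunk-3"]
# blue AND large: intersect the two index rows on the client
p idx.keys_for('colour', 'blue') & idx.keys_for('size', 'large')  # prints ["chunk-1"]
p idx.fetch('chunk-2')             # prints "data two"
```

Because the index row is named after the value, an equality query becomes a single-row read rather than a key range, so the :start/:finish question does not come up, and combining two indexes (blue AND large) is a set intersection done on the client. Against a real cluster, the index write would presumably be one more insert of the form shown in the question, with the index row key in place of the chunk key; that mapping is an assumption and has not been checked against the gem.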