incubator-cassandra-user mailing list archives

From Bob Hutchison <>
Subject How do you construct an index and use it, especially in Ruby
Date Sun, 25 Apr 2010 18:14:39 GMT


I'm new to Cassandra and trying to work out how to do something that I've implemented any
number of times with other tools (e.g. TokyoCabinet, Perst, even the filesystem using grep :-).
I've managed to get some of this working in Cassandra but not all.

So here's the core of the situation.

I have this opaque chunk of data that I want to store in Cassandra and then find it again.

I can generate a key when the data is created very easily, and I've stored it in a
straightforward manner: in a column with a key whose value is the data. And I can retrieve it
when I know the key. No difficulties here at all, works fine.

Now I want to index this data taking what I imagine to be a pretty typical approach.

Let's say there are two many-to-one indexes: 'colour' and 'size'. Each colour value will have
more than one chunk of data, and the same goes for size.

What I thought I'd do is make a super column and index the chunk of data kind of like: { 'colour'
=> { 'blue' => 1 }, 'size' => { 'large' => 1}} with the key equal to the key of
the chunk of data. And Cassandra stores it without error like that. So using the Ruby gem,
it'd be something along the lines of:

  cassandra.insert(:Indexes, key-of-the-chunk-of-data,
                   { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1 } })

Q1: is this a reasonable approach? It *seems* to be what I've read is supposed to be done.
The 1 is meaningless. Anyway, it executes without error in Ruby.
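
To make the layout described above concrete, here is a minimal sketch that uses plain Ruby hashes in place of Cassandra. Nothing in it calls the Cassandra gem; the `indexes` hash and the `index_chunk` and `keys_for` helpers are hypothetical names used only for illustration.

```ruby
# Plain-Ruby stand-in for the Indexes super column family.
# Row key = key of the chunk, super column = index name,
# column name = index value, column value = the meaningless 1.
def index_chunk(indexes, key, colour, size)
  indexes[key] = { 'colour' => { colour => 1 }, 'size' => { size => 1 } }
end

# Return the keys of all chunks indexed under the given value.
# With this layout every row has to be visited to answer the question.
def keys_for(indexes, index_name, value)
  indexes.select { |_key, supers| supers.fetch(index_name, {}).key?(value) }.keys
end

indexes = {}
index_chunk(indexes, 'chunk-1', 'blue', 'large')
index_chunk(indexes, 'chunk-2', 'red', 'large')
index_chunk(indexes, 'chunk-3', 'blue', 'small')

p keys_for(indexes, 'colour', 'blue')
p keys_for(indexes, 'size', 'large')
```

In this model the index entries are keyed by the chunk, so a lookup by colour is a scan over all rows rather than a direct read.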

Q2: what is the syntax of the (Ruby) query to find the keys of all 'blue' chunks of data?
I'm assuming get_range is the correct method, but what are the parameters? The docs say:
get_range(column_family, options={}) but that seems to be missing a bit of detail, in
particular the super column name.

Q2a: So I know there's a :start and :finish key supported in the options hash, inclusive,
exclusive respectively. How do you define a range for equals with a UTF8 key? Surely not 'blue'.succ??
or by some kind of suffix??
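
A minimal sketch of the range arithmetic behind this question, assuming keys are ordered byte-wise the way Ruby compares strings. Whether Cassandra orders keys that way depends on the configured partitioner, so this only illustrates an inclusive start with an exclusive finish. The `in_range?` helper and the sample keys are hypothetical.

```ruby
# Half-open range check: start is inclusive, finish is exclusive.
def in_range?(key, start, finish)
  key >= start && key < finish
end

keys = ['blud', 'blue', 'blue-green', 'bluebell', 'bluf', 'red']

# 'blue'.succ is 'bluf', so this range matches every key prefixed with 'blue'.
prefix_matches = keys.select { |k| in_range?(k, 'blue', 'blue'.succ) }

# Appending a zero byte gives the smallest string that sorts after 'blue',
# so this range matches 'blue' alone.
exact_matches = keys.select { |k| in_range?(k, 'blue', "blue\0") }

p prefix_matches
p exact_matches
```

Under that ordering assumption, 'blue'.succ behaves as a prefix match, while a zero-byte suffix on the finish value narrows the range to equality.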

Q2b: How do you specify the super column name 'colour'? I've looked at the (Ruby) source of
the get_range method and I'm unconvinced that this is implemented (a constant '' seems to be
used where I'd expect the super column name to go).

Anyway I ended up hacking at the Ruby gem's source to use the column name where the '' was
in the original, and didn't really get anywhere useful (I can find nothing, or everything,
nothing in between).

Q3: If I am correct about what is supposed to be done, does the Ruby gem support it?

Q4: Does anyone know of some Ruby code that does an indexed lookup that they could point
me at? (There's lots of code that indexes but nothing that searches by the index.)

I'll try to take a look at some of the other Cassandra client implementations and see if I
can get this model to work. Maybe just a Ruby problem?? With any luck, it'll be me messing

If it'd help I can post the source of what I have, but it'll need some cleanup. Let me know.

Thanks for taking the time to read this far :-)


Bob Hutchison
Recursive Design Inc.
